- Source: EBPF
eBPF is a technology that can run programs in a privileged context such as the operating system kernel. It is the successor to the Berkeley Packet Filter (BPF, with the "e" originally meaning "extended") filtering mechanism in Linux and is also used in non-networking parts of the Linux kernel.
It is used to safely and efficiently extend the capabilities of the kernel at runtime without requiring changes to kernel source code or loading kernel modules. Safety is provided through an in-kernel verifier, which performs static code analysis and rejects programs that could crash, hang or otherwise negatively interfere with the kernel.
This validation model differs from sandboxed environments, where the execution environment is restricted and the runtime has no insight into the program. Examples of programs that are automatically rejected include programs without strong exit guarantees (e.g. for/while loops without exit conditions) and programs that dereference pointers without safety checks.
Design
Loaded programs that have passed the verifier are either interpreted or just-in-time (JIT) compiled in the kernel for native execution performance. The execution model is event-driven and, with few exceptions, run-to-completion: programs can be attached to various hook points in the operating system kernel and are run when the corresponding event is triggered. eBPF use cases include (but are not limited to) networking, such as XDP, as well as tracing and security subsystems. Because eBPF's efficiency and flexibility opened up new possibilities for solving production issues, Brendan Gregg famously dubbed eBPF "superpowers for Linux". Linus Torvalds said, "BPF has actually been really useful, and the real power of it is how it allows people to do specialized code that isn't enabled until asked for". Due to its success in Linux, the eBPF runtime has been ported to other operating systems such as Windows.
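To make the event-driven, run-to-completion model concrete, the following is a minimal userspace C sketch, not kernel code. It assumes a hypothetical struct hook that holds the programs attached to one hook point and a hypothetical hook_fire() that runs each attached program to completion when the event occurs. Real eBPF programs are attached to kernel hooks through loaders such as libbpf, not through anything like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context handed to a program when its hook fires. */
struct event_ctx {
    int event_id;
    int payload;
};

/* A "program" is modeled as a C function that runs to completion and returns. */
typedef int (*prog_fn)(const struct event_ctx *ctx);

#define MAX_PROGS_PER_HOOK 4

/* A hook point holds the programs currently attached to it. */
struct hook {
    prog_fn progs[MAX_PROGS_PER_HOOK];
    size_t nprogs;
};

/* Attach a program to a hook; returns 0 on success, -1 if the hook is full. */
int hook_attach(struct hook *h, prog_fn prog)
{
    if (h->nprogs == MAX_PROGS_PER_HOOK)
        return -1;
    h->progs[h->nprogs++] = prog;
    return 0;
}

/* Fire an event: every attached program runs to completion, in attach order,
 * and its return value is stored in results[]. Returns how many programs ran. */
size_t hook_fire(const struct hook *h, const struct event_ctx *ctx, int *results)
{
    assert(results != NULL);
    for (size_t i = 0; i < h->nprogs; i++)
        results[i] = h->progs[i](ctx);
    return h->nprogs;
}

/* Two example programs: one counts events, one derives a value from the payload. */
int count_event(const struct event_ctx *ctx) { (void)ctx; return 1; }
int double_payload(const struct event_ctx *ctx) { return ctx->payload * 2; }
```

Nothing runs until an event fires, and each program returns before the next one starts; that is the property the run-to-completion description refers to.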
History
eBPF evolved from the classic Berkeley Packet Filter (cBPF, a retroactively applied name). At the most basic level, it introduced ten general-purpose 64-bit registers (instead of cBPF's two 32-bit registers, the accumulator A and the index register X), different jump semantics, a call instruction and a corresponding register-passing convention, new instructions, and a different encoding for these instructions.
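The fixed-size instruction encoding can be illustrated with a small C sketch. It assumes the little-endian layout and opcode values described in RFC 9669 (listed under Further reading): each instruction occupies 64 bits, made up of an 8-bit opcode, 4-bit destination and source register fields, a signed 16-bit offset and a signed 32-bit immediate. The helper name bpf_encode() is hypothetical, and the sketch ignores the wide (128-bit) form used to load 64-bit immediates.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode building blocks (values as given in RFC 9669). */
#define BPF_JMP   0x05 /* instruction class: jumps */
#define BPF_ALU64 0x07 /* instruction class: 64-bit arithmetic */
#define BPF_K     0x00 /* source operand is the 32-bit immediate */
#define BPF_X     0x08 /* source operand is src_reg */
#define BPF_ADD   0x00
#define BPF_MOV   0xb0
#define BPF_EXIT  0x90

/* Encode one instruction into 8 bytes using the little-endian layout:
 * byte 0 = opcode, byte 1 = src_reg (high nibble) | dst_reg (low nibble),
 * bytes 2-3 = signed offset, bytes 4-7 = signed immediate. */
void bpf_encode(uint8_t out[8], uint8_t opcode, uint8_t dst, uint8_t src,
                int16_t off, int32_t imm)
{
    assert(dst < 11 && src < 11); /* only r0..r10 exist */
    uint16_t uoff = (uint16_t)off;
    uint32_t uimm = (uint32_t)imm;
    out[0] = opcode;
    out[1] = (uint8_t)((src << 4) | dst);
    out[2] = (uint8_t)(uoff & 0xff);
    out[3] = (uint8_t)(uoff >> 8);
    out[4] = (uint8_t)(uimm & 0xff);
    out[5] = (uint8_t)((uimm >> 8) & 0xff);
    out[6] = (uint8_t)((uimm >> 16) & 0xff);
    out[7] = (uint8_t)(uimm >> 24);
}
```

For example, encoding r0 = 1 with BPF_ALU64 | BPF_MOV | BPF_K yields the bytes b7 00 00 00 01 00 00 00, and exit (BPF_JMP | BPF_EXIT) has the opcode byte 0x95. The 4-bit register fields are what limit eBPF to at most 16 addressable registers, of which 11 are defined.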
Architecture and concepts
= eBPF maps
eBPF maps are efficient key/value stores that reside in kernel space and can be used to share data among multiple eBPF programs or to communicate between a user space application and eBPF code running in the kernel. eBPF programs can leverage eBPF maps to store and retrieve data in a wide set of data structures. Map implementations are provided by the core kernel. There are various types, including hash maps, arrays, and ring buffers.
In practice, eBPF maps are typically used for scenarios such as a user space program writing configuration information to be retrieved by an eBPF program, an eBPF program storing state for later retrieval by another eBPF program (or a future run of the same program), or an eBPF program writing results or metrics into a map for retrieval by a user space program that will present results.
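The last scenario, in which an eBPF program updates counters that a user space program reads, can be modeled in plain C. This is a minimal sketch under stated assumptions: struct array_map, map_lookup(), map_update(), on_event() and read_total() are hypothetical names that loosely imitate an array-type map, where keys are indices, all slots are preallocated and zero-initialized, and a lookup returns a pointer into the map or NULL. In real code, the kernel-side program would call the bpf_map_lookup_elem() helper, and user space would use libbpf's syscall wrappers on the map's file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 4

/* Model of an array-type map: keys are indices 0..MAX_ENTRIES-1 and every
 * slot is preallocated and zero-initialized. */
struct array_map {
    uint64_t values[MAX_ENTRIES];
};

/* Lookup returns a pointer into the map's storage, or NULL for a bad key. */
uint64_t *map_lookup(struct array_map *m, uint32_t key)
{
    return key < MAX_ENTRIES ? &m->values[key] : NULL;
}

/* Update overwrites a slot; out-of-range keys are rejected
 * (the error code here is illustrative). */
int map_update(struct array_map *m, uint32_t key, uint64_t value)
{
    if (key >= MAX_ENTRIES)
        return -E2BIG;
    m->values[key] = value;
    return 0;
}

/* "Kernel side": count one occurrence of an event type. */
void on_event(struct array_map *counters, uint32_t event_type)
{
    uint64_t *count = map_lookup(counters, event_type);
    if (count != NULL) /* real eBPF code must check for NULL before dereferencing */
        (*count)++;
}

/* "User space side": sum all counters for reporting. */
uint64_t read_total(struct array_map *counters)
{
    uint64_t total = 0;
    for (uint32_t k = 0; k < MAX_ENTRIES; k++) {
        uint64_t *v = map_lookup(counters, k);
        assert(v != NULL); /* every in-range key exists in an array map */
        total += *v;
    }
    return total;
}
```

Because a lookup can return NULL, the counter update is guarded by a NULL check. In an actual eBPF program the verifier rejects code that dereferences a map lookup result without such a check.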
= eBPF virtual machine
The eBPF virtual machine runs within the kernel and takes in a program in the form of eBPF bytecode instructions, which are converted to native machine instructions that run on the CPU. Early implementations of eBPF interpreted the bytecode, but interpretation has since largely been replaced by Just-in-Time (JIT) compilation for performance and security-related reasons.
The eBPF virtual machine consists of eleven 64-bit registers with 32-bit subregisters (ten general-purpose registers, R0 to R9, plus the read-only frame pointer R10), a program counter, and a 512-byte stack. These registers keep track of state while eBPF programs execute.
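A toy interpreter helps illustrate this register and stack model. The sketch below assumes a hypothetical decoded struct insn and implements only seven opcodes: 64-bit moves and adds (immediate and register forms), 8-byte stack store and load, and exit. Memory addresses are modeled as offsets into a 512-byte stack buffer, with R10 starting one past the top of the stack. It is not how the kernel's interpreter or JIT is implemented, and it omits the context pointer that real programs receive in R1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NREGS 11       /* r0..r9 general purpose, r10 read-only frame pointer */
#define STACK_SIZE 512 /* bytes of stack per program */

/* Decoded instruction; the field set mirrors the kernel's struct bpf_insn. */
struct insn {
    uint8_t opcode, dst, src;
    int16_t off;
    int32_t imm;
};

/* The handful of 64-bit opcodes this sketch understands. */
enum {
    OP_ADD64_IMM = 0x07, /* dst += imm */
    OP_ADD64_REG = 0x0f, /* dst += src */
    OP_LDXDW     = 0x79, /* dst = *(u64 *)(src + off) */
    OP_STXDW     = 0x7b, /* *(u64 *)(dst + off) = src */
    OP_EXIT      = 0x95, /* return r0 */
    OP_MOV64_IMM = 0xb7, /* dst = imm */
    OP_MOV64_REG = 0xbf, /* dst = src */
};

/* Interpret prog[0..len); on success store r0 in *r0_out and return 0. On any
 * fault (bad opcode or register, stack overflow, r10 write, no exit) return -1. */
int run(const struct insn *prog, size_t len, uint64_t *r0_out)
{
    uint64_t reg[NREGS] = {0};
    uint8_t stack[STACK_SIZE] = {0};
    reg[10] = STACK_SIZE; /* frame pointer: one past the top of the stack */

    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *i = &prog[pc];
        if (i->dst >= NREGS || i->src >= NREGS)
            return -1;
        /* r10 is read-only; a store only uses dst as a base address. */
        if (i->dst == 10 && i->opcode != OP_STXDW)
            return -1;
        switch (i->opcode) {
        case OP_MOV64_IMM: reg[i->dst] = (uint64_t)(int64_t)i->imm; break;
        case OP_MOV64_REG: reg[i->dst] = reg[i->src]; break;
        case OP_ADD64_IMM: reg[i->dst] += (uint64_t)(int64_t)i->imm; break;
        case OP_ADD64_REG: reg[i->dst] += reg[i->src]; break;
        case OP_STXDW:
        case OP_LDXDW: {
            /* Addresses are modeled as byte offsets into the stack buffer. */
            uint64_t base = reg[i->opcode == OP_STXDW ? i->dst : i->src];
            uint64_t addr = base + (uint64_t)(int64_t)i->off; /* wraps if negative */
            if (addr > STACK_SIZE - 8)
                return -1; /* out-of-bounds stack access */
            if (i->opcode == OP_STXDW)
                memcpy(&stack[addr], &reg[i->src], 8);
            else
                memcpy(&reg[i->dst], &stack[addr], 8);
            break;
        }
        case OP_EXIT:
            *r0_out = reg[0];
            return 0;
        default:
            return -1; /* opcode not supported by this sketch */
        }
    }
    return -1; /* fell off the end without an exit instruction */
}
```

For example, the sequence r1 = 40; r1 += 2; *(u64 *)(r10 - 8) = r1; r0 = *(u64 *)(r10 - 8); exit returns 42, while a store at r10 + 0 is rejected because it would write past the top of the stack.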
= Tail calls
A tail call lets an eBPF program call and execute another eBPF program, replacing the current execution context, similar to how the execve() system call operates for regular processes. Tail calls are implemented as a long jump that reuses the same stack frame and does not return to the caller, which makes them particularly useful in eBPF, where the stack is limited to 512 bytes. The target programs are held in a special map, the program array, so functionality can be added or replaced atomically at runtime, altering the BPF program's execution behavior. A popular use case for tail calls is to spread the complexity of eBPF programs over several programs. Another use case is replacing or extending logic by replacing the contents of the program array while it is in use, for example to update a program version without downtime or to enable or disable logic.
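The dispatch pattern can be modeled in userspace C as a trampoline. In this sketch, where all names are hypothetical, a program requests a tail call as its final action by setting ctx->next to a slot index; the runner then switches to that slot's program instead of returning, so no call stack builds up. If the slot is empty, the tail call fails and the requesting program's own return value stands. In real eBPF code, the equivalent is the bpf_tail_call() helper operating on a BPF_MAP_TYPE_PROG_ARRAY map, whose slots user space can update while programs run.

```c
#include <assert.h>
#include <stddef.h>

#define PROG_ARRAY_SIZE 4
#define MAX_TAIL_CALLS 8 /* hypothetical; the kernel enforces its own fixed limit */

/* Hypothetical context: `next` is the program-array slot a program wants to
 * tail-call into, or -1 for none. */
struct ctx { int value; int next; };

typedef int (*prog_fn)(struct ctx *ctx);

/* Model of a program array: user space can swap slot contents at any time. */
struct prog_array { prog_fn slots[PROG_ARRAY_SIZE]; };

/* Trampoline-style runner. A program's final action may be a tail-call
 * request; the runner then replaces it with the program in that slot, so no
 * call stack builds up. If the slot is empty or out of range, or the chain
 * limit is reached, the current program's return value is used. */
int run_with_tail_calls(const struct prog_array *pa, prog_fn entry, struct ctx *ctx)
{
    prog_fn cur = entry;
    for (int calls = 0; ; calls++) {
        ctx->next = -1;
        int ret = cur(ctx);
        int slot = ctx->next;
        if (calls == MAX_TAIL_CALLS || slot < 0 || slot >= PROG_ARRAY_SIZE ||
            pa->slots[slot] == NULL)
            return ret;
        cur = pa->slots[slot];
    }
}

/* Example programs: a dispatcher picks a handler slot from the input and
 * returns -1 as its fallback result if the tail call fails. */
int dispatch(struct ctx *ctx) { ctx->next = ctx->value % 2; return -1; }
int even_v1(struct ctx *ctx) { return ctx->value / 2; }
int even_v2(struct ctx *ctx) { return ctx->value * 10; }
int odd_handler(struct ctx *ctx) { return ctx->value + 1; }

/* A program that keeps tail-calling itself (slot 2) until the limit stops it. */
int self_loop(struct ctx *ctx) { ctx->value++; ctx->next = 2; return ctx->value; }
```

Swapping pa.slots[0] from even_v1 to even_v2 changes behavior on the next event without reloading the dispatcher, which mirrors the zero-downtime update use case; clearing a slot makes the dispatcher's fallback result apply.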
= BPF to BPF calls
It is generally considered good practice in software development to group common code into a function encapsulating logic for reusability. Prior to Linux kernel 4.16 and LLVM 6.0, a typical eBPF C program had to explicitly direct the compiler to inline a function, resulting in a BPF object file that contained duplicate copies of that function. This restriction was lifted, and mainstream eBPF compilers now support writing functions naturally in eBPF programs. This reduces the generated eBPF code size, making it friendlier to a CPU instruction cache.
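The difference at the source level can be sketched in ordinary C using GCC/Clang function attributes. This is userspace code for illustration only, and the helper names are hypothetical. The first helper is forced inline, as pre-4.16 eBPF programs required, so each call site receives its own copy of the body; the second is kept out of line with noinline, which, when compiled for the BPF target, yields a single function body invoked through BPF-to-BPF calls.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-4.16 style: the helper is forced inline, so every call site receives
 * its own copy of the loop in the compiled object. */
static inline __attribute__((always_inline)) uint32_t
sum_bytes_inline(const uint8_t *p, uint32_t n)
{
    uint32_t s = 0;
    for (uint32_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Modern style: an ordinary function kept out of line, so one copy of the
 * body exists and each call site uses a call instruction. */
static __attribute__((noinline)) uint32_t
sum_bytes(const uint8_t *p, uint32_t n)
{
    uint32_t s = 0;
    for (uint32_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* The same two-call-site caller written both ways; results are identical. */
uint32_t checksum_inline(const uint8_t *a, const uint8_t *b, uint32_t n)
{
    return sum_bytes_inline(a, n) + sum_bytes_inline(b, n);
}

uint32_t checksum_calls(const uint8_t *a, const uint8_t *b, uint32_t n)
{
    return sum_bytes(a, n) + sum_bytes(b, n);
}
```

Both callers compute the same result; only the size of the emitted code differs, which is the instruction-cache benefit described above.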
= eBPF verifier
The verifier is a core component of eBPF, and its main responsibility is to ensure that an eBPF program is safe to execute. It performs a static analysis of the eBPF bytecode that considers all possible execution paths. Verification starts with a depth-first search through the program's paths: the verifier simulates the execution of each instruction in order, tracking the state of the registers and the stack, and verification fails if any instruction could lead to an unsafe state. This process continues until all paths have been analyzed or a violation is found.
Depending on the type of program, the verifier checks for violations of specific rules. These include checking that a program always terminates within a reasonable amount of time (no infinite loops or infinite recursion); that it cannot read arbitrary memory, since that could allow it to leak sensitive information; that network programs cannot access memory outside packet bounds, because adjacent memory could contain sensitive information; that programs cannot deadlock, so any held spinlock must be released and only one lock can be held at a time, to avoid deadlocks across multiple programs; and that programs cannot read uninitialized memory. This is not an exhaustive list of the checks the verifier performs, and there are exceptions to these rules. For example, tracing programs have access to helpers that allow them to read memory in a controlled way, but these program types require root privileges and thus do not pose a security risk.
Over time the eBPF verifier has evolved to include newer features and optimizations, such as support for bounded loops, dead-code elimination, function-by-function verification, and callbacks.
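The path exploration described above can be illustrated with a deliberately small verifier. This sketch assumes a hypothetical six-opcode instruction set (not real eBPF encodings) and checks only a few rules: it walks every path depth-first, tracks which registers have been initialized, and rejects reads of uninitialized registers (including exiting with R0 unset), writes to the frame pointer R10, jumps or fall-through outside the program, and any backward jump, which is how verifiers treated loops before bounded-loop support. Unlike the kernel verifier, it performs no state pruning, so it re-explores shared path suffixes; the instruction budget is a hypothetical stand-in for the kernel's complexity limit.

```c
#include <assert.h>

/* Hypothetical mini instruction set (not real eBPF encodings). */
enum op { MOV_IMM, MOV_REG, ADD_REG, JEQ_IMM, JA, EXIT };

/* JEQ_IMM: if dst == imm, jump to pc + 1 + off; JA: always jump there. */
struct insn { enum op op; int dst, src, imm, off; };

enum verdict { VERIFY_OK = 0, ERR_UNINIT_READ, ERR_BACK_EDGE, ERR_OUT_OF_RANGE,
               ERR_BAD_INSN, ERR_R10_WRITE, ERR_TOO_COMPLEX };

#define BIT(r) (1u << (r))
#define INSN_BUDGET 1000 /* hypothetical cap on simulated instructions */

/* Simulate one path from pc, where `init` is the set of initialized registers.
 * At a conditional jump, the taken branch is explored first (depth-first),
 * then the fall-through branch continues in this loop. */
static enum verdict explore(const struct insn *p, int len, int pc,
                            unsigned init, int *budget)
{
    for (;;) {
        if (pc < 0 || pc >= len)
            return ERR_OUT_OF_RANGE; /* jump outside the program or no exit */
        if (--*budget < 0)
            return ERR_TOO_COMPLEX;
        const struct insn *i = &p[pc];
        if (i->dst < 0 || i->dst > 10 || i->src < 0 || i->src > 10)
            return ERR_BAD_INSN;
        switch (i->op) {
        case MOV_REG:
        case ADD_REG:
            if (!(init & BIT(i->src)) ||
                (i->op == ADD_REG && !(init & BIT(i->dst))))
                return ERR_UNINIT_READ;
            /* fall through */
        case MOV_IMM:
            if (i->dst == 10)
                return ERR_R10_WRITE; /* the frame pointer is read-only */
            init |= BIT(i->dst);
            pc++;
            break;
        case JEQ_IMM: {
            if (!(init & BIT(i->dst)))
                return ERR_UNINIT_READ;
            if (i->off < 0)
                return ERR_BACK_EDGE; /* a loop: rejected by this sketch */
            enum verdict v = explore(p, len, pc + 1 + i->off, init, budget);
            if (v != VERIFY_OK)
                return v;
            pc++; /* now follow the not-taken branch */
            break;
        }
        case JA:
            if (i->off < 0)
                return ERR_BACK_EDGE;
            pc += 1 + i->off;
            break;
        case EXIT:
            /* r0 holds the return value, so it must be initialized. */
            return (init & BIT(0)) ? VERIFY_OK : ERR_UNINIT_READ;
        default:
            return ERR_BAD_INSN;
        }
    }
}

/* At entry only r1 (the context pointer) and r10 (the frame pointer) are set. */
enum verdict verify(const struct insn *prog, int len)
{
    int budget = INSN_BUDGET;
    return explore(prog, len, 0, BIT(1) | BIT(10), &budget);
}
```

For instance, the program "if r1 == 5 goto +1; r0 = 1; exit" is rejected because R0 is never written on the taken path, whereas prefixing it with "r0 = 0" makes every path safe.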
= eBPF CO-RE (Compile Once - Run Everywhere)
eBPF programs use the memory and data structures of the kernel. Since the Linux kernel is continuously developed, there is no guarantee that its internal data structures remain the same across versions; a structure can be modified between kernel versions, altering its memory layout. CO-RE is a fundamental concept in modern eBPF development that allows eBPF programs to be portable across different kernel versions and configurations, addressing the challenge of kernel structure variations between Linux distributions and versions.
CO-RE comprises several components. BTF (BPF Type Format) is a metadata format that describes the types used in the kernel and in eBPF programs, providing detailed information about struct layouts, field offsets, and data types. It makes kernel type information accessible at runtime, which is crucial for BPF program development and verification, and it is included in the kernel image of BTF-enabled kernels. The compiler (e.g., LLVM) emits special relocations that capture high-level descriptions of the information the eBPF program intends to access. The libbpf library uses these CO-RE relocations, generated by Clang as part of the compilation process, to adapt the eBPF program to the data structure layout of the target kernel where it runs, even if that layout differs from the kernel the code was compiled against.
The compiled eBPF program is stored in an ELF (Executable and Linkable Format) object file that contains the BTF type information and the Clang-generated relocations. The ELF format allows the eBPF loader (e.g., libbpf) to process and adjust the BPF program dynamically for the target kernel.
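The core idea of a CO-RE field relocation can be shown with a much-simplified C model. Here struct type_info stands in for BTF, struct field_reloc for a compiler-emitted relocation naming a struct field, and resolve_field_offset() for the loader step that looks the field up by name in the target kernel's types and yields the offset to patch into the program. All of these names, and the struct layouts used to exercise them, are hypothetical. Real programs typically rely on Clang's __builtin_preserve_access_index or libbpf's BPF_CORE_READ() macro, and libbpf performs additional compatibility checks that this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for BTF: field names and byte offsets of one struct type. */
struct field_info { const char *name; size_t offset; };
struct type_info { const char *name; const struct field_info *fields; size_t nfields; };

/* Stand-in for a compiler-emitted relocation: which field of which struct the
 * program accesses, plus the offset assumed at compile time. */
struct field_reloc { const char *type; const char *field; size_t compiled_offset; };

/* Loader step: find the field by name in the target kernel's types and return
 * the offset to patch into the program, or -1 if it does not exist there. */
long resolve_field_offset(const struct type_info *target, size_t ntypes,
                          const struct field_reloc *r)
{
    for (size_t t = 0; t < ntypes; t++) {
        if (strcmp(target[t].name, r->type) != 0)
            continue;
        for (size_t f = 0; f < target[t].nfields; f++)
            if (strcmp(target[t].fields[f].name, r->field) == 0)
                return (long)target[t].fields[f].offset;
    }
    return -1;
}
```

For instance, if a field sat at offset 8 in the headers the program was compiled against but at offset 16 on the target kernel, resolution by name returns 16 and the access is patched accordingly; if the field does not exist on the target, resolution fails, which this sketch reports as -1.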
Branding
The name eBPF is often used interchangeably with BPF, for example by the Linux kernel community. eBPF and BPF are referred to as a technology name, like LLVM. eBPF evolved from the machine language of the filtering virtual machine in the Berkeley Packet Filter as an extended version, but as its use cases outgrew networking, "eBPF" is today preferentially interpreted as a pseudo-acronym.
The bee is the official logo for eBPF. At the first eBPF Summit, a vote was taken and the bee mascot was named "eBee". The logo was originally created by Vadim Shchekoldin. Earlier unofficial eBPF mascots existed but did not see widespread adoption.
Governance
The eBPF Foundation was created in August 2021 with the goal of expanding contributions to eBPF, extending its capabilities and growing it beyond Linux. Founding members include Meta, Google, Isovalent, Microsoft and Netflix. The purpose is to raise, budget and spend funds in support of various open source, open data and/or open standards projects relating to eBPF technologies to further drive the growth and adoption of the eBPF ecosystem. Since its inception, Red Hat, Huawei, CrowdStrike, Tigera, DaoCloud, Datoms and FutureWei have also joined.
Adoption
eBPF has been adopted by a number of large-scale production users, for example:
Meta uses eBPF through their Katran layer 4 load-balancer for all traffic going to facebook.com
Google uses eBPF in GKE and for networking, and developed and uses BPF LSM to replace audit
Cloudflare uses eBPF for load-balancing, DDoS protection and security enforcement
Netflix uses eBPF for fleet-wide network observability and performance diagnosis
Dropbox uses eBPF through Katran for layer 4 load-balancing
Android uses eBPF for NAT46 and traffic monitoring
Samsung Galaxy uses eBPF for networking solutions
Yahoo! Inc uses eBPF through Cilium for layer 4 load balancing
LinkedIn uses eBPF for infrastructure observability
Alibaba uses eBPF for Kubernetes Pod load-balancing
Datadog uses eBPF for Kubernetes Pod networking and security enforcement
Trip.com uses eBPF for Kubernetes Pod networking
Shopify uses eBPF for intrusion detection through Falco
DoorDash uses eBPF through BPFAgent for kernel level monitoring
Microsoft ported eBPF and XDP to Windows
Seznam uses eBPF through Cilium for layer 4 load-balancing
DigitalOcean uses eBPF and XDP to rate limit access to internal services in their virtual network
Capital One uses eBPF for Kubernetes Pod networking
Bell Canada uses eBPF to modernize telco networking with SRv6
Elastic NV uses eBPF for code profiling as part of their observability offering
Apple uses eBPF for Kubernetes Pod security
Sky uses eBPF for Kubernetes Pod networking
Walmart uses eBPF for layer 4 load-balancing
Huawei uses eBPF through their DIGLIM secure boot system
Ikea uses eBPF for Kubernetes Pod networking
The New York Times uses eBPF for networking
Red Hat uses eBPF at scale for load balancing and tracing in their private cloud
Palantir Technologies uses eBPF to debug networking problems in large scale Kubernetes clusters
Security
Due to its ease of programmability, eBPF has been used as a tool for implementing microarchitectural timing side-channel attacks such as Spectre against vulnerable microprocessors. Although mitigations against transient execution attacks were implemented for unprivileged eBPF, the kernel community ultimately disabled unprivileged use by default to protect against future hardware vulnerabilities.
See also
Express Data Path
Further reading
Gregg, Brendan (December 2019). BPF Performance Tools. Addison-Wesley. ISBN 978-0136554820.
Calavera, David; Fontana, Lorenzo (December 2019). Linux Observability With BPF. O'Reilly Media, Incorporated. ISBN 978-1492050209.
Gregg, Brendan (December 2020). Systems Performance, Second edition. ISBN 978-0136820154.
Rice, Liz (April 2022). What Is eBPF?. ISBN 978-1492097259.
Rice, Liz (April 2023). Learning eBPF: Programming the Linux Kernel for Enhanced Observability, Networking, and Security. O'Reilly Media. ISBN 978-1098135126.
Thaler, Dave, ed. (October 2024). BPF Instruction Set Architecture (ISA). IETF. doi:10.17487/RFC9669. RFC 9669.
External links
eBPF.io - Introduction, tutorials & eBPF community resources
eBPF.foundation - Linux Foundation's eBPF Foundation site
eBPF Developer Tutorial: Learning eBPF Step by Step with Examples
eBPF documentary - Documentary on the beginnings of eBPF