问题
After reading man bpf
and a few other sources of documentation, I was under impression that a map
can be only created by user process. However the following small program seems to magically create bpf
map:
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(long),
.max_entries = 10,
};
SEC("sockops")
int my_prog(struct bpf_sock_ops *skops)
{
u32 key = 1;
long *value;
...
value = bpf_map_lookup_elem(&my_map, &key);
...
return 1;
}
So I load the program with the kernel's tools/bpf/bpftool
and also verify that program is loaded:
$ bpftool prog show
1: sock_ops name my_prog tag f3a3583cdd82ae8d
loaded_at Jan 02/18:46 uid 0
xlated 728B not jited memlock 4096B
$ bpftool map show
1: array name my_map flags 0x0
key 4B value 8B max_entries 10 memlock 4096B
Of course the map is empty. However, removing bpf_map_lookup_elem
from the program results in no map being created.
UPDATE
I debugged it with strace
and found that in both cases, i.e. with bpf_map_lookup_elem
and without it, bpftool does invoke bpf(BPF_MAP_CREATE, ...)
and it apparently succeeds. Then, in case of bpf_map_lookup_elem left out, I strace on bpftool map show
, and bpf(BPF_MAP_GET_NEXT_ID, ..)
immediately returns ENOENT
, and it never gets to dump a map. So obviously something is not completing the map creation.
So I wonder if this is expected behavior?
Thanks.
回答1:
As explained by antiduh, and confirmed with your strace
checks, bpftool
is the user space program creating the maps in this case. It calls function bpf_prog_load()
from libbpf (under tools/lib/bpf/
), which in turn ends up performing the syscall. Then the program is pinned at the desired location (under a bpf
virtual file system mount point), so that it is not unloaded when bpftool returns. Maps are not pinned.
Regarding map creation, the magic bits also take place in libbpf. When bpf_prog_load()
is called, libbpf receives the name of the object file as an argument. bpftool
does not ask to load this specific program or that specific map; instead, it provides the object file and libbpf has to deal with it. So the functions in libbpf parse this ELF object file, and eventually find a number of sections corresponding to maps and programs. Then it tries to load the first program.
Loading this program includes the following steps:
CHECK_ERR(bpf_object__create_maps(obj), err, out);
CHECK_ERR(bpf_object__relocate(obj), err, out);
CHECK_ERR(bpf_object__load_progs(obj), err, out);
In other words: start by creating all maps we found in the object file. Then perform map relocation (i.e. associate map index to eBPF instructions), and at last load program instructions.
So regarding your question: in both cases, with and without bpf_map_lookup_elem()
, maps are created with a bpf(BPF_MAP_CREATE, ...)
syscall. After that, relocation happens, and program instructions are adapted to point, if needed, to the newly created maps. Then once all steps are finished and the program is loaded, bpftool
exits. The eBPF program should be pinned, and still loaded in the kernel. As far as I understand, if it does use the maps (if bpf_map_lookup_elem()
was used), then maps are still referenced by a loaded program, and are kept in the kernel. On the other hand, if the program does not use the maps, then there is nothing more to hold them back, so the maps are destroyed when the file descriptors held by bpftool
are closed, when bpftool
returns.
So in the end, when bpftool
has completed, you have a map loaded in the kernel if the program uses it, but no map if no program would rely on it. Sounds like expected behaviour in my opinion; but please do ping one way or another if you experience strange things with bpftool
, I'm one of the guys working on the utility. One last generic observation: maps can also be pinned and remain in the kernel even if no program uses them, should one need to keep them around.
回答2:
I was under impression that a map can be only created by user process.
You're completely right - user programs are the ones that invoke the bpf
system call in order to load eBPF programs and create eBPF maps.
And you did just that:
So I load the program with tools/bpf/bpftool and ...
Your bpftool program is the user process that is invoking the bpf
syscall, and thus is the user process that is creating the eBPF map.
BPF programs don't have to be unloaded when the user program that created it quits - bpftool likely uses this mechanism.
Some relevant bits from the man page to connect the dots:
A user process can create multiple maps ... and access them via file descriptors.
Generally, eBPF programs are loaded by the user process and automatically unloaded when the process exits. In some cases ... the program will continue to stay alive inside the kernel even after the process that loaded the program exits.
Each eBPF program is a set of instructions that is safe to run until its completion. ... During verification, the kernel increments reference counts for each of the maps that the eBPF program uses, so that the attached maps can't be removed until the program is unloaded.
来源:https://stackoverflow.com/questions/48067163/who-creates-map-in-bpf