"Is there any point to which you would wish to draw my attention?"
"To the curious incident of the libbpf in the night-time."
"The libbpf returned-EINVAL
in the night-time."
"That was the curious incident," remarked Sherlock Holmes.
Recently at work, we found ourselves trying to diagnose a persistent bug that was only occurring on CentOS 7. Eventually figuring things out after much headbashing, I realized the debugging techniques used to root it out, as well as the bug itself, would be useful information to spread around.
The Problem
Specifically, when trying to run the Cmd agent, we were receiving the following error message:
libbpf: Error in bpf_object__probe_loading():Invalid argument(22). Couldn't
load trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value.
Quick bit of background here: the Cmd agent loads a number of BPF programs on startup. These programs are run in kernelspace when certain events of interest take place (e.g. process creation, terminal write, file open, etc.) and allow us to safely and efficiently monitor security related events on a Linux system.
On Linux, all BPF related operations are performed through the bpf(2)
syscall. That syscall, however, is a complex beast (man 2
bpf
if you need proof).
Because of this, many BPF-enabled applications use
libbpf
instead, which abstracts away the granularity of the bpf(2)
syscall and
provides a much higher level and more user friendly interface to Linux's BPF
functionality.
And it is here in libbpf that the error in question is logged. Quick checks
based off the suggestions in the error confirmed that the kernel being used
supported BPF and that RLIMIT_MEMLOCK
was set to unlimited. A deeper
investigation was required.
Digging Into the Code
Taking a look at the libbpf source, the code producing the error message was easy to locate:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
|
Understanding this code line-by-line isn't completely important. You just need to know that this code implements a basic sanity check that runs before libbpf loads our actual BPF program. It attempts to load the following simple BPF program (defined on lines 7-8 and loaded on line 20) to verify basic BPF functionality is working:
r0 = 0
exit
As you can probably gauge, for whatever reason, the call to
bpf_load_program_xattr
on line 20 was returning -EINVAL
(invalid argument),
causing the previously mentioned error to be logged and program loading to
fail. As mentioned, this was only occurring on CentOS 7.
A quick strace
confirmed that the -EINVAL
we were seeing in the relatively
high-level code in libbpf was passed up after being returned from the bpf(2)
syscall invocation used to load the test program:
bpf(BPF_PROG_LOAD,
{
prog_type=BPF_PROG_TYPE_SOCKET_FILTER,
insn_cnt=2,
insns=0x7ffe41739de0,
license="GPL",
log_level=0,
log_size=0,
log_buf=NULL,
kern_version=KERNEL_VERSION(0, 0, 0),
prog_flags=0,
prog_name="",
prog_ifindex=0,
expected_attach_type=BPF_CGROUP_INET_INGRESS
},
120) = -1 EINVAL (Invalid argument)
man 2 bpf
lists a few cases in which bpf(2)
can return -EINVAL
for a
BPF_PROG_LOAD
operation, all of which were confirmed to not be applicable in
this case. A dive into the kernel sources would be required.
Diving in With Kprobes
CentOS 7 uses a 3.10 kernel with BPF support backported from 4.20. With this in mind, attention turned to the kernel function in 4.20 used to load a BPF program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
The first order of business was ensuring execution in kernelspace was even
getting to this function from the syscall entry point.
Kprobes have a
really nifty sysfs
interface
allowing you to see if almost any kernel function is being called with
just a few commands. Using this functionality, we tried to confirm that
execution in kernelspace was at least reaching bpf_prog_load
by creating a
new kprobe (see link above for detailed syntax documentation):
# echo 'p:mybpf_prog_load bpf_prog_load attr=%ax' > \
/sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/mybpf_prog_load/enable
With the kprobe created and enabled, we can now tail
/sys/kernel/debug/tracing/trace_pipe
and reproduce the bug to see if
bpf_prog_load
is being hit:
# cat /sys/kernel/debug/tracing/trace_pipe
cmd-10975 [000] d... 540.050401: mybpf_prog_load: (bpf_prog_load+0x0/0x570) attr=0xffffffffb439cbdc
cmd-10975 [000] d... 540.050435: mybpf_prog_load: (bpf_prog_load+0x0/0x570) attr=0xffffffffb439cbdc
So we're getting to bpf_prog_load
at the very least (one line is logged each
time the kprobe is hit). It's now a question of how far we progress within
bpf_prog_load
. Let's put a kprobe on that strncpy_from_user
call on line 16
and see if it's being hit:
# echo 'r:mystrncpy_from_user strncpy_from_user $retval' > \
/sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/mystrncpy_from_user/enable
Since strncpy_from_user
is used all over the place in the kernel, this time
we're using a
kretprobe
instead (note the r:
prefix) so we can use grep
to filter for that specific
invocation of strncpy_from_user
like so:
# cat /sys/kernel/debug/tracing/trace_pipe | grep bpf_prog_load
[... no output]
Interesting, it appears execution isn't getting that far. Let's double check by
doing the same thing and running the execsnoop
utility from
BCC (which was working as expected, thus it
should execute bpf_prog_load
in full):
# cat /sys/kernel/debug/tracing/trace_pipe | grep bpf_prog_load
execsnoop-11029 [000] d... 1414.411589: mystrncpy_from_user: (bpf_prog_load+0xc3/0x570 <- strncpy_from_user) arg1=0x3
execsnoop-11029 [000] d... 1414.415006: mystrncpy_from_user: (bpf_prog_load+0xc3/0x570 <- strncpy_from_user) arg1=0x3
Looks like we didn't make a mistake the first time; execution isn't getting to line 16 when the test program is loaded by libbpf.
Note that (as mentioned above) since the name of the function being returned to
is logged, piping the output through grep
allows us to filter for the one
specific strncpy_from_user
invocation in bpf_prog_load
.
Based on what we've figured out so far, it would appear the mystery -EINVAL
is coming from either line 10 or 13. This kicked off a rather tedious manual
inspection of the data passed to the bpf(2)
syscall, which concluded that it
was impossible for the expressions in either of those two conditionals to
evaluate to true.
So what gives?
The CentOS 7 Kernel
This whole time, we hadn't considered the fact that we were referencing
mainline kernel source while CentOS 7 does not use a mainline kernel. Due to
stability concerns, CentOS 7 uses an older 3.10 kernel and backports a ton of
features from newer kernels to plug the functionality gap created. The fact
that there was no logical place for this -EINVAL
to be coming from in a
mainline 4.20 kernel strongly suggested that the folks at Red Hat had added
their own custom logic to the function.
Getting the source for a CentOS kernel is a bit of an
ordeal but possible.
Extracting the sources, we learned that this is what bpf_prog_load
looks like
for CentOS 7 (note the new conditional on line 10):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
Consider line 15 from the libbpf listing that was logging the original error of concern:
attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
So libbpf is trying to load the test program as a socket filter program, but the RHEL 7 / CentOS 7 kernel forbids all but tracing programs. Googling confirms that this is the case:
eBPF in Red Hat Enterprise Linux 7.6 is enabled only for tracing purposes, which allows attaching eBPF programs to probes, tracepoints and perf events. Other use cases such as eBPF socket filters or eXpress DataPath (XDP) are not enabled at this stage.
Using BPF_PROG_TYPE_SOCKET_FILTER
in libbpf to load the test program is
actually a reasonable choice by the libbpf developers. Alongside
BPF_PROG_TYPE_CGROUP_SKB
, that program type is, at the time of writing, the
only program type that can be run without CAP_SYS_ADMIN
. This means that if
you happen to want to use libbpf as a underprivileged user, the test program
must be of type BPF_PROG_TYPE_SOCKET_FILTER
/BPF_PROG_TYPE_CGROUP_SKB
for
the check to work.
Conclusion
This whole ordeal really demonstrated the value of kprobes for peeking inside a running kernel. Use-cases such as this for kprobes are somewhat rare; usually, the man pages will give you a good description of what a syscall should return given a set of arguments. In exceptional cases like this, however, where syscall or other kernel behavior deviates from the spec, being able to quickly and easily tell what's being called in the kernel with a few commands is a very powerful ability to have.
We worked around this issue by applying a custom patch to libbpf as part of our
build process that changed the type of the test program to
BPF_PROG_TYPE_TRACEPOINT
. Since our agent runs with the necessary
capabilities at program load time anyways, it can load the test program
regardless of its type, allowing us to fairly easily work around this issue.