The Curious Incident of the -EINVAL in the libbpf

"Is there any point to which you would wish to draw my attention?"
"To the curious incident of the libbpf in the night-time."
"The libbpf returned -EINVAL in the night-time."
"That was the curious incident," remarked Sherlock Holmes.

Recently at work, we found ourselves trying to diagnose a persistent bug that was only occurring on CentOS 7. Eventually figuring things out after much headbashing, I realized the debugging techniques used to root it out, as well as the bug itself, would be useful information to spread around.

The Problem

Specifically, when trying to run the Cmd agent, we were receiving the following error message:

libbpf: Error in bpf_object__probe_loading():Invalid argument(22). Couldn't
load trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value.

Quick bit of background here: the Cmd agent loads a number of BPF programs on startup. These programs are run in kernelspace when certain events of interest take place (e.g. process creation, terminal write, file open, etc.) and allow us to safely and efficiently monitor security related events on a Linux system.

On Linux, all BPF related operations are performed through the bpf(2) syscall. That syscall, however, is a complex beast (man 2 bpf if you need proof). Because of this, many BPF-enabled applications use libbpf instead, which abstracts away the granularity of the bpf(2) syscall and provides a much higher level and more user friendly interface to Linux's BPF functionality.

And it is here in libbpf that the error in question is logged. Quick checks based off the suggestions in the error confirmed that the kernel being used supported BPF and that RLIMIT_MEMLOCK was set to unlimited. A deeper investigation was required.

Digging Into the Code

Taking a look at the libbpf source, the code producing the error message was easy to locate:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
static int
bpf_object__probe_loading(struct bpf_object *obj)
{
        struct bpf_load_program_attr attr;
        char *cp, errmsg[STRERR_BUFSIZE];
        struct bpf_insn insns[] = {
                BPF_MOV64_IMM(BPF_REG_0, 0),
                BPF_EXIT_INSN(),
        };
        int ret;

        /* make sure basic loading works */

        memset(&attr, 0, sizeof(attr));
        attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
        attr.insns = insns;
        attr.insns_cnt = ARRAY_SIZE(insns);
        attr.license = "GPL";

        ret = bpf_load_program_xattr(&attr, NULL, 0);
        if (ret < 0) {
                ret = errno;
                cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
                pr_warn("Error in %s():%s(%d). Couldn't load trivial BPF "
                        "program. Make sure your kernel supports BPF "
                        "(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is "
                        "set to big enough value.\n", __func__, cp, ret);
                return -ret;
        }
        close(ret);

        return 0;
}

Understanding this code line-by-line isn't completely important. You just need to know that this code implements a basic sanity check that runs before libbpf loads our actual BPF program. It attempts to load the following simple BPF program (defined on lines 7-8 and loaded on line 20) to verify basic BPF functionality is working:

r0 = 0
exit

As you can probably gauge, for whatever reason, the call to bpf_load_program_xattr on line 20 was returning -EINVAL (invalid argument), causing the previously mentioned error to be logged and program loading to fail. As mentioned, this was only occurring on CentOS 7.

A quick strace confirmed that the -EINVAL we were seeing in the relatively high-level code in libbpf was passed up after being returned from the bpf(2) syscall invocation used to load the test program:

bpf(BPF_PROG_LOAD,
    {
        prog_type=BPF_PROG_TYPE_SOCKET_FILTER,
        insn_cnt=2,
        insns=0x7ffe41739de0,
        license="GPL",
        log_level=0,
        log_size=0,
        log_buf=NULL,
        kern_version=KERNEL_VERSION(0, 0, 0),
        prog_flags=0,
        prog_name="",
        prog_ifindex=0,
        expected_attach_type=BPF_CGROUP_INET_INGRESS
    },
    120) = -1 EINVAL (Invalid argument)

man 2 bpf lists a few cases in which bpf(2) can return -EINVAL for a BPF_PROG_LOAD operation, all of which were confirmed to not be applicable in this case. A dive into the kernel sources would be required.

Diving in With Kprobes

CentOS 7 uses a 3.10 kernel with BPF support backported from 4.20. With this in mind, attention turned to the kernel function in 4.20 used to load a BPF program:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
static int bpf_prog_load(union bpf_attr *attr)
{
        enum bpf_prog_type type = attr->prog_type;
        struct bpf_prog *prog;
        int err;
        char license[128];
        bool is_gpl;

        if (CHECK_ATTR(BPF_PROG_LOAD))
                return -EINVAL;

        if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT)
                return -EINVAL;

        /* copy eBPF program license from user space */
        if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
                              sizeof(license) - 1) < 0)
                return -EFAULT;
        license[sizeof(license) - 1] = 0;

        /* ... truncated for brevity */

The first order of business was ensuring execution in kernelspace was even getting to this function from the syscall entry point. Kprobes have a really nifty sysfs interface allowing you to see if almost any kernel function is being called with just a few commands. Using this functionality, we tried to confirm that execution in kernelspace was at least reaching bpf_prog_load by creating a new kprobe (see link above for detailed syntax documentation):

# echo 'p:mybpf_prog_load bpf_prog_load attr=%ax' > \
      /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/mybpf_prog_load/enable

With the kprobe created and enabled, we can now tail /sys/kernel/debug/tracing/trace_pipe and reproduce the bug to see if bpf_prog_load is being hit:

# cat /sys/kernel/debug/tracing/trace_pipe
cmd-10975 [000] d...    540.050401: mybpf_prog_load: (bpf_prog_load+0x0/0x570) attr=0xffffffffb439cbdc
cmd-10975 [000] d...    540.050435: mybpf_prog_load: (bpf_prog_load+0x0/0x570) attr=0xffffffffb439cbdc

So we're getting to bpf_prog_load at the very least (one line is logged each time the kprobe is hit). It's now a question of how far we progress within bpf_prog_load. Let's put a kprobe on that strncpy_from_user call on line 16 and see if it's being hit:

# echo 'r:mystrncpy_from_user strncpy_from_user $retval' > \
      /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/mystrncpy_from_user/enable

Since strncpy_from_user is used all over the place in the kernel, this time we're using a kretprobe instead (note the r: prefix) so we can use grep to filter for that specific invocation of strncpy_from_user like so:

# cat /sys/kernel/debug/tracing/trace_pipe | grep bpf_prog_load
[... no output]

Interesting, it appears execution isn't getting that far. Let's double check by doing the same thing and running the execsnoop utility from BCC (which was working as expected, thus it should execute bpf_prog_load in full):

# cat /sys/kernel/debug/tracing/trace_pipe | grep bpf_prog_load
execsnoop-11029 [000] d...    1414.411589: mystrncpy_from_user: (bpf_prog_load+0xc3/0x570 <- strncpy_from_user) arg1=0x3
execsnoop-11029 [000] d...    1414.415006: mystrncpy_from_user: (bpf_prog_load+0xc3/0x570 <- strncpy_from_user) arg1=0x3

Looks like we didn't make a mistake the first time; execution isn't getting to line 16 when the test program is loaded by libbpf.

Note that (as mentioned above) since the name of the function being returned to is logged, piping the output through grep allows us to filter for the one specific strncpy_from_user invocation in bpf_prog_load.

Based on what we've figured out so far, it would appear the mystery -EINVAL is coming from either line 10 or 13. This kicked off a rather tedious manual inspection of the data passed to the bpf(2) syscall, which concluded that it was impossible for the expressions in either of those two conditionals to evaluate to true.

So what gives?

The CentOS 7 Kernel

This whole time, we hadn't considered the fact that we were referencing mainline kernel source while CentOS 7 does not use a mainline kernel. Due to stability concerns, CentOS 7 uses an older 3.10 kernel and backports a ton of features from newer kernels to plug the functionality gap created. The fact that there was no logical place for this -EINVAL to be coming from in a mainline 4.20 kernel strongly suggested that the folks at Red Hat had added their own custom logic to the function.

Getting the source for a CentOS kernel is a bit of an ordeal but possible. Extracting the sources, we learned that this is what bpf_prog_load looks like for CentOS 7 (note the new conditional on line 10):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
static int bpf_prog_load(union bpf_attr *attr)
{
        enum bpf_prog_type type = attr->prog_type;
        struct bpf_prog *prog;
        int err;
        char license[128];
        bool is_gpl;

        /* RHEL7 - allow only following types */
        if (type != BPF_PROG_TYPE_KPROBE &&
            type != BPF_PROG_TYPE_TRACEPOINT &&
            type != BPF_PROG_TYPE_PERF_EVENT &&
            !is_test_type(type))
                return -EINVAL;

        if (CHECK_ATTR(BPF_PROG_LOAD))
                return -EINVAL;

        if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT)
                return -EINVAL;

        /* copy eBPF program license from user space */
        if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
                              sizeof(license) - 1) < 0)
                return -EFAULT;
        license[sizeof(license) - 1] = 0;

Consider line 15 from the libbpf listing that was logging the original error of concern:

attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;

So libbpf is trying to load the test program as a socket filter program, but the RHEL 7 / CentOS 7 kernel forbids all but tracing programs. Googling confirms that this is the case:

eBPF in Red Hat Enterprise Linux 7.6 is enabled only for tracing purposes, which allows attaching eBPF programs to probes, tracepoints and perf events. Other use cases such as eBPF socket filters or eXpress DataPath (XDP) are not enabled at this stage.

Using BPF_PROG_TYPE_SOCKET_FILTER in libbpf to load the test program is actually a reasonable choice by the libbpf developers. Alongside BPF_PROG_TYPE_CGROUP_SKB, that program type is, at the time of writing, the only program type that can be run without CAP_SYS_ADMIN. This means that if you happen to want to use libbpf as a underprivileged user, the test program must be of type BPF_PROG_TYPE_SOCKET_FILTER/BPF_PROG_TYPE_CGROUP_SKB for the check to work.

Conclusion

This whole ordeal really demonstrated the value of kprobes for peeking inside a running kernel. Use-cases such as this for kprobes are somewhat rare; usually, the man pages will give you a good description of what a syscall should return given a set of arguments. In exceptional cases like this, however, where syscall or other kernel behavior deviates from the spec, being able to quickly and easily tell what's being called in the kernel with a few commands is a very powerful ability to have.

We worked around this issue by applying a custom patch to libbpf as part of our build process that changed the type of the test program to BPF_PROG_TYPE_TRACEPOINT. Since our agent runs with the necessary capabilities at program load time anyways, it can load the test program regardless of its type, allowing us to fairly easily work around this issue.

© Rhys Rustad-Elliott. Built using Pelican. Best viewed with NCSA Mosaic on a 250MHz Pentium 2 machine running Windows 3.1 or later.