As C/C++/Rust/OCaml/(insert other language usually compiled to native code here) programmers, when we write code targeting Linux, our toolchains usually produce a dynamically linked ELF binary as their final output. While this covers almost all use cases for code meant to run in userspace, there are some circumstances in which it is necessary to write code that runs without the dynamic linker present.
In this article, we'll try to write a standard hello world program in C for an x86-64 Linux system, but with the catch that our compiled ELF binary can't make use of the runtime dynamic linker. As will be seen, this entails a range of interesting challenges not normally encountered when writing userspace code.
Background
There are two related but distinct programs commonly referred to as "the linker" in the context of ELF binaries, one of which operates at compile time and one of which operates at runtime.
The first of these two programs is ld, provided by GNU binutils, which is responsible for resolving cross-file and dynamic library references in object files at compile time and "linking" them together with these references resolved to produce a single executable binary or shared object. In most circumstances, ld is invoked under the hood when running gcc as the last step of building a binary.
The second entity is the ld.so shared object (provided by the C library, glibc on most Linux systems, rather than by binutils). It is mapped into a process's address space by the kernel during a call to exec(2). It receives initial control from the kernel when exec(2) finishes, identifies the libraries needed by the program, maps them into the address space, and then hands off control to the program. Later, it is called into from time to time when certain symbolic references need to be resolved.
To avoid ambiguity, I will from this point forward refer to ld as the static linker and ld.so as the dynamic linker.
A full discussion of the dynamic linker is well beyond the scope of this article. It suffices to know that the dynamic linker is what receives initial control from the Linux kernel, what loads the shared libraries required by the program, and what gets called into when certain symbolic references (e.g. a call to printf in libc) need to be resolved at runtime.
So where is the dynamic linker specified in an ELF binary that requires it? Dynamically linked ELF binaries have a program header (INTERP) specifying the path to the dynamic linker binary to be used. This can be seen with the readelf utility (a part of GNU binutils), here showing /lib64/ld-linux-x86-64.so.2 as the dynamic linker for /bin/ls on my system.
$ readelf -l /bin/ls
[...]
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000001f8 0x00000000000001f8 R E 0x8
INTERP 0x0000000000000238 0x0000000000000238 0x0000000000000238
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
[...]
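If you'd rather check for the interpreter programmatically than eyeball readelf output, the lookup is simple enough to sketch in C. What follows is a minimal sketch, assuming a 64-bit ELF file and the structure definitions from <elf.h>. The name find_interp is made up for this example and is not part of binutils or libc.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a libc or binutils function): copies the
 * PT_INTERP path of a 64-bit ELF file into out.
 * Returns 1 if found, 0 if the file has no PT_INTERP, -1 on error. */
int find_interp(const char *path, char *out, size_t outlen) {
    if (outlen == 0) return -1;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    Elf64_Ehdr eh;
    int result = -1;
    /* Read the ELF header and make sure this really is a 64-bit ELF file. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto done;

    result = 0;
    /* Walk the program header table looking for PT_INTERP. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type != PT_INTERP) continue;

        /* The segment's file contents are the NUL-terminated path. */
        size_t n = ph.p_filesz < outlen - 1 ? (size_t)ph.p_filesz : outlen - 1;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(out, 1, n, f) != n) {
            result = -1;
            break;
        }
        out[n] = '\0';
        result = 1;
        break;
    }
done:
    fclose(f);
    return result;
}
```

Called on /bin/ls, this should report the same path readelf printed above. A return value of 0 is what we are aiming for with our own binary.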
Creating a Binary That Doesn't Use the Dynamic Linker
Given the information above, our first task at hand is to compile an ELF binary with no INTERP program header. This will result in no dynamic linker being mapped into our process's address space by the kernel on exec(2), thus ensuring that our program cannot use the dynamic linker at runtime.
Forcing the exclusion of the INTERP program header is very easy with GNU binutils version 2.26 or later, in which the static linker supports a --no-dynamic-linker argument. In versions of binutils before 2.26, excluding the dynamic linker seems to be possible only by providing the static linker with a custom linker script. Since I'm using binutils 2.28, I'll just use the --no-dynamic-linker option.
Now that we know how to compile a binary that doesn't use the dynamic linker, let's go ahead and try to run one compiled in this way. I'll use the following barebones C program as an example:
int main() {
    return 0;
}
We can go ahead and compile this with gcc, ensuring that it passes --no-dynamic-linker to the static linker as follows:
$ gcc -Wl,--no-dynamic-linker prog.c -o prog
Let's verify that no INTERP program header was generated using readelf:
$ readelf -l prog
Elf file type is DYN (Shared object file)
Entry point 0x271
There are 7 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000350 0x0000000000000350 R E 0x200000
LOAD 0x0000000000000f20 0x0000000000200f20 0x0000000000200f20
0x00000000000000e0 0x00000000000000e0 RW 0x200000
DYNAMIC 0x0000000000000f20 0x0000000000200f20 0x0000000000200f20
0x00000000000000e0 0x00000000000000e0 RW 0x8
NOTE 0x00000000000001c8 0x00000000000001c8 0x00000000000001c8
0x0000000000000024 0x0000000000000024 R 0x4
GNU_EH_FRAME 0x00000000000002b4 0x00000000000002b4 0x00000000000002b4
0x0000000000000024 0x0000000000000024 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000000f20 0x0000000000200f20 0x0000000000200f20
0x00000000000000e0 0x00000000000000e0 R 0x1
[...]
Looks good, let's try to run it:
$ ./prog
Segmentation Fault
Seems like we've got a bit more work to do.
Getting our Non-Dynamically-Linked Binary to Run
So we've discovered that forcing the program to run without the dynamic linker results in a segfault. Let's recompile with debugging symbols and take a closer look at the path of execution with GDB:
$ gcc -g -Wl,--no-dynamic-linker prog.c -o prog
$ gdb -q prog
Reading symbols from prog...done.
(gdb) run
Starting program: /home/rhys/prog
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) backtrace
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff7dfd3ba in _start ()
If you aren't familiar with how the dynamic linker works, this backtrace should surprise you. Where's the call to main? And what is this _start function?
Well, it turns out that the exec(2)'ing of an ELF binary is a bit more complex in userspace than main just receiving control directly from the kernel. As stated in the background section, in a dynamically linked binary, the kernel passes initial control to the dynamic linker, which runs its initialization code and then jumps to the program's entry point, from which main is eventually called. For an ELF binary with no INTERP segment (as we're dealing with here), the kernel hands control directly to that entry point. The entry point is the function named in the ENTRY command of the linker script being used, which, in the default linker script on most systems, is _start. This can be seen with the following command (ld -verbose dumps the default linker script):
$ ld -verbose | grep ENTRY
ENTRY(_start)
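_start is defined in an object file containing the entry point (and other bootstrap code), traditionally called crt0.o (crt1.o or a variant of it on glibc-based systems), that is statically linked into the binary by GCC. The stock _start expects to hand off to libc's startup routine, which is probably why our first attempt jumped to address 0: without the dynamic linker, nothing ever filled in that routine's address. These "start files" can be manually excluded from the finished binary using GCC's -nostartfiles argument.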
Since we want our own function to receive control directly from the kernel, we can pass -nostartfiles to GCC and rename main to _start, ensuring that it gets control directly from the kernel when exec(2) is done.
Here's our new program:
int _start() {
    return 0;
}
And our new compilation command:
$ gcc -Wl,--no-dynamic-linker -nostartfiles prog.c -o prog
Let's give this one a shot:
$ ./prog
Segmentation Fault
Another segfault, but this time with a different underlying cause. We can see the issue at play here by taking a look at the binary's disassembly with objdump.
$ objdump -M intel -d prog
prog: file format elf64-x86-64
Disassembly of section .text:
0000000000000233 <_start>:
233: 55 push rbp
234: 48 89 e5 mov rbp,rsp
237: b8 00 00 00 00 mov eax,0x0
23c: 5d pop rbp
23d: c3 ret
Even without much knowledge of x86 assembly, this function is pretty easy to dissect. Keep in mind that the five instructions above are what the single statement return 0; compiles to. The first push/mov pair sets up the function's stack frame, mov eax,0x0 sets eax (the return value register in the System V ABI) to 0, and the final two instructions restore the base pointer and return to the function's caller.
The last sentence of that description may have piqued your interest. This function's "caller" is the kernel, which doesn't push a return address onto the stack before transferring control to _start the way a normal C caller would. That wouldn't make sense, as userspace code can't just ret itself into kernel space. This means that the ret at the end of this function will pop whatever is on top of the stack (which definitely isn't a return address) and set rip to that.
So what address is the final ret instruction actually returning to? We can find out with GDB.
$ gdb -q prog
Reading symbols from prog...done.
(gdb) run
Starting program: /home/rhys/prog
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000001 in ?? ()
It's returning to 0x0000000000000001, definitely not a valid address. As it turns out, this is actually the value of argc. As specified in the System V ABI, x86-64 Supplement (see page 29), when userspace code receives initial control from the kernel after exec(2), the top eight bytes of the stack hold a little-endian integer specifying the value of argc.
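To make that layout a little more concrete, here is a rough sketch of how entry code could pick apart what the kernel leaves on the stack. It assumes the layout described in the ABI supplement (argc first, then the argv pointers and a NULL, then the environment pointers and a NULL). The names start_info and parse_initial_stack are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical view of the initial process stack on x86-64 Linux. */
struct start_info {
    long   argc;
    char **argv;  /* argc pointers, followed by a NULL */
    char **envp;  /* environment pointers, followed by a NULL */
};

/* sp is assumed to be the value rsp had at the very first instruction
 * of the entry point, viewed as an array of 8-byte slots. */
void parse_initial_stack(char **sp, struct start_info *out) {
    out->argc = (long)sp[0];               /* the slot our ret popped into rip */
    out->argv = sp + 1;                    /* argv[0] sits right above argc */
    out->envp = out->argv + out->argc + 1; /* skip argv and its NULL terminator */
}
```

With argc equal to 1, slot 0 holds the value 1, which matches the bogus "return address" GDB reported. Getting the initial rsp into such a function takes a little assembly in _start, which is left out of this sketch.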
Fixing the segfault requires making an exit(2) syscall instead of using a return statement (which gets compiled into pop rbp followed by ret). This can be done in GNU C using some inline assembly to manually perform an exit(2) syscall.
void _start(void) {
    asm("mov $60, %rax\n" /* Syscall number (exit) */
        "mov $0, %rdi\n"  /* Exit status */
        "syscall");
}
Let's recompile and rerun as before:
$ gcc -Wl,--no-dynamic-linker -nostartfiles prog.c -o prog
$ ./prog
$ echo $?
0
Success! We've managed to create an ELF binary that doesn't segfault, exits with status 0 and doesn't use the dynamic linker.
Now let's get it to print hello world.
Printing to Standard Output Without libc
As previously stated, the dynamic linker resolves symbolic references to shared library code at runtime. Given this, it should be clear that doing the following will not work if our program doesn't run with the dynamic linker present:
#include <stdio.h>
int _start() {
puts("Hello World!");
asm("mov $60, %rax\n"
"mov $0, %rdi\n"
"syscall");
}
To verify this, let's try to compile and run as before:
$ gcc -Wl,--no-dynamic-linker -nostartfiles prog.c -o prog
$ ./prog
Segmentation Fault
What's going on here? Let's take a look at the disassembly:
$ objdump -M intel -d prog
prog: file format elf64-x86-64
Disassembly of section .plt:
00000000000002a0 <.plt>:
2a0: ff 35 62 0d 20 00 push QWORD PTR [rip+0x200d62] # 201008 <_GLOBAL_OFFSET_TABLE_+0x8>
2a6: ff 25 64 0d 20 00 jmp QWORD PTR [rip+0x200d64] # 201010 <_GLOBAL_OFFSET_TABLE_+0x10>
2ac: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
00000000000002b0 <puts@plt>:
2b0: ff 25 62 0d 20 00 jmp QWORD PTR [rip+0x200d62] # 201018 <puts@GLIBC_2.2.5>
2b6: 68 00 00 00 00 push 0x0
2bb: e9 e0 ff ff ff jmp 2a0 <.plt>
Disassembly of section .text:
00000000000002c0 <_start>:
2c0: 55 push rbp
2c1: 48 89 e5 mov rbp,rsp
2c4: 48 8d 3d 18 00 00 00 lea rdi,[rip+0x18] # 2e3 <_start+0x23>
2cb: e8 e0 ff ff ff call 2b0 <puts@plt>
2d0: 48 c7 c0 3c 00 00 00 mov rax,0x3c
2d7: 48 c7 c7 00 00 00 00 mov rdi,0x0
2de: 0f 05 syscall
2e0: 90 nop
2e1: 5d pop rbp
2e2: c3 ret
The inclusion of a call to puts caused two new entries to appear in the disassembly, .plt and puts@plt. PLT stands for Procedure Linkage Table, a mechanism that adds a layer of indirection to calls to functions in external shared objects. A full discussion of how the PLT works is beyond the scope of this article. It is enough to know that the PLT works in conjunction with the dynamic linker and requires it to be present, which explains why our call to puts is segfaulting.
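For intuition only, the indirection can be imitated in ordinary C with a function pointer. This is a toy model, not the real mechanism: got_slot stands in for the table entry the PLT stub jumps through, resolver_stub plays the part of the dynamic linker, and real_double plays the part of a libc function such as puts. All of these names are invented for the example.

```c
typedef int (*func_t)(int);

/* Stands in for a function living in a shared library. */
int real_double(int x) { return 2 * x; }

int resolver_stub(int x);          /* stands in for the dynamic linker */

func_t got_slot = resolver_stub;   /* the slot starts out pointing at the resolver */
int resolve_count = 0;             /* how many times resolution has happened */

int resolver_stub(int x) {
    resolve_count++;
    got_slot = real_double;        /* "bind" the symbol by patching the slot */
    return got_slot(x);            /* then forward the original call */
}

/* Plays the role of the PLT stub: it always jumps through the slot. */
int call_via_plt(int x) { return got_slot(x); }
```

The first call goes through the resolver, which patches the slot, and later calls go straight to the target. In our binary there is nobody to play the resolver's part, so the jump through the slot has nowhere valid to land.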
So how do we print stuff without the use of our nice printf/puts abstractions? Under the hood, libc printing functions just make a write(2) syscall to standard output (technically the output is usually buffered in userspace first, but we can ignore this). Because of this, we can use a bit of inline assembly to create our own primitive print function in a similar fashion to our exit code. While we're at it, let's pull said exit code out of _start and into another function to make the whole thing cleaner.
void exit_success(void) {
    asm("mov $60, %rax\n" /* Syscall number (exit) */
        "mov $0, %rdi\n"  /* Exit status */
        "syscall");
}
void print(const char *s, int count) {
    /* sys_write; the clobber list names every register we or the kernel overwrite */
    asm volatile("mov $1, %%rax\n" /* Syscall number (write) */
                 "mov $1, %%edi\n" /* File descriptor (1/stdout) */
                 "mov %0, %%rsi\n" /* Buffer */
                 "mov %1, %%edx\n" /* Bytes to write */
                 "syscall"
                 :
                 : "r" (s), "r" (count)
                 : "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory");
}
void _start(void) {
    char string[] = "Hello World!\n";
    print(string, sizeof(string) - 1); /* Don't write the trailing NUL byte */
    exit_success();
}
Compiling and running this, we get:
$ gcc -Wl,--no-dynamic-linker -nostartfiles prog.c -o prog
$ ./prog
Hello World!
$ echo $?
0
Phew! So there we have it, your standard hello world, in C, with no dynamic linker.
Conclusion
Everything written so far is all well and good, but when would you actually have to work in an environment with no dynamic linker? One possible scenario is working on or reimplementing the dynamic linker itself. Another possibility (and something I happen to be working on at the moment) is writing a binary obfuscator.
There are many ways to obfuscate executable binaries such that they're hard to reverse engineer. One common technique is to encrypt the original binary and add a "stub" loader to it, which takes control directly from the kernel, maps the encrypted binary into memory and decrypts it on the fly during execution. Due to having to take initial control from the kernel, the stub usually runs in an environment without the dynamic linker present, requiring a variety of workarounds similar to the ones shown in this article.
I hope you'll walk away from this article with a heightened sense of respect for the work the dynamic linker does behind the scenes for userspace code. If you like this sort of thing and are interested in reading further on the subject of binary hacking and analysis, I'm currently reading Dennis Andriesse's excellent book on the subject and can highly recommend it.