Writing Hello World in C Without the Dynamic Linker

Posted on: Tue 03 September 2019

As C/C++/Rust/OCaml/(insert other language usually compiled to native code here) programmers targeting Linux, we usually get a dynamically linked ELF binary as the final output of our toolchains. While this covers almost all use cases for userspace code, there are some circumstances in which it is necessary to write code that runs without the dynamic linker present.

In this article, we'll try to write a standard hello world program in C for an x86-64 Linux system, but with the catch that our compiled ELF binary can't make use of the runtime dynamic linker. As will be seen, this entails a range of interesting challenges not normally encountered when writing userspace code.

Background

There are two related but distinct programs commonly referred to as "the linker" in the context of ELF binaries, one of which operates at compile time and one of which operates at runtime.

The first of these two programs is ld, provided by GNU binutils, which is responsible for resolving cross-file and dynamic library references in object files at compile time and "linking" them together with these references resolved to produce a single executable binary or shared object. In most circumstances, ld is invoked under the hood when running gcc as the last step of building a binary.

The second entity is the ld.so shared object, which is provided by the C library (glibc on most Linux systems) rather than by binutils. It is mapped into a process's address space by the kernel during a call to exec(2). It receives initial control from the kernel when exec(2) finishes, identifies the libraries needed by the program, maps them into the address space, and then hands off control to the program. Later, it is called into from time to time when certain symbolic references need to be resolved.

To avoid ambiguity, I will from this point forward refer to ld as the static linker and ld.so as the dynamic linker.

A full discussion of the dynamic linker is way beyond the scope of this article. For our purposes, it suffices to know that the dynamic linker is what receives initial control from the Linux kernel, what loads shared libraries required by the program, and what gets called into when certain symbolic references (e.g. a call to printf in libc) need to be resolved at runtime.

So where is the dynamic linker specified in an ELF binary that requires it? Dynamically linked ELF binaries have a program header (INTERP), specifying the path to the dynamic linker binary to be used. This can be seen with the readelf utility (a part of GNU binutils), here showing /lib64/ld-linux-x86-64.so.2 as the dynamic linker for /bin/ls on my system.

$ readelf -l /bin/ls
[...]
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000001f8 0x00000000000001f8  R E    0x8
  INTERP         0x0000000000000238 0x0000000000000238 0x0000000000000238
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
[...]
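The same lookup can be sketched in C using the structures from <elf.h>. This is only an illustration of what readelf reports, assuming a 64-bit ELF file; the function name find_interp and its return convention are made up for this example.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: looks for a PT_INTERP program header in a 64-bit
   ELF file. Returns 1 and copies the interpreter path into `out` if one
   exists, 0 if the file has no INTERP header, and -1 on any error. */
int find_interp(const char *path, char *out, size_t outlen) {
  FILE *f = fopen(path, "rb");
  if (!f) return -1;

  Elf64_Ehdr eh;
  if (fread(&eh, sizeof eh, 1, f) != 1 ||
      memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
      eh.e_ident[EI_CLASS] != ELFCLASS64 ||
      eh.e_phentsize != sizeof(Elf64_Phdr)) {
    fclose(f);
    return -1;
  }

  int result = 0; /* assume "no INTERP" until one is found */
  for (unsigned i = 0; i < eh.e_phnum; i++) {
    Elf64_Phdr ph;
    long phdr_pos = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);
    if (fseek(f, phdr_pos, SEEK_SET) != 0 ||
        fread(&ph, sizeof ph, 1, f) != 1) {
      result = -1;
      break;
    }
    if (ph.p_type != PT_INTERP) continue;

    /* The segment's file contents are the NUL-terminated path. */
    if (ph.p_filesz == 0 || ph.p_filesz > outlen ||
        fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
        fread(out, 1, ph.p_filesz, f) != ph.p_filesz) {
      result = -1;
      break;
    }
    out[ph.p_filesz - 1] = '\0';
    result = 1;
    break;
  }

  fclose(f);
  return result;
}
```

Called on /bin/ls with a large enough buffer, this should report the same path that readelf printed above; called on a binary built without an INTERP header, it should return 0.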

Creating a Binary That Doesn't Use the Dynamic Linker

Given the information above, our first task at hand is to compile an ELF binary with no INTERP program header. This will result in no dynamic linker being mapped into our process's address space by the kernel on exec(2), thus ensuring that our program cannot use the dynamic linker at runtime.

Forcing the exclusion of the INTERP program header is very easy with GNU binutils version 2.26 or later, in which the static linker supports a --no-dynamic-linker argument. Excluding the dynamic linker in versions of binutils before 2.26 seems to only be possible by providing the static linker with a custom linker script. Since I'm using binutils 2.28, I'll just use the --no-dynamic-linker option.

Now that we know how to compile a binary that doesn't use the dynamic linker, let's go ahead and try to run one compiled in this way. I'll use the following barebones C program as an example:

int main() {
  return 0;
}

We can go ahead and compile this with gcc, ensuring that it passes --no-dynamic-linker to the static linker as follows:

$ gcc -Wl,--no-dynamic-linker prog.c -o prog

Let's verify that no INTERP program header was generated using readelf:

$ readelf -l prog

Elf file type is DYN (Shared object file)
Entry point 0x271
There are 7 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000350 0x0000000000000350  R E    0x200000
  LOAD           0x0000000000000f20 0x0000000000200f20 0x0000000000200f20
                 0x00000000000000e0 0x00000000000000e0  RW     0x200000
  DYNAMIC        0x0000000000000f20 0x0000000000200f20 0x0000000000200f20
                 0x00000000000000e0 0x00000000000000e0  RW     0x8
  NOTE           0x00000000000001c8 0x00000000000001c8 0x00000000000001c8
                 0x0000000000000024 0x0000000000000024  R      0x4
  GNU_EH_FRAME   0x00000000000002b4 0x00000000000002b4 0x00000000000002b4
                 0x0000000000000024 0x0000000000000024  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000000f20 0x0000000000200f20 0x0000000000200f20
                 0x00000000000000e0 0x00000000000000e0  R      0x1
[...]

Looks good, let's try to run it:

$ ./prog
Segmentation Fault

Seems like we've got a bit more work to do.

Getting our Non-Dynamically-Linked Binary to Run

So we've discovered that forcing the program to run without the dynamic linker results in a segfault. Let's recompile with debugging symbols and take a closer look at the path of execution with GDB:

$ gcc -g -Wl,--no-dynamic-linker prog.c -o prog
$ gdb -q prog
Reading symbols from prog...done.
(gdb) run
Starting program: /home/rhys/prog

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) backtrace
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7dfd3ba in _start ()

If you aren't familiar with how the dynamic linker works, this backtrace should surprise you. Where's the call to main? And what is this _start function?

Well, it turns out that the exec(2)'ing of an ELF binary is a bit more complex in userspace than main just receiving control directly from the kernel. As stated in the background section, in a dynamically linked binary the kernel passes initial control to the dynamic linker, which loads the required libraries and then jumps to the binary's entry point; startup code there initializes the C runtime and eventually calls main. For an ELF binary with no INTERP segment (as we're dealing with here), the kernel jumps to that entry point directly. The entry point is the symbol named in the ENTRY command of the linker script being used, which, in the default linker script on most systems, is _start. This can be seen with the following command (ld -verbose dumps the default linker script):

$ ld -verbose | grep ENTRY
ENTRY(_start)

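_start is defined in an object file containing the entry point (and other bootstrap code), traditionally called crt0.o (on current glibc-based systems it is crt1.o, or Scrt1.o for position-independent executables), which GCC statically links into the binary. That bootstrap code expects to call into libc to set up the process and invoke main. With no dynamic linker around to resolve that reference, the call ends up at address 0, which appears to be what the backtrace above shows. These "start files" can be manually excluded from the finished binary using GCC's -nostartfiles argument.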

Since we want our own function to receive control directly from the kernel, we can pass -nostartfiles to GCC, and rename main to _start, ensuring that it gets control directly from the kernel when exec(2) is done.

Here's our new program:

int _start() {
  return 0;
}

And our new compilation command:

$ gcc -Wl,--no-dynamic-linker -nostartfiles prog.c -o prog

Let's give this one a shot:

$ ./prog
Segmentation Fault

Another segfault, but this time with a different underlying cause. We can see the issue at play here by taking a look at the binary's disassembly with objdump.

$ objdump -M intel -d prog

prog:     file format elf64-x86-64


Disassembly of section .text:

0000000000000233 <_start>:
 233:   55                      push   rbp
 234:   48 89 e5                mov    rbp,rsp
 237:   b8 00 00 00 00          mov    eax,0x0
 23c:   5d                      pop    rbp
 23d:   c3                      ret

Even without much knowledge of x86 assembly, this function is pretty easy to dissect. Keep in mind that the five instructions above are equivalent to a function containing the single statement return 0;. The initial push/mov pair sets up the function's stack frame, mov eax,0x0 sets eax (the return value register in the System V ABI) to 0, and the final two instructions restore the base pointer and return to the function's caller.

The last sentence of that description may have piqued your interest. This function's "caller" is the kernel, which doesn't push a return address to the stack before calling _start like a normal C function. That wouldn't make sense as userspace code can't just ret itself into kernel space. This means that the ret at the end of this function will grab whatever is on top of the stack (which definitely isn't a return address) and set rip to that.

So what address is the final ret instruction actually returning to? We can find out with GDB.

$ gdb -q prog
Reading symbols from prog...done.
(gdb) run
Starting program: /home/rhys/prog

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000001 in ?? ()

It's returning to 0x0000000000000001, definitely not a valid address. As it turns out, this is actually the value of argc. As specified in the System V ABI, x86-64 Supplement (see page 29), upon userspace code receiving initial control from the kernel after exec(2), the top eight bytes of the stack represent a little endian integer specifying the value of argc.
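To make that layout concrete, here is a small sketch of how the words at the initial stack pointer could be interpreted. It assumes the x86-64 layout described in the ABI document (argc, then the argv pointers, a NULL, then the envp pointers, another NULL). The struct and function names are made up for this example, and it takes the stack pointer as an ordinary argument so it can be tried on a hand-built array.

```c
#include <stddef.h>

/* Hypothetical view of the initial process stack on x86-64:
   [argc][argv[0]]...[argv[argc-1]][NULL][envp[0]]...[NULL] */
struct initial_stack {
  long argc;
  char **argv;
  char **envp;
};

/* `sp` is the value rsp holds when the kernel hands over control. */
struct initial_stack parse_initial_stack(void *sp) {
  char **words = sp;
  struct initial_stack out;
  out.argc = (long)words[0];          /* first 8 bytes: argc */
  out.argv = words + 1;               /* argv pointers follow directly */
  out.envp = out.argv + out.argc + 1; /* skip argv's NULL terminator */
  return out;
}
```

With no arguments given on the command line, argc is 1, which is exactly the 0x0000000000000001 that ret popped into rip above.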

Fixing this requires making an exit(2) syscall instead of using a return statement (which will get compiled into pop rbp followed by ret). This can be done in GNU C using some inline assembly to manually perform an exit(2) syscall.

int _start() {
      asm("mov $60, %rax\n" /* Syscall number */
          "mov $0, %rdi\n"  /* Exit status */
          "syscall");
}

Let's recompile and rerun as before:

$ gcc -Wl,--no-dynamic-linker -nostartfiles prog.c -o prog
$ ./prog
$ echo $?
0

Success! We've managed to create an ELF binary that doesn't segfault, exits with status 0 and doesn't use the dynamic linker.

Now let's get it to print hello world.

Printing to Standard Output Without libc

As previously stated, the dynamic linker resolves symbolic references to shared library code at runtime. Given this, it should be clear that doing the following will not work if our program doesn't run with the dynamic linker present:

#include <stdio.h>

int _start() {
  puts("Hello World!");
  asm("mov $60, %rax\n"
      "mov $0, %rdi\n"
      "syscall");
}

To verify this, let's try to compile and run as before:

$ gcc -Wl,--no-dynamic-linker -nostartfiles prog.c -o prog
$ ./prog
Segmentation Fault

What's going on here? Let's take a look at the disassembly:

$ objdump -M intel -d prog

prog:     file format elf64-x86-64


Disassembly of section .plt:

00000000000002a0 <.plt>:
 2a0:   ff 35 62 0d 20 00       push   QWORD PTR [rip+0x200d62]        # 201008 <_GLOBAL_OFFSET_TABLE_+0x8>
 2a6:   ff 25 64 0d 20 00       jmp    QWORD PTR [rip+0x200d64]        # 201010 <_GLOBAL_OFFSET_TABLE_+0x10>
 2ac:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]

00000000000002b0 <puts@plt>:
 2b0:   ff 25 62 0d 20 00       jmp    QWORD PTR [rip+0x200d62]        # 201018 <puts@GLIBC_2.2.5>
 2b6:   68 00 00 00 00          push   0x0
 2bb:   e9 e0 ff ff ff          jmp    2a0 <.plt>

Disassembly of section .text:

00000000000002c0 <_start>:
 2c0:   55                      push   rbp
 2c1:   48 89 e5                mov    rbp,rsp
 2c4:   48 8d 3d 18 00 00 00    lea    rdi,[rip+0x18]        # 2e3 <_start+0x23>
 2cb:   e8 e0 ff ff ff          call   2b0 <puts@plt>
 2d0:   48 c7 c0 3c 00 00 00    mov    rax,0x3c
 2d7:   48 c7 c7 00 00 00 00    mov    rdi,0x0
 2de:   0f 05                   syscall
 2e0:   90                      nop
 2e1:   5d                      pop    rbp
 2e2:   c3                      ret

The inclusion of a call to puts caused two new entries to appear in the disassembly, .plt and puts@plt. PLT stands for Procedure Linkage Table, a mechanism that adds a layer of indirection to calls to functions in external shared objects: the call goes through a table slot that the dynamic linker is expected to fill in with the function's real address. A full discussion of how the PLT works is beyond the scope of this article. It is enough to know that it works in conjunction with the dynamic linker and requires it to be present, which explains why our call to puts is segfaulting.
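The indirection can be modelled in plain C with a function pointer. This is a toy illustration only, not how the real PLT and GOT are implemented; every name in it is made up. The table slot starts out pointing at a stub that plays the role of the dynamic linker's resolver, and the first call through the slot patches it to point at the real function.

```c
typedef int (*func_t)(int);

static int resolver_calls = 0;

/* Stands in for the real function living in a shared library. */
static int real_double(int x) { return 2 * x; }

static int lazy_stub(int x);

/* Stands in for the table slot: initially it points at the stub. */
static func_t got_slot = lazy_stub;

/* Stands in for the dynamic linker's resolver: find the real function,
   patch the slot, then forward the call. */
static int lazy_stub(int x) {
  resolver_calls++;
  got_slot = real_double;
  return got_slot(x);
}

/* Stands in for the PLT entry: always jumps through the slot. */
int call_via_plt(int x) { return got_slot(x); }

int resolver_call_count(void) { return resolver_calls; }
```

In this model the resolver runs once, on the first call, and later calls go straight to the real function. In our binary nothing plays the resolver's role, so the jump through the table goes to an address that was never fixed up.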

So how do we print stuff without the use of our nice printf/puts abstractions? Under the hood, libc printing functions just make a write(2) syscall to standard output (technically it's usually buffered in userspace but we can ignore this). Because of this, we can use a bit of inline assembly to create our own primitive print function in a similar fashion to our exit code. While we're at it, let's pull said exit code out of _start and into another function to make the whole thing cleaner.

void exit_success() {
  asm("mov $60, %rax\n" /* Syscall number */
      "mov $0, %rdi\n"  /* Exit status */
      "syscall");
}

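void print(const char *s, long count) {
  /* sys_write */
  asm volatile("mov $1, %%rax\n" /* Syscall Number */
               "mov $1, %%rdi\n" /* File Descriptor (1/stdout) */
               "mov %0, %%rsi\n" /* Buffer */
               "mov %1, %%rdx\n" /* Bytes to write */
               "syscall"
  :   /* No outputs: the return value in rax is ignored */
  :   "r" (s), "r" (count)
      /* Registers overwritten above, plus rcx/r11 which syscall clobbers */
  :   "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory");
}

void _start() {
  char string[] = "Hello World!\n";
  /* sizeof counts the terminating NUL byte, which we don't want to print */
  print(string, sizeof(string) - 1);
  exit_success();
}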

Compiling and running this, we get:

$ gcc -Wl,--no-dynamic-linker -nostartfiles prog.c -o prog
$ ./prog
Hello World!
$ echo $?
0

Phew! So there we have it, your standard hello world, in C, with no dynamic linker.

Conclusion

Everything written so far is all well and good, but when would you actually have to work in an environment with no dynamic linker? One possible scenario is working on or reimplementing the dynamic linker itself. Another possibility (and something I happen to be working on at the moment) is writing a binary obfuscator.

There are many ways to obfuscate executable binaries such that they're hard to reverse engineer. One common technique is to encrypt the original binary and add a "stub" loader to it, which takes control directly from the kernel, maps the encrypted binary into memory and decrypts it on the fly during execution. Due to having to take initial control from the kernel, the stub usually runs in an environment without the dynamic linker present, requiring a variety of workarounds similar to the ones shown in this article.

I hope you'll walk away from this article with a heightened sense of respect for the work the dynamic linker does behind the scenes for userspace code. If you like this sort of thing and are interested in reading further on the subject of binary hacking and analysis, I'm currently reading Dennis Andriesse's excellent book on the subject and can highly recommend it.