Rhme 2016 Jumpy - an incentive to extend simavr and angr in avatar2 for automated analysis

The challenge and its firmware can be accessed via this link. In short, the challenge is about guessing a password to pass an authentication check. The firmware is built for Arduino boards running an ATmega328P processor.

I know some people solved it with a great amount of reverse engineering effort; some even mystified their use of the z3 solver, which distracts from the real work, the reverse engineering. That is to say, once one understands how the authentication is implemented, writing a z3 script is rather normal and natural.

Manual analysis

Ghidra shit

Honestly, I got stuck reverse engineering its auth logic at the beginning, since the pseudo code shown in Ghidra’s decompiler for the avr8 architecture is very hard to understand. For example, one function checks input_10 * input_11 == 0x2873, yet Ghidra’s decompilation makes it look painful. Note that my Ghidra version is ghidra_11.0.2_PUBLIC_20240326.

uint check_10_11(void){
  R20 = PHY_CC_CCA;
  R21 = 0;
  R18 = CCA_THRES;
  R19 = 0;
  R1R0 = (uint)PHY_CC_CCA * (uint)CCA_THRES;
  R25R24._1_1_ = (char)((uint)R1R0 >> 8);
  R25R24._0_1_ = (byte)R1R0;
  R1R0 = 0;
  R25R24._1_1_ = R25R24._1_1_ - (((byte)R25R24 < 0x73) + '(');
  if ((byte)R25R24 == 0x73 && R25R24._1_1_ == 00) {
    R25R24 = CONCAT11(IRQ_STATUS,IRQ_MASK) | 0x800;
    IRQ_STATUS = R25R24._1_1_;
  }
  else {
    IRQ_STATUS = 0;
    IRQ_MASK = 0;
  }
  return R25R24;
}
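
Stripped of the register shuffling, the decompiled body reads like a 16-bit comparison done one byte at a time. The sketch below mirrors it in Python under that reading. The parameter names a and b are my own and stand for the two input bytes that Ghidra labels PHY_CC_CCA and CCA_THRES; the flag update in the if branch is left out.

```python
def check_10_11(a: int, b: int) -> bool:
    """Byte-wise model of the decompiled comparison a * b == 0x2873."""
    product = (a * b) & 0xFFFF           # R1R0: 16-bit result of the multiply
    lo = product & 0xFF                  # low byte, compared against 0x73
    hi = product >> 8                    # high byte
    borrow = 1 if lo < 0x73 else 0       # the "(byte)R25R24 < 0x73" term
    hi = (hi - (borrow + 0x28)) & 0xFF   # '(' is 0x28
    return lo == 0x73 and hi == 0


def check_10_11_plain(a: int, b: int) -> bool:
    """The same check written as a single 16-bit comparison."""
    return (a * b) & 0xFFFF == 0x2873
```

Both functions agree for every pair of byte values, which is why the whole function can be summarized as input_10 * input_11 == 0x2873.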

I don’t know what it looks like in IDA because I cannot afford its expensive license. Try it yourself.

Simavr

Although reverse engineering all check functions is possible given enough time, I do not want to sacrifice time that should belong to a cup of coffee and outdoor activities. So, try whatever gets it running. Simavr is an AVR simulator that can run this atmega328p firmware and supports execution tracing as well as debugging. In its examples folder, the base simulator run_avr has been wrapped with a proper configuration to make an Arduino emulator, simduino. This configuration covers the frequency, the processor, and the uart pty connection.

user@b09bd05ddefb:~$ ./simavr/examples/board_simduino/obj-x86_64-linux-gnu/simduino.elf -d jumpy/jumpy.hex -t
00ca:                           ldi r25, 0x00
                                       ->> r25=00 
00cc:                           ld r18, (Y+1[030b])=[01]        [Stack]
                                       ->> r18=0d 
00ce:                           movw ZL:ZH, r24:r25[00c6]
                                       ->> ZL=c6 ZH=00 
00d0:                           st (Z+0[00c6]), r18[0d]         io:c6

In another shell, picocom handles uart communication with the emulated firmware.

./picocom/picocom /tmp/simavr-uart0 --echo --omap crcrlf

Input: 1234

Better luck next time!

Note that I made some changes in simavr/examples/board_simduino/simduino.c to enable instruction trace.

        // even if not setup at startup, activate gdb if crashing
        avr->gdb_port = 1234;
        if (debug) {
                avr->state = cpu_Stopped;
                avr_gdb_init(avr);
        }

        // enable instruction trace
        if (trace)
                avr->trace = 1;

However, I can’t figure out the execution flow, e.g., which check function is executed first, since the generated log is so messy and lengthy that it is hard to determine the order of two adjacent check functions.
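
One way to tame such a log would be to keep only the first time each function entry point shows up. The following is a minimal sketch, assuming the trace format shown above (a four-digit hex address, a colon, then the instruction). The function name first_hits and the entry addresses in the example are hypothetical; real entry points would come from Ghidra.

```python
import re

# A trace line starts with a four-digit hex address followed by a colon.
TRACE_LINE = re.compile(r"^([0-9a-f]{4}):\s+(\S.*)$")


def first_hits(trace_lines, entry_points):
    """Return the entry points in the order they are first executed."""
    order = []
    for line in trace_lines:
        match = TRACE_LINE.match(line)
        if not match:
            continue  # register dumps such as "->> r25=00" are skipped
        addr = int(match.group(1), 16)
        if addr in entry_points and addr not in order:
            order.append(addr)
    return order


sample = [
    "00ca:                           ldi r25, 0x00",
    "                                       ->> r25=00 ",
    "00cc:                           ld r18, (Y+1[030b])=[01]        [Stack]",
    "00d0:                           st (Z+0[00c6]), r18[0d]         io:c6",
]
print([hex(a) for a in first_hits(sample, {0x00D0, 0x00CA})])
```

I did not go down this road myself, so treat it as an idea rather than a tested workflow.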

Debugging with simavr

Fortunately, simavr offers users a gdb debug portal. I located user inputs in memory and set up watchpoints on them.

Type [C-a] [C-h] to see available commands
Terminal ready
Input: whereismyinput

Turn to gdb and take a look at the memory layout (use info mem because info proc mappings is not supported on this target/architecture). Then print all strings in the sram section. Note that avr-gdb intentionally places sram at 0x800000. Find more details in this link.

  AVR_IMEM_START = 0x00000000,	/* INSN memory */
  AVR_SMEM_START = 0x00800000,	/* SRAM memory */
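
Because of this offset, every SRAM or I/O address has two spellings: the data-space address the firmware uses and the one avr-gdb expects. A small sketch of the conversion, reusing the two constants above; the helper names are my own.

```python
AVR_IMEM_START = 0x00000000  # flash, as in the avr-gdb snippet above
AVR_SMEM_START = 0x00800000  # SRAM / data space


def to_gdb_addr(data_addr: int) -> int:
    """Map a data-space address (as seen by the firmware) to an avr-gdb address."""
    return AVR_SMEM_START + data_addr


def to_data_addr(gdb_addr: int) -> int:
    """Map an avr-gdb SRAM address back to the data-space address."""
    if gdb_addr < AVR_SMEM_START:
        raise ValueError("0x%x is not in the SRAM range of avr-gdb" % gdb_addr)
    return gdb_addr - AVR_SMEM_START


print(hex(to_gdb_addr(0x100)))      # start of internal SRAM on the atmega328p
print(hex(to_data_addr(0x8000c6)))  # an I/O register address
```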

Now connect to gdbserver and check basic info:

(gdb) target remote localhost:1234
(gdb) c
Continuing.
(gdb) info mem
Using memory regions provided by the target.
Num Enb Low Addr   High Addr  Attrs 
0   y  	0x00000000 0x00008000 flash blocksize 0x80 nocache 
1   y  	0x00800000 0x00800900 rw nocache
(gdb) find /b 0x800100, +1000, 0x77, 0x68, 0x65, 0x72
0x80013e
(gdb) x/10s 0x800100
0x800100:	"\r\nFLAG:D0_you_3ven_ROP?"
0x800118:	"\r\n"
0x80011b:	"\r\nBetter luck next time!\r\n"
0x800136:	"Input: "
0x80013e:	"whereismyinput"

Put watchpoints on those addresses and continue execution. I only present parts of the logs because there are too many hits:

info b
Num     Type            Disp Enb Address    What
2       read watchpoint keep y              *(char*)0x80013e
	breakpoint already hit 1 time
3       read watchpoint keep y              *(char*)(0x80013e+1)
	breakpoint already hit 1 time
4       read watchpoint keep y              *(char*)(0x80013e+2)
	breakpoint already hit 1 time
5       read watchpoint keep y              *(char*)(0x80013e+3)
	breakpoint already hit 1 time
6       read watchpoint keep y              *(char*)(0x80013e+4)
	breakpoint already hit 1 time
7       read watchpoint keep y              *(char*)(0x80013e+5)
	breakpoint already hit 1 time
8       read watchpoint keep y              *(char*)(0x80013e+6)
	breakpoint already hit 1 time
9       read watchpoint keep y              *(char*)(0x80013e+7)
	breakpoint already hit 1 time
10      read watchpoint keep y              *(char*)(0x80013e+8)
	breakpoint already hit 1 time
11      read watchpoint keep y              *(char*)(0x80013e+9)
	breakpoint already hit 1 time
12      read watchpoint keep y              *(char*)(0x80013e+10)
	breakpoint already hit 1 time
13      read watchpoint keep y              *(char*)(0x80013e+11)
	breakpoint already hit 1 time
14      read watchpoint keep y              *(char*)(0x80013e+12)
	breakpoint already hit 1 time

Value = 49 '1'
0x00000326 in ?? ()
(gdb) info r $pc
pc             0x193               0x326
(gdb) c
Continuing.

Hardware read watchpoint 3: *(char*)(0x80013e+1)

Value = 50 '2'
0x00000326 in ?? ()
(gdb) info r $pc
pc             0x193               0x326
(gdb) c
Continuing.

Hardware read watchpoint 4: *(char*)(0x80013e+2)

Value = 52 '4'
0x00000326 in ?? ()
(gdb) info r $pc
pc             0x193               0x326
(gdb) c
Continuing.
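
Typing one read watchpoint per character by hand gets old quickly. A minimal sketch that generates the gdb commands instead; the helper name is hypothetical, and the output is meant to be pasted into gdb or saved to a file and loaded with gdb’s source command.

```python
def rwatch_commands(base: int, length: int) -> list:
    """Build one gdb read-watchpoint command per input byte."""
    return ["rwatch *(char*)(0x%x+%d)" % (base, i) for i in range(length)]


# 0x80013e is where the input buffer was found; 13 is the password length.
for command in rwatch_commands(0x80013E, 13):
    print(command)
```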

From now on, I could easily identify which function deals with which characters. Basically, the authentication first checks the length of the user input. If the length is correct, the input is run through further check functions. Overall, the analyzed ghidra zip file is here.

Write a z3 script

I present my exploit script straightaway because there is no point in discussing those details any further, and this post aims at automated analysis. Since I already know the required input length is 13, I create 13 symbolic variables and apply the constraints recovered from each check function to them.

from z3 import Real, Solver, sat

input_0 = Real("0")
input_1 = Real("1")
input_2 = Real("2")
input_3 = Real("3")
input_4 = Real("4")
input_5 = Real("5")
input_6 = Real("6")
input_7 = Real("7")
input_8 = Real("8")
input_9 = Real("9")
input_10 = Real("10")
input_11 = Real("11")
input_12 = Real("12")
s = Solver()
s.add(input_0 * input_1 == 0x13b7, input_0 > 0, input_1 > 0)
s.add(input_1 + input_2 == 0xa7, input_1 > 0, input_2 > 0)
s.add(input_2 * input_3 == 0x1782, input_2 > 0, input_3 > 0)
s.add(input_3 + input_4 == 0x92, input_3 > 0, input_4 > 0)
s.add(input_4 * input_5 == 0x122f, input_4 > 0, input_5 > 0)
s.add(input_5 + input_6 == 0xa5, input_5 > 0, input_6 > 0)
s.add(input_6 * input_7 == 0x2b0c, input_6 > 0, input_7 > 0)
s.add(input_7 + input_8 == 0xd3, input_7 > 0, input_8 > 0)
s.add(input_8 * input_9 == 0x15c0, input_8 > 0, input_9 > 0)
s.add(input_9 + input_10 == 0x8f, input_9 > 0, input_10 > 0)
s.add(input_10 * input_11 == 0x2873, input_10 > 0, input_11 > 0)
s.add(input_11 + input_12 == 0xa0, input_11 > 0, input_12 > 0)
s.add(0xd * input_12 == 0x297, input_12 > 0)
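if s.check() == sat:
    m = s.model()
    # Read the variables back in input order; iterating over the model
    # directly does not guarantee any particular order.
    variables = [input_0, input_1, input_2, input_3, input_4, input_5, input_6,
                 input_7, input_8, input_9, input_10, input_11, input_12]
    print("".join(chr(m[v].as_long()) for v in variables))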
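
Since the constraints form a chain (each one links a character to its neighbour, and the last one pins input_12 on its own), a solver is not strictly needed. The sketch below walks the chain backwards with plain integer arithmetic. The constants are the ones from the z3 script; the function and table names are my own.

```python
# input_i * input_(i+1) for even i, input_i + input_(i+1) for odd i
PRODUCTS = {0: 0x13B7, 2: 0x1782, 4: 0x122F, 6: 0x2B0C, 8: 0x15C0, 10: 0x2873}
SUMS = {1: 0xA7, 3: 0x92, 5: 0xA5, 7: 0xD3, 9: 0x8F, 11: 0xA0}
LAST_FACTOR, LAST_TARGET = 0xD, 0x297  # 0xd * input_12 == 0x297


def solve() -> str:
    """Recover the 13 input characters by back-substitution."""
    chars = [0] * 13
    chars[12], remainder = divmod(LAST_TARGET, LAST_FACTOR)
    if remainder:
        raise ValueError("input_12 is not an integer")
    for i in range(11, -1, -1):
        nxt = chars[i + 1]
        if i in SUMS:
            chars[i] = SUMS[i] - nxt
        else:
            chars[i], remainder = divmod(PRODUCTS[i], nxt)
            if remainder:
                raise ValueError("input_%d is not an integer" % i)
    return "".join(chr(c) for c in chars)


print(solve())
```

The z3 version still has its place: it keeps working when the constraints stop being a neat chain.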

Automated analysis

A long journey to create an avatar’s target for simduino

In avatar2, a target is the topmost unit through which users manipulate an endpoint such as hardware, an emulator, or an analysis framework. Creating a new target requires protocol support, e.g. the gdb, openocd, or qmp protocol. Protocols are deliberately decoupled from target code in order to keep avatar’s code slim and well-structured. As in network programming, a protocol establishes the connection to an endpoint, which is ultimately the actual target mentioned earlier.

In this case, building a new target for simduino took a few attempts. At the very beginning, I planned to build the new target on the base simulator run_avr because it offers more configuration, so that the target would be compatible with other avr8-based processors. However, I realized that the uart pty connection would then have to be handled programmatically through the uart pty interfaces, and extending run_avr is pretty cumbersome even though I think it is doable. I do not have much ambition for this project. The easier the solution, the sooner I can start working on the analysis. Thus, the straightforward solution is to wrap simduino in a target, for which I only need to create a file that stores the register profile used by the gdb protocol. Since the gdb protocol is already supported, importing the register profile is enough to enable programmatic control over gdb in avatar2.

Adding AVR as a new arch class of avatar2

Since avr-gdb uses a pseudo PC register, looking for its register profile is kind of confusing. Furthermore, I am not sure whether the index of a register conforms to what I see in gdb logs. I did refer to angr, which has already implemented avr8, at the link. However, I got more confused when I noticed its index of pc and the absence of a pseudo PC register.

(gdb) info r
...
r31            0x0                 0
SREG           0x21                33
SP             0x30a               0x80030a
PC2            0xd2                210
pc             0x69                0xd2

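Clearly, I see two pc registers in gdb. As I understand it, the address in the pseudo register pc refers to the file offset while that in PC2 indicates the loading offset. In the output above, the raw value of pc (0x69) is exactly half of PC2 (0xd2), which is consistent with a word address versus a byte address.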
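
The relation between the two values can be checked mechanically. A minimal sketch, assuming the raw value of pc counts 16-bit flash words while PC2 counts bytes; this matches both gdb outputs in this post, but it remains my reading of them, and the helper names are my own.

```python
def word_to_byte_addr(word_addr: int) -> int:
    """Convert a flash word address (raw pc) to a byte address (PC2)."""
    return word_addr * 2


def byte_to_word_addr(byte_addr: int) -> int:
    """Convert a byte address (PC2) to a flash word address (raw pc)."""
    if byte_addr % 2:
        raise ValueError("AVR instructions start on even byte addresses")
    return byte_addr // 2


print(hex(word_to_byte_addr(0x69)))   # raw pc from the info r output above
print(hex(word_to_byte_addr(0x193)))  # raw pc from the watchpoint session
```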

After researching the gdb implementation of simavr, I got some useful information:

static int
gdb_write_register(
		avr_gdb_t * g,
		int regi,
		uint8_t * src )
{
	switch (regi) {
		case 0 ... 31:
			g->avr->data[regi] = *src;
			return 1;
		case 32:
			g->avr->data[R_SREG] = *src;
			SET_SREG_FROM(g->avr, *src);
			return 1;
		case 33:
			g->avr->data[R_SPL] = src[0];
			g->avr->data[R_SPH] = src[1];
			return 2;
		case 34:
			g->avr->pc = src[0] | (src[1] << 8) | (src[2] << 16) | (src[3] << 24);
			return 4;
	}
	return 1;
}

Interestingly, index 34 points to PC2, but there is nothing about the pseudo register pc in simavr’s gdb implementation. I suspect that avr-gdb maintains this pseudo register on its own rather than acquiring its value from simavr. Nevertheless, this is just my assumption since I have not spent much time researching gdb internals.

Hence, following the order in which registers are presented by the info r command, the index of pc is 35. I validated this index in avatar2 and successfully read the corresponding value.

Building simduino target upon gdb protocol

What does this class need to do during initialization? It only needs to Popen a simduino process with a gdb port open and connect the gdb protocol to this port. The gdb executable in this case should be avr-gdb because gdb-multiarch does not support this architecture. Finally, the target class looks like:

from subprocess import Popen
from shutil import which as find  # distutils was removed in Python 3.12
from avatar2.protocols.gdb import GDBProtocol
from avatar2.targets import Target, TargetStates
from .target import action_valid_decorator_factory, synchronize_state
from ..watchmen import watch


class SimduinoTarget(Target):
    def __init__(
            self,
            avatar,
            simduini_executable,
            avr_gdb_executable,
            firmware,
            gdb_additional_args=None,
            gdb_verbose=False,
            **kwargs
    ):
        super(SimduinoTarget, self).__init__(avatar, **kwargs)
        self.fw = firmware
        self._process = None  # set in init(); lets shutdown() run safely before init()
        self.gdb_port = 1234
        self.gdb_verbose = gdb_verbose
        self.gdb_additional_args = gdb_additional_args if gdb_additional_args else []
        if simduini_executable is not None:
            isExist = find(simduini_executable)
        else:
            raise Exception("Simduino executable is not specified")

        if not isExist:
            raise Exception("Simduino executable is not found")

        self.sim_executable = simduini_executable

        if avr_gdb_executable is not None:
            isExist = find(avr_gdb_executable)
        else:
            raise Exception("avr-gdb is not specified")

        if not isExist:
            raise Exception("avr-gdb is not found")

        self.avr_gdb_executable = avr_gdb_executable
    
    def shutdown(self):
        if self._process is not None:
            self._process.terminate()
            self._process.wait()
            self._process = None
        super(SimduinoTarget, self).shutdown()

    def init(self):
        cmd_line = [self.sim_executable, "-d", self.fw]
        with open(
                "%s/%s_out.txt" % (self.avatar.output_directory, self.name), "wb"
        ) as out, open(
            "%s/%s_err.txt" % (self.avatar.output_directory, self.name), "wb"
        ) as err:
            self.log.debug("output directory: %s/%s_out.txt" % (self.avatar.output_directory, self.name))
            self._process = Popen(cmd_line, stdout=out, stderr=err)

        self.log.debug("Simduino command line: %s" % " ".join(cmd_line))
        self.log.info("Simduino process running")
        self._connect_protocols()

    def _connect_protocols(self):
        gdb = GDBProtocol(
            gdb_executable=self.avr_gdb_executable,
            arch=self.avatar.arch,
            verbose=self.gdb_verbose,
            additional_args=self.gdb_additional_args,
            avatar=self.avatar,
            origin=self,
        )

        self.protocols.set_all(gdb)

        connect_success = gdb.remote_connect(port=self.gdb_port)

        if connect_success:
            self.log.info("Connected to remote target")
        else:
            self.log.warning("Connection to remote target failed")

        self.wait()

    @watch('TargetCont')
    @action_valid_decorator_factory(TargetStates.INITIALIZED, 'execution')
    @synchronize_state(TargetStates.RUNNING)
    def run(self, blocking=True):
        return self.protocols.execution.run()

    def cont(self, blocking=True):
        if self.state != TargetStates.INITIALIZED:
            super(SimduinoTarget, self).cont(blocking=blocking)
        else:
            self.run()

The following code snippet creates the avr arch class:

from avatar2.installer.config import GDB_MULTI
from .architecture import Architecture

class AVR(Architecture):
    get_gdb_executable = Architecture.resolve(GDB_MULTI)

    gdb_name = 'avr'
    endian = 'little'

    registers = { }

    registers.update({'r%d' % i: i for i in range(0, 32)})

    pc2_name = 'PC2' # load offset
    pc_name = 'pc' # file offset
    sr_name = 'SREG'

    registers.update({'%s' % sr_name: 32})
    registers.update({'SP': 33})
    registers.update({'%s' % pc2_name: 34})
    registers.update({'%s' % pc_name: 35})

Run sudo python3 setup.py install in the topmost folder of avatar2.

Limitations of simavr and avr-gdb found while testing the built target

I manually debugged and confirmed that the CPU states are the same whether avr-gdb is used directly or through the new target. Moreover, I Popened picocom to establish a uart pty connection with the simduino target to provide inputs, and succeeded in locating the address of the user input, which is the same as the one I found in avr-gdb. Notably, I will use this address for symbolic execution by setting the symbolic memory range accordingly.

Furthermore, in order to gain more information about the memory layout, I extended load_memory_mappings, avatar’s built-in gdb plugin, so that it can resolve memory ranges from info mem in avr-gdb. This information could be useful when initializing an angr target later:

def load_memory_mappings(avatar, target, forward=False, update=True):
    if not isinstance(target, GDBTarget) and not isinstance(target, SimduinoTarget):
        raise TypeError("The memory mapping can be loaded only from GDBTargets or SimduinoTarget")
    
    # ...

    if avatar.arch == AVR:
        raw_data = resp.strip()
        mappings = []
        avatar.log.debug(raw_data.encode())
        while raw_data.count("\n\n") >= 2:
            x = raw_data[raw_data.rfind("\n\n"):].split()

            if int(x[2], 16) == 0x0:
                obj = "flash"
            elif int(x[2], 16) == 0x800000:
                obj = "sram"
            else:
                obj = "unknown"
            
            mappings.append(
                {
                    "start": int(x[2], 16),
                    "end": int(x[3], 16),
                    "size": int(x[3],16)-int(x[2],16),
                    "obj": obj
                }
            )
            raw_data = raw_data[:raw_data.rfind("\n\n")-2]
    else:
        lines = resp.split("objfile")[-1].split("\n")
        mappings = [
            {
                "start": int(x[0], 16),
                "end": int(x[1], 16),
                "size": int(x[2], 16),
                "offset": int(x[3], 16),
                "obj": x[4],
            }
            for x in [y.split() for y in lines if y != ""]
        ]
    
    # ...
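
For readers who just want the gist of the AVR branch, here is a standalone sketch of the same idea. It assumes the plain-text form of info mem as printed in the gdb session earlier, not the MI-encoded response the plugin actually receives, and the function name parse_info_mem is my own.

```python
def parse_info_mem(text: str) -> list:
    """Turn the plain-text output of avr-gdb's `info mem` into mappings."""
    names = {0x0: "flash", 0x800000: "sram"}
    mappings = []
    for line in text.splitlines():
        parts = line.split()
        # Region rows look like: Num Enb LowAddr HighAddr Attrs...
        if len(parts) < 5 or not parts[0].isdigit():
            continue  # banner and header lines
        start, end = int(parts[2], 16), int(parts[3], 16)
        mappings.append({
            "start": start,
            "end": end,
            "size": end - start,
            "obj": names.get(start, "unknown"),
        })
    return mappings


sample = """Using memory regions provided by the target.
Num Enb Low Addr   High Addr  Attrs
0   y  \t0x00000000 0x00008000 flash blocksize 0x80 nocache
1   y  \t0x00800000 0x00800900 rw nocache
"""
for mapping in parse_info_mem(sample):
    print(mapping)
```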

Furthermore, I used this newly built target to write a script that locates the address of user inputs:

from avatar2 import *
import logging
from pwn import *

picocom_cmdline = ["/home/user/picocom/picocom", "/tmp/simavr-uart0", "--echo",  "--omap", "crcrlf"]

binary = "/home/user/jumpy/jumpy.hex"

avatar = Avatar(arch=AVR, output_directory="/tmp/jumpy")

gdb = avatar.add_target(SimduinoTarget,
                        simduini_executable="/home/user/simavr/examples/board_simduino/obj-x86_64-linux-gnu/simduino.elf",
                        avr_gdb_executable="avr-gdb",
                        firmware=binary
                        )

avatar.init_targets()

avatar.load_plugin('gdb_memory_map_loader')

mem_ranges = gdb.load_memory_mappings(update=True)

for interval_obj in mem_ranges.items():
    print(interval_obj.data.name)

picocom = process(picocom_cmdline, stdin=PIPE)

gdb.set_breakpoint(0x780)

cur_input = b"1234567"

gdb.cont()

gdb.wait()

print( "PC2: 0x%x" % gdb.read_register("PC2"))
print( "pc: 0x%x" % gdb.read_register("pc"))

serialization = lambda data: "".join([chr(i) for i in data if i != 0])
res = serialization(gdb.rm(0x80013e, 1, 16))
print(res, " Passed: %r" % (res == cur_input.decode()))

sram_base_addr = 0x800000

sram = bytes(gdb.rm(sram_base_addr, 1, 0x900))

print("Found input located at offset 0x%x" % (sram_base_addr + sram.find(cur_input)))

avatar.shutdown()
picocom.kill()

However, using picocom this way is very annoying, and I thought about handling uart output by monitoring writes to its data register UDR0. I configured a write watchpoint on this register in a script, but the watchpoint was never hit. The following assembly snippet of uart_send reads a character from r18 and sends it to picocom.

code:000068 20 83           st         Z,R18=>UDR0
code:000069 0f 90           pop        R0

Instead, I manually set a breakpoint where a write to UDR0 is about to take place, and confirmed that the Z register (the pair r31:r30) holds the address of UDR0. I was stuck for a while. With those doubts, I set two more watchpoints at 0x8000c6 and 0xc6, respectively, ran it again, and still got no hit. I then modified r31 and r30 to force the memory write to somewhere else, with a watchpoint set there beforehand. Finally, the watchpoint got triggered. Hence, when a write targets an I/O register, the watchpoint is not handled.

(gdb) set $r31=0x1
(gdb) set $r30=0x42
(gdb) info r $r31 $r30
r31            0x1                 1
r30            0x42                66
(gdb) info r $r18
r18            0x6c                108
(gdb) watch *0x800142
Hardware watchpoint 7: *0x800142
(gdb) x/2i $pc
=> 0xd0:	st	Z, r18
   0xd2:	pop	r0
(gdb) si

Hardware watchpoint 7: *0x800142

Old value = 0
New value = 108

I started to suspect that the issue comes from the implementation of memory writes in simavr. By checking its code, I finally found an answer.

static inline void _avr_set_ram(avr_t * avr, uint16_t addr, uint8_t v)
{
	if (addr <= avr->ioend)
		_avr_set_r(avr, addr, v);
	else
		avr_core_watch_write(avr, addr, v);
}


void avr_core_watch_write(avr_t *avr, uint16_t addr, uint8_t v)
{
	...
	if (avr->gdb) {
		avr_gdb_handle_watchpoints(avr, addr, AVR_GDB_WATCH_WRITE);
	}

	avr->data[addr] = v;
    ...
}

avr_flashaddr_t avr_run_one(avr_t * avr)
{
    ...
    switch (opcode & 0xd008) {
        case 0xa000:
        case 0x8000: {	// LD (LDD) -- Load Indirect using Z -- 10q0 qqsd dddd yqqq
            uint16_t v = avr->data[R_ZL] | (avr->data[R_ZH] << 8);
            get_d5_q6(opcode);
            if (opcode & 0x0200) {
                STATE("st (Z+%d[%04x]), %s[%02x]  \t%s\n",
                        q, v+q, AVR_REGNAME(d), avr->data[d], DAS(v + q));
                
                // actual memory access happens here
                _avr_set_ram(avr, v+q, avr->data[d]);
    ...            
    

As the code snippets above show, the st instruction eventually calls _avr_set_ram to perform the memory access, which invokes _avr_set_r if the address points to an I/O register, or avr_core_watch_write if the address is in internal SRAM (from 0x800100). Clearly, only avr_core_watch_write handles gdb watchpoints. Of course, it is not hard to add this handling to _avr_set_r.

static inline void _avr_set_r(avr_t * avr, uint16_t r, uint8_t v)
{
    ...
    if (avr->gdb) {
		avr_gdb_handle_watchpoints(avr, r, AVR_GDB_WATCH_WRITE);
	}
}
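
To make the dispatch easier to reason about, here is a toy Python model of the write path before and after the patch. IOEND = 0xff is an assumption on my side (internal SRAM starts at 0x100 on this chip), and addresses are data-space addresses, i.e. without the 0x800000 offset used by avr-gdb.

```python
IOEND = 0xFF  # assumed value of avr->ioend for the atmega328p


def write_triggers_watchpoint(addr: int, watchpoints: set, patched: bool = False) -> bool:
    """Model whether a store to addr reaches the gdb watchpoint handler."""
    if addr <= IOEND:
        # _avr_set_r path: watchpoints are only checked after the patch
        return patched and addr in watchpoints
    # avr_core_watch_write path: watchpoints are always checked
    return addr in watchpoints


print(write_triggers_watchpoint(0xC6, {0xC6}))                # UDR0, unpatched
print(write_triggers_watchpoint(0x142, {0x142}))              # SRAM write
print(write_triggers_watchpoint(0xC6, {0xC6}, patched=True))  # UDR0, patched
```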

Compile simavr and see if it works.

0x00000000 in ?? ()
(gdb) info b
Num     Type           Disp Enb Address    What
4       breakpoint     keep y   0x000000d0 
	breakpoint already hit 34 times
5       hw watchpoint  keep y              *0x8000c6
	breakpoint already hit 2 times
6       hw watchpoint  keep y              *0xc6
(gdb) c
Continuing.

Hardware watchpoint 5: *0x8000c6

Old value = 0
New value = 13
0x000000d2 in ?? ()

Hmmm, it is still weird that this watchpoint was hit only once. That does not make sense because quite a few characters are printed, so it should be hit several times. For validation, I changed the Z register to 0x145, and the watchpoint hit every time.

I even enabled simavr’s verbose logging: its gdb module did send the correct stop reply to avr-gdb, which did not react.

The following is logs from simavr:

Addr 00c6 found watchpoint 0 size 2 type 4 wanted 4
gdb_send_reply '$T0520:21;21:0a03;22:d0000000;watch:8000c6;#39'
gdb_send_reply '$T0520:21;21:0a03;22:d2000000;hwbreak:;#a7'
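
These replies are stop-reply packets of the gdb remote protocol: a signal number followed by register-number:value pairs in target byte order and a stop reason. The sketch below decodes them so the values can be compared with info r; the function name is my own, and the register names in the comments follow the indices used in gdb_write_register.

```python
def parse_stop_reply(packet: str):
    """Decode a gdb remote 'T' stop reply into signal, registers and reasons."""
    body = packet.strip()
    if body.startswith("$"):
        body = body[1:]
    body = body.split("#", 1)[0]  # drop the checksum
    if not body.startswith("T"):
        raise ValueError("not a T stop reply: %r" % packet)
    signal = int(body[1:3], 16)
    registers, reasons = {}, {}
    for field in body[3:].split(";"):
        if not field:
            continue
        key, _, value = field.partition(":")
        try:
            regnum = int(key, 16)  # hex key: register number
        except ValueError:
            reasons[key] = value   # e.g. watch, hwbreak
            continue
        registers[regnum] = int.from_bytes(bytes.fromhex(value), "little")
    return signal, registers, reasons


signal, regs, reasons = parse_stop_reply(
    "$T0520:21;21:0a03;22:d0000000;watch:8000c6;#39")
# 32 = SREG, 33 = SP, 34 = PC (byte address)
print(signal, {k: hex(v) for k, v in regs.items()}, reasons)
```

Decoded this way, the first reply carries a watch reason for 0x8000c6, so the information does reach avr-gdb.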

Hence, my educated guess is that the problem lies in avr-gdb, which may handle I/O registers lazily because accesses to them are intensive and frequent throughout the entire execution. Nonetheless, I do not want to spend more time on just handling UART output, so I keep using picocom. A few more words: I feel that simavr is a good simulator, but it does not offer good support for instrumentation (also called monitoring).

Adding angr target

TODO