An incentive to develop my own Ghidra utilities

Ghidra, a renowned reverse engineering (RE) tool, has been widely used by many security engineers. It features a wide range of static and dynmaic techniques, facilitating in-depth and customized analysis. Although Ghidra is built upon Java, there are indeed a lot of Python scripts written and provided by users on Github or personal websites, nonetheless, I intentionally find it very difficult to make the best of them. For instance, some scripts may not work out with the latest version of Ghidra for interface compatability. Hence, I am planning to start building my own Ghidra scripting utility base and growing it along with reversing complex binary or solving a CTF challenge. Additionally, I do see Ghidra’s value in program analysis for automation, linking in-depth analysis to the level of maturity of its analysis engine. So, developing necessary utilities for scripting could a very good starting point roughly.

Other than that, this post depicts a writeup, by solving a classical RE challenge, which offers an ARM disassembly dump exported from IDA in the form of text. Players could easily solve this challenge either by understanding what these ARM assembly code snippets do, or load opcodes to any of RE tools for analysis. I go for the second approach mainly because not everyone knows ARM assembly. Even some could do so, there are quite a few exotic ISA, posing a huge challenge to figure things out in a short term.

IDA’s LST file

After unzip the attachment, this seems like an IDA LST dump that is written with the disassembled text as following:

HEADER:0000000100000000                               ;
HEADER:0000000100000000                               ; +-------------------------------------------------------------------------+
HEADER:0000000100000000                               ; |      This file was generated by The  Interactive Disassembler (IDA)      |
HEADER:0000000100000000                               ; |           Copyright (c) 2022 Hex-Rays, <support@hex-rays.com>           |
HEADER:0000000100000000                               ; +-------------------------------------------------------------------------+
HEADER:0000000100000000                               ;
HEADER:0000000100000000                               ; Input SHA256 : 0D2B2F0E5D3BF95B7FDCC3BD465B0D34C9597AF3DDE635839866483EEFE43F9B
HEADER:0000000100000000                               ; Input MD5    : 4A2234EBDE251890B142C57DA8754FA5
HEADER:0000000100000000                               ; Input CRC32  : 1B042DAF
HEADER:0000000100000000
HEADER:0000000100000000
HEADER:0000000100000000                               ; Processor       : ARM
HEADER:0000000100000000                               ; ARM architecture: metaarm
HEADER:0000000100000000                               ; Target assembler: Generic assembler for ARM
HEADER:0000000100000000                               ; Byte sex        : Little endian
HEADER:0000000100000000
HEADER:0000000100000000                               ; ===========================================================================
HEADER:0000000100000000
HEADER:0000000100000000                               ; [00003BBC BYTES: COLLAPSED SEGMENT HEADER. PRESS CTRL-NUMPAD+ TO EXPAND]
__text:0000000100003BBC                               ; ===========================================================================
__text:0000000100003BBC
__text:0000000100003BBC                               ; Segment type: Pure code
__text:0000000100003BBC                               AREA __text, CODE
__text:0000000100003BBC                               ; ORG 0x100003BBC
__text:0000000100003BBC                               CODE64
__text:0000000100003BBC
__text:0000000100003BBC                               ; =============== S U B R O U T I N E =======================================
__text:0000000100003BBC
__text:0000000100003BBC                               ; Attributes: bp-based frame
__text:0000000100003BBC
__text:0000000100003BBC                               ; int __cdecl main(int argc, const char **argv, const char **envp)
__text:0000000100003BBC                               EXPORT _main
__text:0000000100003BBC                               _main
__text:0000000100003BBC
__text:0000000100003BBC                               anonymous_0= -0x108
__text:0000000100003BBC                               var_100= -0x100
__text:0000000100003BBC                               var_F8= -0xF8
...
__text:0000000100003BBC
__text:0000000100003BBC FC 6F BE A9                   STP             X28, X27, [SP,#-0x10+var_10]!
__text:0000000100003BC0 FD 7B 01 A9                   STP             X29, X30, [SP,#0x10+var_s0]
__text:0000000100003BC4 FD 43 00 91                   ADD             X29, SP, #0x10
__text:0000000100003BC8 FF 03 04 D1                   SUB             SP, SP, #0x100
__text:0000000100003BCC 08 00 00 B0                   ADRP            X8, #___stack_chk_guard_ptr@PAGE
__text:0000000100003BD0 08 09 40 F9                   LDR             X8, [X8,#___stack_chk_guard_ptr@PAGEOFF]
__text:0000000100003BD4 08 01 40 F9                   LDR             X8, [X8]
__text:0000000100003BD8 A8 83 1E F8                   STUR            X8, [X29,#var_18]
__text:0000000100003BDC BF C3 14 B8                   STUR            WZR, [X29,#var_B4]
__text:0000000100003BE0 08 00 00 90 08 11 3A 91       ADRL            X8, aFlagXxxxxxxxxx     ; "flag{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx}"
__text:0000000100003BE8 A8 03 14 F8                   STUR            X8, [X29,#var_C0]
__text:0000000100003BEC A0 C3 02 D1                   SUB             X0, X29, #-__dst        ; __dst
__text:0000000100003BF0 01 00 00 90 21 40 3C 91       ADRL            X1, unk_100003F10       ; __src
__text:0000000100003BF8 02 13 80 D2                   MOV             X2, #0x98               ; __n
__text:0000000100003BFC 99 00 00 94                   BL              _memcpy

Upon observation, I identify its endianness (little-endian) and architecture (AArch64) according to the length of address. Following that, I extracted byte codes by simply selecting a block of text in vscode and pasting in Ghidra hex editor. Note that a memory section needs to be created and initialized before any paste.

Loading opcode and data to a newly created memory section in Ghidra

So, create a memory section with a big enough size and base address corresponding to that of text in the dump file. Image alt.

Paste opcodes in the hex editor. Beware that bytes in pure data segment are not completely displayed, e.g., flag{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx}. You may need to paste them in a string manner. Image alt.

In order to avoid Ghidra crashing in case of pasting a bunch of data, I also wrote a script to create a memory block then write data to that block.

# Import necessary Ghidra modules
from ghidra.program.model.mem import MemoryBlock, MemoryAccessException
from ghidra.program.model.address import Address
from ghidra.util.task import TaskMonitor

# Function to create a memory block
def create_memory_block(block_name, start_address, size):
    # Get the current program
    program = getCurrentProgram()
    
    # Get the memory object
    memory = program.getMemory()
    
    # Convert start address to an Address object
    start_addr = toAddr(start_address)
    
    # Start a transaction to make changes to the program
    transaction = program.startTransaction("Create Memory Block")
    
    try:
        # Create the memory block
        # Zero-ing the newly created block
        block = memory.createInitializedBlock(block_name, start_addr, size, None, TaskMonitor.DUMMY, False)
        
        # Set the block as read/write/execute
        block.setRead(True)
        block.setWrite(True)
        block.setExecute(True)
        
        # Commit the transaction
        program.endTransaction(transaction, True)
        print(f"Memory block '{block_name}' created successfully.")
        
    except MemoryAccessException as e:
        # If an error occurs, rollback the transaction
        program.endTransaction(transaction, False)
        print(f"Error creating memory block: {e}")

def write_bytes(address, bytes_to_write):
    # Get current program
    current_program = getCurrentProgram()
    address = toAddr(address)
    
    # Start a transaction to make changes to the program
    transactionID = current_program.startTransaction("Write Memory")
    
    try:
        # Write bytes to memory
        current_program.getMemory().setBytes(address, bytes_to_write)
        
        # Commit the changes
        current_program.endTransaction(transactionID, True)
        print("Bytes written successfully.")
    except Exception as e:
        # If an error occurs, rollback the changes
        current_program.endTransaction(transactionID, False)
        print("Error:", e)


block_name = "asm_re"
start_address = 0x100003bbc
size = 0x500

# ignore the example bytearray, replace with actual data while invoking
create_initialized_memory_block(block_name, start_address, size)
write_bytes(start_address, bytearray(b'\x90\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'))

Analyzing the application logic

Let Ghidra translate these code and decompile. Then I got the following decompiled psudo C code.

undefined4 FUN_100003bbc(void)
{
  int aiStack_120 [2];
  ulong len_flag_+1*4;
  long new_flag;
  int index;
  int match;
  int decoded_char2;
  int cur_char2;
  int j;
  int decoded_char;
  int cur_char;
  int i;
  int *local_e0;
  int len_flag;
  char *flag;
  undefined4 local_c4;
  int flag_match_code [38];
  long local_28;
  
  local_28 = *(long *)PTR_DAT_100004010;
  local_c4 = 0;
  flag = s_flag{xxxxxxxxxxxxxxxxxxxxxxxxxxx_100003e84;
  _memcpy(flag_match_code,&DAT_100003f10,0x98);
  len_flag = _strlen(flag);
  len_flag_+1*4 = (ulong)(len_flag + 1) * 4 + 0xf & 0xfffffffffffffff0;
  local_e0 = aiStack_120;
  (*(code *)PTR____chkstk_darwin_ptr_100004000)();
  new_flag = (long)aiStack_120 - len_flag_+1*4;
  for (i = 0; i < len_flag; i = i + 1) {
    cur_char = (int)flag[i];
    encoded_char = (cur_char * 0x50 + 0x14U ^ 0x4d) + 0x1e;
    *(int *)(new_flag + (long)i * 4) = encoded_char;
  }
                    /* adding a terminator character
                        */
  *(undefined4 *)(new_flag + (long)len_flag * 4) = 0;
  for (j = 0; j < len_flag; j = j + 1) {
    cur_char2 = (int)flag[j];
    encoded_char2 = (cur_char2 * 0x50 + 0x14U ^ 0x4d) + 0x1e;
    *(int *)(new_flag + (long)j * 4) = encoded_char2;
  }
  _printf(&DAT_100003eab);
  match = 1;
  index = 0;
  do {
    if (len_flag <= index) {
LAB_100003dcc:
      if (match == 0) {
        _printf(s_The_result_array_does_not_match_t_100003edb);
      }
      else {
        _printf(s_The_result_array_matches_the_exp_100003ead);
      }
      local_c4 = 0;
      aiStack_120[1] = 0;
      if (*(long *)PTR_DAT_100004010 != local_28) {
        FUN_100003e54();
      }
      return aiStack_120[1];
    }
    if (*(int *)(new_flag + (long)index * 4) != flag_match_code[index]) {
      match = 0;
      goto LAB_100003dcc;
    }
    index = index + 1;
  } while( true );
}

It is clear that each character of a fake flag string is encoded as an integer and stored in a buffer. In turn, each integer in this buffer is incrementally compared to that of secret code at address 0x100003f10. Hence, the simplifie version of encoding looks like:

  for (i = 0; i < len_flag; i = i + 1) {
    cur_char = (int)potential_flag[i];
    encoded_char = (cur_char * 0x50 + 0x14U ^ 0x4d) + 0x1e;
    if (encoded_char == flag_match_code[index])
        continue
    exit(1);
  putc(potential_flag);
  }

Due to this binary not receving any input from user, challengers are ased to understand this logic and then write down their solver. Hence, it is easy and convenient to do so just on the Ghidra Python environment, offering me to directly access memory. In order to automate the process of decoding, I see the flag as user input, making use of the idea of fuzz testing and getting the matched flag string via bruteforce. Hence, I wrote a script to guess this string automatically. Moreover, this script reminds me to build an utility to read bytes from memory, which is useful in future cases.

import struct

def read_mem_bytes(address, size):
	byte_list = []
	for i in range(size):
		cur_byte = current_program.getMemory().getByte(toAddr(address+i))
		byte_list.append(int(struct.pack('b', cur_byte).hex(),16))
	return byte_list

current_program = getCurrentProgram()
flag = "flag{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx}"
address = 0x0000000100003F10
byte_list = read_mem_bytes(address, 0x98)
print(byte_list)
print(len(byte_list)/4)
final_flag = []
for i in range(len(flag)):
	for cur_char in string.printable:
		cur_char = ord(cur_char)
		if int.from_bytes(byte_list[4*i:4*i+4], "little") == ((cur_char * 0x50 + 0x14 ^ 0x4d) + 0x1e):
			final_flag.append(chr(cur_char))
			break
print("".join(final_flag))

After running this script, the flag is printed out:

flag{67e9a228e45b622c2992fb5174a4f5f5}

Brainstorming more potential cases

Although this challenge does not drive people crazy in terms of encoding algorithm, I did see plenty of similar RE challenges that were built at the top of a very complex algorithm and ask people to provide correct input. However, the level of complexity of algorithm makes it difficult to figure out in a short period of time. Even though many people naturally link the solition to the use of fuzz testing, most of them are not able to acquire the correct input due to misconfiguration or incapcbility of fuzzers.

Nevertheless, there is a relatively feasible fuzzing approach named concolic fuzzing, that attempts to reduce input space by fixing some variable with certain values. Assuming that this encoding algorithm is fairly complex and the input only consists of characters, the strategy to configure fuzzers is to constraint fuzzing engines, only generaing printable input.