Competition on CTF time: Google CTF

I participated in the 2020 Google CTF competition and even though I only managed to solve only 2 of the problems, I found the competition very educational and want to share my solutions.

Basics

This was the easy hardware challenge, and it was fairly straightforward. All it involved was reading and understanding a SystemVerilog file, as well as an emulator written in C. That being said, I am a complete noob when it comes to hardware design languages, and I am quite lazy, so I guessed my way through most of the syntax.

The C code was rather straightforward: it emulated a clock, passing in one character of input from stdin to the machine every clock cycle, until it got either a null or an endline character.

The SystemVerilog file was had all the details, here are the contents:

module check(
    input clk,

    input [6:0] data,
    output wire open_safe
);

reg [6:0] memory [7:0];
reg [2:0] idx = 0;

wire [55:0] magic = {
    {memory[0], memory[5]},
    {memory[6], memory[2]},
    {memory[4], memory[3]},
    {memory[7], memory[1]}
};

wire [55:0] kittens = { magic[9:0],  magic[41:22], magic[21:10], magic[55:42] };
assign open_safe = kittens == 56'd3008192072309708;

always_ff @(posedge clk) begin
    memory[idx] <= data;
    idx <= idx + 5;
end

endmodule

After brief research about the syntax and informed guessing I quickly figured out the following:

  • Basic syntax:

    • The notation [b:a] specifies an inclusive range (on both sides) of bits from b to a. Since writing out a value in binary places the 0th bit last, Verilog has the notation with b > a, so that we have [leftmost bit:rightmost bit].

    • {} specifies concatenation

    • 56'd3008192072309708 specifies a 56 bit value equal to 3008192072309708

    • always_ff block updates the memory once per clock cycle

  • Variables:

    • clk denotes the current state of the clock

    • data denotes the 7-bit character provided by the user

    • memory is an array that stores the input (up to 8 characters)

    • idx represents the next position in memory to use for storage

    • magic and kittens are just a permutation of the memory bits

  • Every clock cycle data is written into memory at idx, and idx is incremented by 5.

Given this information, we simply need to start from the end (final kittens value) and work backwards.

// calculate kittens
assign open_safe = kittens == 56'haafef4be2dbcc; 
// 56'haafef4be2dbcc in binary is:
// 00001010101011111110111101001011111000101101101111001100 # kittens

// calculate magic
wire [55:0] kittens = { magic[9:0],  magic[41:22], magic[21:10], magic[55:42] };
// [    9-0 ] [         41-22    ] [    21-10 ] [  55-42     ] # magic indices
// 0000101010 10111111101111010010 111110001011 01101111001100 # kittens

// magic = 01101111001100 ++ 10111111101111010010 ++ 111110001011 ++ 0000101010

// calculate memory
wire [55:0] magic = {
    {memory[0], memory[5]},
    {memory[6], memory[2]},
    {memory[4], memory[3]},
    {memory[7], memory[1]}
};
//    0       5       6       2       4       3       7       1    # memory indeces
// 0110111 1001100 1011111 1101111 0100101 1111000 1011000 0101010 # magic

// memory = 1011000 1011111 1001100 0100101 1111000 1101111 0101010 0110111

// calculate user input
always_ff @(posedge clk) begin
    memory[idx] <= data;
    idx <= idx + 5;
end
// each time idx is incremented by 5, and starts at 0, so we input the memory elements in this order:
// 0 -> 5 -> 2 -> 7 -> 4 -> 1 -> 6 -> 3
// so the input should look like this:
// 0110111 1001100 1101111 1011000 0100101 0101010 1011111 1111000
// which corresponds to the following ascii:
// 7LoX%*_x

Now we simply nc to the server and input 7LoX%*_x for the flag

All in all, this was a nice introduction to hardware challenges (this was my first one) but a pretty mundane task otherwise, once the syntax is understood.

Beginner

This was the easy reversing challenge, but I found it quite interesting. The whole binary is rather short, and even shorter are the actually interesting parts:

; main disassembly
...
; input is read to the top of the stack up to here
; input is 15 characters long (+ null byte), and gets loaded into %xmm0
<+46>:    movdqa (%rsp),%xmm0
...
; the "algorithm"
<+62>:    pshufb 0x2fa9(%rip),%xmm0        # 0x4070 <SHUFFLE>
<+71>:    paddd  0x2f91(%rip),%xmm0        # 0x4060 <ADD32>
<+79>:    pxor   0x2f79(%rip),%xmm0        # 0x4050 <XOR>
; algorithm ends
; compare computed value with original input, and jump if these are not equal
<+87>:    movaps %xmm0,0x10(%rsp)
<+92>:    callq  0x1030 <strncmp@plt>
<+97>:    test   %eax,%eax
<+99>:    jne    0x1100 <main+128>
...

So we provide up to 15 characters of input, which is loaded into %xmm0. Then the bytes get shuffled, some value gets added, and another value gets xored to the result. By inputing abcdefghijklmno and examining %xmm0 after each instruction I easily found the permutation, as well as the values that get added and xored.

Now this computed value gets compared to our initial value, and it needs to be equal (otherwise code execution jumps to a place that we don’t like)

Clearly simply bruteforcing the password will not work, since there are way too many possible 15-character words. Here I made several observations:

  • Let’s call the correct input string s, and let m denote the permutation implemented by the shuffle operation (bytei gets mapped to byte m[i]). Then knowing some byte i of s, we can almost calculate the byte at position m[i]: s[m[i]] = (s[i] + number_to_add[m[i]] + c) % 256 ^ number_to_xor[m[i]] The only unknown in the equation above is the possible carry bit c from the addition of the previous bytes. But it can take only 2 values (0 or 1), so the branching factor is small.

Building the word like this one byte at a time, and branching to both possible options (with a carry and without) at every step gives us 2^15 possible candidate values. These can be easily bruteforced to check which ones actually preserve equality.

You can see my code below. I did not use any additional libraries, so the code should be understandable to pretty much anyone. The code runs in a couple seconds and results in a unique value which gives the flag when decoded from hex.

# byte at position i gets mapped byte at position m[i]
m = {0:15, 1:3, 2:0, 3:8, 4:10, 5:4, 6:1, 7:2, 8:11, 9:6, 10:12, 11:5, 12:13, 13:14, 14:7, 15:9}

# Initialize number to add, as well as individual bytes
add_num = 0x6763746613371336fee1deacdeadbeef
add_num_copy = add_num
add_bytes = []
while add_num_copy > 0:
    add_bytes.append(add_num_copy % 256)
    add_num_copy //= 256

# Initialize number to xor with, as well as individual bytes
xor_num = 0xaaf986eb34f823d4385f1a8d49b45876
xor_num_copy = xor_num
xor_bytes = []
while xor_num_copy > 0:
    xor_bytes.append(xr_num_copy % 256)
    xor_num_copy //= 256

# Candidates to consider in the final answer
candidates = [[0] * 16]

# Last calculated byte
cur = 15

# Calculate remaining 15 bytes
for _ in range(15):
    new_candidates = []
    # Update candidates to include byte m[i]
    for i in candidates:
        a = i.copy() # assume there is no carry from previous bit
        b = i.copy() # assume there is a carry from addition of previous byte
        a[m[cur]] = ((i[cur] + add_bytes[m[cur]]) % 256) ^ xor_bytes[m[cur]]
        b[m[cur]] = ((i[cur] + add_bytes[m[cur]] + 1) % 256) ^ xor_bytes[m[cur]]
        new_candidates.append(a)
        new_candidates.append(b)
    cur = m[cur]
    candidates = new_candidates
correct = set()

# find the candidates that actually satisfy the equation that we need
for i in candidates:
    # construct the initial number from bytes
    num = 0
    for j in range(15, -1, -1):
        num = (num * 256) + i[j]
    # Construct the permuted bytes
    transformed = [0]*16
    for j in range(16):
        transformed[m[j]] = i[j]
    # Construct the number from the permuted bytes
    num_transformed = 0
    for j in range(15, -1, -1):
        num_transformed = (num_transformed * 256) + transformed[j]
    # Check if the number satisfies our needs
    if num == ((num_transformed + add_num) % 2**128) ^ xor_num:
        correct.add(num)
print("correct numbers: ", [hex(i) for i in correct])
# correct numbers:  ['0x7d21334d723066444d31537b465443']

Note: looking at this solution after a few weeks I realize that I have made a mistake. The ADD32 instruction actually adds 4 32-bit numbers instead of a single 128-bit number, and that slightly changes the algorithm. To arrive at the correct answer, my original input abcdefghijklmno, and the answer to the problem had to give the same top carry bits when adding the 4 32-bit numbers to them. However, since both my original input and the answer are mostly made up of letters, they are quite close in value to each other, and are likely to give the same carries when adding many numbers. So even though I got a little bit lucky, I am not at all surprised that I did!

Overview

Obviously solving only 2 problems is not ideal, but this was my first proper attempt at a difficult CTF competition, and I wanted to give myself a point of reference of my ability. This was a great educational experience, and I hope to learn a lot from the other writeups. It will be interesting to see how much I improve over the next year, and I hope I can at least make it into the top 100. :)