Competition on CTF time: Google CTF
I participated in the 2020 Google CTF competition and even though I only managed to solve only 2 of the problems, I found the competition very educational and want to share my solutions.
Basics
This was the easy hardware challenge, and it was fairly straightforward. All it involved was reading and understanding a SystemVerilog file, as well as an emulator written in C. That being said, I am a complete noob when it comes to hardware design languages, and I am quite lazy, so I guessed my way through most of the syntax.
The C code was rather straightforward: it emulated a clock, passing in one character of input from stdin to the machine every clock cycle, until it got either a null or an endline character.
The SystemVerilog file was had all the details, here are the contents:
module check(
input clk,
input [6:0] data,
output wire open_safe
);
reg [6:0] memory [7:0];
reg [2:0] idx = 0;
wire [55:0] magic = {
{memory[0], memory[5]},
{memory[6], memory[2]},
{memory[4], memory[3]},
{memory[7], memory[1]}
};
wire [55:0] kittens = { magic[9:0], magic[41:22], magic[21:10], magic[55:42] };
assign open_safe = kittens == 56'd3008192072309708;
always_ff @(posedge clk) begin
memory[idx] <= data;
idx <= idx + 5;
end
endmodule
After brief research about the syntax and informed guessing I quickly figured out the following:

Basic syntax:

The notation
[b:a]
specifies an inclusive range (on both sides) of bits from b to a. Since writing out a value in binary places the 0th bit last, Verilog has the notation withb > a
, so that we have[leftmost bit:rightmost bit]
. 
{}
specifies concatenation 
56'd3008192072309708
specifies a56
bit value equal to3008192072309708

always_ff
block updates the memory once per clock cycle


Variables:

clk
denotes the current state of the clock 
data
denotes the 7bit character provided by the user 
memory
is an array that stores the input (up to 8 characters) 
idx
represents the next position in memory to use for storage 
magic
andkittens
are just a permutation of thememory
bits


Every clock cycle
data
is written intomemory
atidx
, andidx
is incremented by 5.
Given this information, we simply need to start from the end (final kittens value) and work backwards.
// calculate kittens
assign open_safe = kittens == 56'haafef4be2dbcc;
// 56'haafef4be2dbcc in binary is:
// 00001010101011111110111101001011111000101101101111001100 # kittens
// calculate magic
wire [55:0] kittens = { magic[9:0], magic[41:22], magic[21:10], magic[55:42] };
// [ 90 ] [ 4122 ] [ 2110 ] [ 5542 ] # magic indices
// 0000101010 10111111101111010010 111110001011 01101111001100 # kittens
// magic = 01101111001100 ++ 10111111101111010010 ++ 111110001011 ++ 0000101010
// calculate memory
wire [55:0] magic = {
{memory[0], memory[5]},
{memory[6], memory[2]},
{memory[4], memory[3]},
{memory[7], memory[1]}
};
// 0 5 6 2 4 3 7 1 # memory indeces
// 0110111 1001100 1011111 1101111 0100101 1111000 1011000 0101010 # magic
// memory = 1011000 1011111 1001100 0100101 1111000 1101111 0101010 0110111
// calculate user input
always_ff @(posedge clk) begin
memory[idx] <= data;
idx <= idx + 5;
end
// each time idx is incremented by 5, and starts at 0, so we input the memory elements in this order:
// 0 > 5 > 2 > 7 > 4 > 1 > 6 > 3
// so the input should look like this:
// 0110111 1001100 1101111 1011000 0100101 0101010 1011111 1111000
// which corresponds to the following ascii:
// 7LoX%*_x
Now we simply nc
to the server and input 7LoX%*_x
for the flag
All in all, this was a nice introduction to hardware challenges (this was my first one) but a pretty mundane task otherwise, once the syntax is understood.
Beginner
This was the easy reversing challenge, but I found it quite interesting. The whole binary is rather short, and even shorter are the actually interesting parts:
; main disassembly
...
; input is read to the top of the stack up to here
; input is 15 characters long (+ null byte), and gets loaded into %xmm0
<+46>: movdqa (%rsp),%xmm0
...
; the "algorithm"
<+62>: pshufb 0x2fa9(%rip),%xmm0 # 0x4070 <SHUFFLE>
<+71>: paddd 0x2f91(%rip),%xmm0 # 0x4060 <ADD32>
<+79>: pxor 0x2f79(%rip),%xmm0 # 0x4050 <XOR>
; algorithm ends
; compare computed value with original input, and jump if these are not equal
<+87>: movaps %xmm0,0x10(%rsp)
<+92>: callq 0x1030 <strncmp@plt>
<+97>: test %eax,%eax
<+99>: jne 0x1100 <main+128>
...
So we provide up to 15 characters of input, which is loaded into %xmm0
. Then the bytes get shuffled, some value gets added, and another value gets xored to the result. By inputing abcdefghijklmno
and examining %xmm0
after each instruction I easily found the permutation, as well as the values that get added and xored.
Now this computed value gets compared to our initial value, and it needs to be equal (otherwise code execution jumps to a place that we don’t like)
Clearly simply bruteforcing the password will not work, since there are way too many possible 15character words. Here I made several observations:
 Let’s call the correct input string
s
, and letm
denote the permutation implemented by the shuffle operation (bytei
gets mapped to bytem[i]
). Then knowing some bytei
ofs
, we can almost calculate the byte at positionm[i]
:s[m[i]] = (s[i] + number_to_add[m[i]] + c) % 256 ^ number_to_xor[m[i]]
The only unknown in the equation above is the possible carry bitc
from the addition of the previous bytes. But it can take only 2 values (0 or 1), so the branching factor is small.
Building the word like this one byte at a time, and branching to both possible options (with a carry and without) at every step gives us 2^15
possible candidate values. These can be easily bruteforced to check which ones actually preserve equality.
You can see my code below. I did not use any additional libraries, so the code should be understandable to pretty much anyone. The code runs in a couple seconds and results in a unique value which gives the flag when decoded from hex.
# byte at position i gets mapped byte at position m[i]
m = {0:15, 1:3, 2:0, 3:8, 4:10, 5:4, 6:1, 7:2, 8:11, 9:6, 10:12, 11:5, 12:13, 13:14, 14:7, 15:9}
# Initialize number to add, as well as individual bytes
add_num = 0x6763746613371336fee1deacdeadbeef
add_num_copy = add_num
add_bytes = []
while add_num_copy > 0:
add_bytes.append(add_num_copy % 256)
add_num_copy //= 256
# Initialize number to xor with, as well as individual bytes
xor_num = 0xaaf986eb34f823d4385f1a8d49b45876
xor_num_copy = xor_num
xor_bytes = []
while xor_num_copy > 0:
xor_bytes.append(xr_num_copy % 256)
xor_num_copy //= 256
# Candidates to consider in the final answer
candidates = [[0] * 16]
# Last calculated byte
cur = 15
# Calculate remaining 15 bytes
for _ in range(15):
new_candidates = []
# Update candidates to include byte m[i]
for i in candidates:
a = i.copy() # assume there is no carry from previous bit
b = i.copy() # assume there is a carry from addition of previous byte
a[m[cur]] = ((i[cur] + add_bytes[m[cur]]) % 256) ^ xor_bytes[m[cur]]
b[m[cur]] = ((i[cur] + add_bytes[m[cur]] + 1) % 256) ^ xor_bytes[m[cur]]
new_candidates.append(a)
new_candidates.append(b)
cur = m[cur]
candidates = new_candidates
correct = set()
# find the candidates that actually satisfy the equation that we need
for i in candidates:
# construct the initial number from bytes
num = 0
for j in range(15, 1, 1):
num = (num * 256) + i[j]
# Construct the permuted bytes
transformed = [0]*16
for j in range(16):
transformed[m[j]] = i[j]
# Construct the number from the permuted bytes
num_transformed = 0
for j in range(15, 1, 1):
num_transformed = (num_transformed * 256) + transformed[j]
# Check if the number satisfies our needs
if num == ((num_transformed + add_num) % 2**128) ^ xor_num:
correct.add(num)
print("correct numbers: ", [hex(i) for i in correct])
# correct numbers: ['0x7d21334d723066444d31537b465443']
Note: looking at this solution after a few weeks I realize that I have made a mistake. The ADD32 instruction actually adds 4 32bit numbers instead of a single 128bit number, and that slightly changes the algorithm. To arrive at the correct answer, my original input abcdefghijklmno
, and the answer to the problem had to give the same top carry bits when adding the 4 32bit numbers to them. However, since both my original input and the answer are mostly made up of letters, they are quite close in value to each other, and are likely to give the same carries when adding many numbers. So even though I got a little bit lucky, I am not at all surprised that I did!
Overview
Obviously solving only 2 problems is not ideal, but this was my first proper attempt at a difficult CTF competition, and I wanted to give myself a point of reference of my ability. This was a great educational experience, and I hope to learn a lot from the other writeups. It will be interesting to see how much I improve over the next year, and I hope I can at least make it into the top 100. :)