# Google CTF 2023 Writeup

> This year's Google CTF, while traveling in Danang, I managed to solve 2 misc problems: **npc** and **symatrix**. Both seem like algorithm-flavored problems.

Here are the writeups for them:

# npc

## Problem description

> A friend handed me this map and told me that it will lead me to the flag. It is confusing me and I don't know how to read it, can you help me out?
> 
> [Attachment](https://storage.googleapis.com/gctf-2023-attachments-project/9a8f5d47fab0a460f9826c4f13aa1dff2809140e68325fb21edab674ee5ec2476b902d2797c41bd6d9311e3510c9366d739d9404e00aa9d4ffd6a0d88e5bf2ef.zip)

## Analysis

We got the following 4 files after extracting the problem archive:

* `encrypt.py`: Main code for encryption
    
* `secret.age`: Encrypted data
    
* `hint.dot`: Hints for finding the plaintext
    
* `USACONST.TXT`: Dictionary
    

Skim through the flow of `encrypt.py`, I found the interesting parts that describe:

* Encryption mechanism: Using `pyrage.passphrase` :
    
    ```python
    from pyrage import passphrase
    
    with open(filename, 'wb') as f:
        f.write(passphrase.encrypt(secret, password))
    ```
    
* Wordlist generation: Unique normalized words from `USACONST.TXT`
    
    ```python
    def get_word_list():
      with open('USACONST.TXT', encoding='ISO8859') as f:
        text = f.read()
      return list(set(re.sub('[^a-z]', ' ', text.lower()).split()))
    ```
    
* Password generation: Random words from word list:
    
    ```python
    def generate_password(num_words):
      word_list = get_word_list()
      return ''.join(secrets.choice(word_list) for _ in range(num_words))
    ```
    
* Hint generation for `hint.dot` file:
    
    ```python
    def generate_hint(password):
      random = secrets.SystemRandom()
      id_gen = IdGen()
      graph = Graph([],[])
    
      # Step 1: Append original graph edges
      for letter in password:
        graph.nodes.append(Node(letter, id_gen.generate_id()))
    
      # Edges that represent the relationship between 2 consecutive characters
      for a, b in zip(graph.nodes, graph.nodes[1:]):
        graph.edges.append(Edge(a, b))
    
      # Step 2: Add random (and maybe non-existence) edges to confuse us
      for _ in range(int(len(password)**1.3)):
        a, b = random.sample(graph.nodes, 2)
        graph.edges.append(Edge(a, b))
    
      # Step 3: Shuffle the edges, nodes and connections to further confuse us
      random.shuffle(graph.nodes)
      random.shuffle(graph.edges)
      for edge in graph.edges:
        if random.random() % 2:
          edge.a, edge.b = edge.b, edge.a
    
      return graph
    ```
    

From the above information, I can already imagine the encryption and hint generation process:

1. Generate word list
    
2. Generate a password by picking random words from the word list
    
3. Use the generated password to encrypt the flag and write to `secret.age`
    
4. Generate the graph containing the hint from the password and write to `hint.dot` file
    

So what is the problem that we need to solve given the following process?

To get the encryption password, we will need to solve process 4 and recover the password from the given obfuscated graph.

It's clear that if they didn't add the random and shuffling part of the code, then we would be dealing with the edges of a degenerated graph.

But after the obfuscation process, here is what we get from `hint.dot`

```plaintext
graph {
    comment="Nodes"
    1051081353 [label=a];
    66849241 [label=a];
    53342583 [label=n];
    213493562 [label=d];
    4385267 [label=i];
    261138725 [label=o];
    51574206 [label=t];
    565468867 [label=e];
    647082638 [label=r];
    177014844 [label=d];
    894978618 [label=e];
    948544779 [label=n];
    572570465 [label=n];
    582531406 [label=r];
    264939475 [label=a];
    415170621 [label=s];
    532012257 [label=t];
    151901859 [label=v];
    346347468 [label=g];
    148496047 [label=g];
    125615053 [label=s];
    723039811 [label=e];
    962878065 [label=i];
    112993293 [label=w];
    748275487 [label=n];
    120330115 [label=s];
    76544105 [label=c];
    186790608 [label=h];

    comment="Edges"
    53342583 -- 565468867;
    582531406 -- 76544105;
    125615053 -- 120330115;
    264939475 -- 572570465;
    53342583 -- 565468867;
    532012257 -- 264939475;
    346347468 -- 532012257;
    125615053 -- 582531406;
    177014844 -- 120330115;
    264939475 -- 962878065;
    comment="<lots more edges>"
}
```

You can paste the graph data into graphviz, but the result doesn't look fine:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687763529206/a1885477-f721-4f92-8023-7b97313c645a.png align="center")

A thing to note is that if a character appears multiple times in the password, they are treated as different nodes. This will help us to build the password from the graph easier because we will not be confused by unnecessary edges.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687763186168/9cf34d4a-49ee-44ac-b66a-ec71c64ad2fe.png align="center")

For example, a graph that represents the word "abac" using the method above will be like this:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687763360035/33fc958d-f23e-4d3c-a74a-915d8639518c.png align="center")

If they don't treat the characters as unique nodes, it will create unnecessary loops like this:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687763362446/5f31b108-1227-4579-85dc-cc271f85cf16.png align="center")

Also because we treated the characters as unique nodes, we knew that the password has **28 characters.**

Following the flow of `generate_hint` function above, if your password is "googlectf", the graph definition would look like this (this is just a demonstration, realistically in the code, the label ids will be randomized so there is no straight way to recover the password):

```plaintext
graph {
    comment="Vertices"
    1 [label=g];
    2 [label=o];
    3 [label=o];
    4 [label=g];
    5 [label=l];
    6 [label=e];
    7 [label=c];
    8 [label=t];
    9 [label=f];
    
    comment="Edges"
    1 -- 2
    2 -- 3
    3 -- 4
    4 -- 5
    5 -- 6
    6 -- 7
    7 -- 8
    8 -- 9
}
```

Which visualized to the following:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687763930538/b9acad63-9283-4c43-85cf-e031e4c72ec1.png align="center")

But after the obfuscation process, it would look kind of this:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687763948459/b3cdfbb1-eb6e-4cb4-b45d-1a6c608847fc.png align="center")

And hence after the process, we will lose every of these informations:

* The starting node
    
* The ordering of nodes (if 2 nodes are connected then which comes first?)
    
* Which are the real edges?
    

And the clues that are left:

* This graph contains the connections of all the consecutive characters of the password
    
* The password has 28 characters
    
* The password must contain words from the word list
    

## And so, the journey begins

### The starting node

I simply try each of the 28 nodes as the starting node

### The ordering of nodes

One thing we know is that the text representation of the correct node ordering must exist inside the word list.

Hence we can traverse the graph from the chosen starting node, try each of the ordering possible and check if it matches the word list.

There might be multiple ordering that matches the conditions, so we need to store all of them in an array and then check to see which one correctly decrypt the flag.

### Which are the real edges?

Same to the problem of node ordering, the real edges' text representation must exist inside the word list.

Hence we can filter this while checking the ordering.

We also need to filter fake edges that appear in the word list.

### Minor word-splitting problem

Since words in the word list were joined without space to form the password, we also need to deal with the problem that one password candidate can be generated from different combinations of words inside the word list.

For example "forever" can be combined from "for" + "ever" or "forever" itself. This case is trivial, but another case that can cause a problem is "generous". It cannot be combined if we choose "gene" + something, but can be combined using the word "generous" itself.

To solve this word-matching problem, I chose the Trie data structure. It provides the ability to conveniently check whether a word exists inside the dictionary. For this particular problem, I optimized it a bit to check where can we jump from the current node of Trie instead of having to repeatedly check the whole word again and again.

Here is my Trie code, generated by ChatGPT

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        current = self.root
        for char in word:
            if char not in current.children:
                current.children[char] = TrieNode()
            current = current.children[char]
        current.is_end_of_word = True

    def has_next_node(self, current_node, char):
        return current_node.children[char] if char in current_node.children else False
```

## Let's implement

After getting every puzzle piece in hand, I built my `tryToSolve` function which is simply a backtracking function to try each possible ordering of nodes and picking out the candidates:

```python
marked = {}

def tryToSolve(u, trieNode = trie.root, len = 28):
    nodeid, char = nodes[u]
    nextNode = trie.has_next_node(trieNode, char)
    
    res = []

    if not nextNode:
        if trieNode.is_end_of_word:
            return tryToSolve(u, trie.root, len)
        return []

    len -= 1

    marked[nodeid] = True

    # print(u, char)

    if len == 0:
        return [char]

    for v in adj[nodeid]:
        if not v in marked:
            possiblities = tryToSolve(v, nextNode, len)
            for possibility in possiblities:
                res.append(char + possibility)

    del marked[nodeid]

    return res
```

And of course, I ran the function with each node as the starting one, as described earlier:

```python
for startingNode in nodes.values():
    marked = {}
    for word in tryToSolve(startingNode[0], trie.root, 28):
        print(word)
```

After execution, I got 6 possible candidates:

```bash
┌──(kali㉿kali)-[~/Desktop/npc]
└─$ python3 solve.py                                                                                              130 ⨯
dwatersigngivenchosenstandar
givenstandardsignwaterchosen
signgivenstandardwaterchosen
waterchosenstandardsigngiven
standardwatersigngivenchosen
chosenstandardwatersigngiven
```

And the rest is easy. I just need to try each one for decryption.

```python
from pyrage import passphrase

encrypted = open('secret.age', 'rb').read()

candidates = open('candidates.txt').readlines()
for pw in candidates:
    try:
        pw = pw.strip()
        print(passphrase.decrypt(encrypted, pw))
    except Exception as e:
        print(e)
```

```python
┌──(kali㉿kali)-[~/Desktop/npc]
└─$ python3 solve1.py
Decryption failed
Decryption failed
Decryption failed
Decryption failed
b'CTF{S3vEn_bR1dg35_0f_K0eN1g5BeRg}'
Decryption failed
```

## Flag

> CTF{S3vEn\_bR1dg35\_0f\_K0eN1g5BeRg}

Darn it, we were still the second team to solve it :(

![May be an image of text that says 'vh++ Logout Team ANNOUNCEMENTS 664.83 T-34h 24/06/ 2023 MINE THE GAP NPC 474pt Opportunities at G oogle misc @otona crypto pwn 米 reversing web sandbox TOTALLY NOT 474pt handed this map lag lead confusir told me hat don will out? know now to ead it, Attachment NPC Solved WreckZeroBytes vh++ ANNOUNCEMENTS Submit task feedback challenges!! 13:00 released more challenges! Enjoy! Biohazard added missing file 07:10 Jun 24 2023 FLAG CAPTURED'](https://scontent.fhan14-3.fna.fbcdn.net/v/t39.30808-6/355496777_3381527662175995_8663953152696437135_n.jpg?_nc_cat=103&cb=99be929b-3346023f&ccb=1-7&_nc_sid=730e14&_nc_ohc=6dFC1UNatNIAX9a1SYT&_nc_ht=scontent.fhan14-3.fna&oh=00_AfCBsPSEhHJhyihp3IV5EO4rAYtCX1XaLfHysP3N9ffCpw&oe=649D5F7B align="left")

# symatrix

## Problem description:

> The CIA has been tracking a group of hackers who communicate using PNG files embedded with a custom steganography algorithm. An insider spy was able to obtain the encoder, but it is not the original code. You have been tasked with reversing the encoder file and creating a decoder as soon as possible in order to read the most recent PNG file they have sent.
> 
> [Attachment](https://storage.googleapis.com/gctf-2023-attachments-project/aba60aa2e9c806187f88279742c2ced243dd73b142c5c5bac1327956975e4d3add04afad77cfd823dd4f00847f6334b294ab058308639f8cb52897e8f1be769e.zip)

## Analysis

We got the following 4 files after extracting the problem archive:

* `encoder.c`: Main code for encoding
    
* `symatrix.png`: Encoded data
    

Upon inspection of `encoder.c`, I realized that it's a C file compiled from Python code:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687767149680/390e18c4-f750-427b-acdf-33d24c439e7d.png align="center")

I tried to compile it using gcc but it threw lots of errors:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687767243597/48e18883-0eac-40a7-8afd-f4c49288a0c4.png align="center")

When inspecting further, I found out that there are blocks of original python code left in the compiled source. By further searching, I found 70 of those blocks.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687767333269/838cce53-3c75-4a94-b242-8a1e27bfdf12.png align="center")

I managed to extract those Python code blocks, but they still look like about 500 lines of a mess:

```python
/* "encoder.py":5
 * import binascii
 * 
 * def hexstr_to_binstr(hexstr):             # <<<<<<<<<<<<<<
 *     n = int(hexstr, 16)
 *     bstr = ''
 */
/* "encoder.py":6
 * 
 * def hexstr_to_binstr(hexstr):
 *     n = int(hexstr, 16)             # <<<<<<<<<<<<<<
 *     bstr = ''
 *     while n > 0:
 */
/* "encoder.py":7
 * def hexstr_to_binstr(hexstr):
 *     n = int(hexstr, 16)
 *     bstr = ''             # <<<<<<<<<<<<<<
 *     while n > 0:
 *         bstr = str(n % 2) + bstr
 */
...
```

And so I threw them into ChatGPT to see what it can do:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687768002125/5c26a710-9041-46de-84bb-60719cc0ce83.png align="center")

And out of my astonishment, I received the beautifully formatted,

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687768031110/a81116ad-f5f2-48a6-9ce7-7e7ee739455d.png align="center")

## So, what do we need to see?

## The encoder's source code

Now we got the encoder source code:

```python
from PIL import Image
from random import randint
import binascii

def hexstr_to_binstr(hexstr):
    n = int(hexstr, 16)
    bstr = ''
    while n > 0:
        bstr = str(n % 2) + bstr
        n = n >> 1
    if len(bstr) % 8 != 0:
        bstr = '0' + bstr
    return bstr

def pixel_bit(b):
    return tuple((0, 1, b))

def embed(t1, t2):
    return tuple((t1[0] + t2[0], t1[1] + t2[1], t1[2] + t2[2]))

def full_pixel(pixel):
    return pixel[1] == 255 or pixel[2] == 255

print("Embedding file...")

bin_data = open("./flag.txt", 'rb').read()
data_to_hide = binascii.hexlify(bin_data).decode('utf-8')

base_image = Image.open("./original.png")
x_len, y_len = base_image.size

nx_len = x_len * 2

new_image = Image.new("RGB", (nx_len, y_len))
base_matrix = base_image.load()
new_matrix = new_image.load()

binary_string = hexstr_to_binstr(data_to_hide)
remaining_bits = len(binary_string)

nx_len = nx_len - 1
next_position = 0

for i in range(0, y_len):
    for j in range(0, x_len):
        pixel = new_matrix[j, i] = base_matrix[j, i]
        if remaining_bits > 0 and next_position <= 0 and not full_pixel(pixel):
            new_matrix[nx_len - j, i] = embed(pixel_bit(int(binary_string[0])), pixel)
            next_position = randint(1, 17)
            binary_string = binary_string[1:]
            remaining_bits -= 1
        else:
            next_position -= 1

new_image.save("./symatrix.png")
new_image.close()
base_image.close()

print("Work done!")
exit(1)
```

### The encoding process

1. The encoder doubled the image width:
    
    ```python
    nx_len = x_len * 2
    new_image = Image.new("RGB", (nx_len, y_len))
    ```
    
2. It hides one bit of data. The original `(r, g, b)` code turned into `(r, g + 1, b + encrypted_bit)`. The encrypted pixel lies in the doubled half of the image (`[j, i]` -&gt; `[nx_len - j, i]`)
    
    ```python
    def pixel_bit(b):
        # Encoding formla
        return tuple((0, 1, b))
    
    def embed(t1, t2):
        return tuple((t1[0] + t2[0], t1[1] + t2[1], t1[2] + t2[2]))
    
    def full_pixel(pixel):
        return pixel[1] == 255 or pixel[2] == 255
    
    if remaining_bits > 0 and next_position <= 0 and not full_pixel(pixel):
        new_matrix[nx_len - j, i] = embed(pixel_bit(int(binary_string[0])), pixel)
    ```
    
3. It chooses another random position for hiding the next bit:
    
    ```python
    next_position = randint(1, 17)
    ```
    

## Solution

So now we just need to write a simple script to find the encoded bits and get the flag

```python
from PIL import Image

def decode_image(encoded_image_path):
    encoded_image = Image.open(encoded_image_path)
    encoded_matrix = encoded_image.load()

    decoded_data = ""

    x_len, y_len = encoded_image.size
    nx_len = x_len // 2

    for i in range(y_len):
        for j in range(nx_len):
            [r1, g1, b1] = encoded_matrix[j, i]
            [r2, g2, b2] = encoded_matrix[x_len - j - 1, i]

            if r1 == r2 and g1 + 1 == g2 and b2 - b1 <= 1:
                # print((r1, g1, b1))
                # print((r2, g2, b2))
                decoded_data += str(b2 - b1)

    decoded_bytes = bytearray()

    for i in range(0, len(decoded_data), 8):
        byte_str = decoded_data[i:i+8]
        decoded_bytes.append(int(byte_str, 2))

    return decoded_bytes
    
print(decode_image('./symatrix.png'))
```

## Flag

> CTF{W4ke\_Up\_Ne0+Th1s\_I5\_Th3\_Fl4g!}

# GLHF!

Thank you `vh++` for hosting a team for Vietnamese

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1687769093469/34b9595a-e7be-4456-9ad0-bbc192b9859a.png align="center")
