KoreLogic Blog

How I Solved (Most Of) the Yara CTF Puzzles: Puzzle #1 – #4

2015-08-17 08:00

For Black Hat 2015, Ron Tokazowski of phishme.com put together a Yara Capture The Flag (CTF) contest. This CTF consisted of 11 logic and Yara-based puzzles that participants had to solve for a chance to win a DJI Quadcopter. The best part is that you could participate in the CTF even if you weren't at Black Hat!

I participated in the CTF and won!!! I got through 10 out of 11 puzzles; why I did not complete the 11th is explained later. This post, along with two more, describes how I went through each puzzle and solved it. The puzzles are still accessible at the CTF page, so be warned that spoilers are below!

Capture the Flag contests are an important resource for anyone in information security. When performed correctly, they help to increase your skills and expand your methods and techniques for solving problems. There has never been a CTF from which I haven't learned something, and because of this I try to do as many of them as my schedule allows.

For each puzzle I present in these posts, I'll go through my thought process for how I solved it. Understand that there are probably better ways to solve these, but when you are in a timed contest you go where your mind takes you, which is sometimes down incorrect or inefficient paths.

As stated, the Yara CTF consisted of 11 challenges that had to be solved. The CTF also came with an email template that listed what needed to be provided for each puzzle solution.

Puzzle #1

The first puzzle was in a 1.2MB file named "all about that base" and the goal was to find a key contained within. The contents of the file were an alphanumeric pattern in one line that ended with the following:

...WFZtMUtSbE5zV2xWV1ZrWXpWVVpGT1ZCUlBUMD0=

Anyone doing CTF challenges should know that when dealing with encoded data there are a few things you should always look for. The first is Base64 encoded data. Data encoded with Base64 consists of upper- and lower-case letters, numbers, the plus sign, and the forward slash. More importantly, it will often end in a single or double equal sign used as padding, as the string above does.

Base64 decoding the string returned another base64 encoded string. Decoding that gave another base64 encoded string, which gave another base64 encoded string, and so on. The goal was to get to the bottom of all the base64 encoded strings - which I could either do manually, or write a small shell script. I went with the script.

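#!/bin/bash

# Decode the first layer; base64 exits non-zero on invalid input.
base64 -d "all about that base" > _tmp
while [ $? -eq 0 ]; do
  # The last decode succeeded, so keep its output and try another layer.
  mv _tmp still_decode
  base64 -d still_decode > _tmp
done
# still_decode holds the last output that decoded cleanly.
mv still_decode decoded.txt
rm -f _tmp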

This script runs the command "base64 -d" in a loop to decode the data until it fails. Once this occurs, we know we are done decoding the data and hopefully have the decoded version.
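The same loop can be sketched in Python, which also reports how many layers were peeled off. This is only an illustration of the approach, not the script I used during the contest; the function name `peel_base64` is made up for this example.

```python
import base64
import binascii
from typing import Tuple


def peel_base64(data: bytes) -> Tuple[bytes, int]:
    """Decode nested Base64 until a layer fails to decode.

    Returns the innermost data and the number of layers removed.
    """
    layers = 0
    while True:
        # Ignore line breaks and other whitespace inside the encoded text.
        compact = b"".join(data.split())
        try:
            # validate=True rejects characters outside the Base64 alphabet
            # instead of silently skipping them.
            decoded = base64.b64decode(compact, validate=True)
        except (binascii.Error, ValueError):
            return data, layers
        if not decoded:
            # Nothing left to decode (empty input); stop here.
            return data, layers
        data = decoded
        layers += 1
```

Reading the puzzle file as bytes and passing it to `peel_base64` should return the same content the shell script leaves in decoded.txt. One caveat of any "decode until it fails" approach: if the final plaintext happened to be valid Base64 itself, the loop would decode one layer too many.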

The final file that was created was 1,657 bytes long and contained a nice ASCII art of a unicorn and the answer. I won't spoil this answer so you can try it on your own.

Puzzle #2

The second puzzle contained an encrypted RAR archive and a readme file. According to the readme, the goal was to identify the import hash, PE timestamp, and PE machine for the file in the RAR archive and create a Yara rule for it.

Windows executables are organized in a specific format known as the Portable Executable (PE) format. This format gives the operating system information about the executable, including where to start execution in the program, and what libraries and APIs should be loaded. Two of the fields the puzzle asks for, the PE timestamp and PE machine, are located in the PE header.

The PE timestamp, also called the TimeDateStamp, usually contains the time when the executable was linked during the compilation process. This field is often used by analysts to determine how old the executable is. However, be warned! This value can easily be changed by attackers.

The PE machine is a field that specifies the CPU type the executable can run on. For example, if it's a 32-bit executable, it will likely have the value 0x14C (IMAGE_FILE_MACHINE_I386).

This information can be found with any number of PE header analysis tools. I used pecheck.py, a script written by Didier Stevens that uses the pefile Python library to dump all of the PE header information. Using this I was able to quickly find what the PE timestamp and machine values were:

$ pecheck.py my_file.exe 
PE check for 'my_file.exe':
Entropy: 6.913703 (Min=0.0, Max=8.0)
MD5     hash: 75c0cd3b15b1b67de14f4e97eafa3679
SHA-1   hash: 157ad48ab9f1bf257627272d16e83fe748d16985
SHA-256 hash: 659c865cfc57226fafd40a97f4fc21a0e5b828ab6f9bdcb3ca0de175b654a68b
...
[IMAGE_FILE_HEADER]
0xEC       0x0   Machine:                       0x8664    
0xEE       0x2   NumberOfSections:              0x3       
0xF0       0x4   TimeDateStamp:                 0x4F304133 [Mon Feb  6 21:08:03 2012 UTC]
...

The timestamp has a value of 0x4F304133 and the machine type is 0x8664 (IMAGE_FILE_MACHINE_AMD64).
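For readers who want to see where these two values live, here is a minimal sketch that reads them straight from the file with Python's struct module instead of pefile. It assumes a well-formed PE file and does no further validation; the function name `read_pe_file_header` is invented for this example.

```python
import struct
from datetime import datetime, timezone


def read_pe_file_header(data: bytes) -> dict:
    """Return Machine, NumberOfSections and TimeDateStamp from a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The 32-bit field at offset 0x3C (e_lfanew) points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER follows the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), TimeDateStamp (4 bytes), little-endian.
    machine, sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    return {
        "machine": machine,
        "sections": sections,
        "timestamp": timestamp,
        # TimeDateStamp is seconds since the Unix epoch, in UTC.
        "linked": datetime.fromtimestamp(timestamp, tz=timezone.utc),
    }
```

Calling it on the bytes of my_file.exe should report the same Machine and TimeDateStamp values that pecheck.py printed above.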

The import hash is a hash that is created by examining the import table of an executable, which describes the DLLs and APIs that the executable wants to load. Since the order of these imports tends to be distinctive for a given build, this hash can be used to identify and track related malware samples or attackers. To generate the import hash, I used Florian Roth's ImpHash-Generator script.

$ python imphash-gen.py -p my_file.exe
###############################################################################
 
  IMPHASH Generator
  by Florian Roth
  January 2014
  Version 0.6.1
 
###############################################################################
Reading DB: 37694 imphashes found
IMP: bb916724e1b87e3af628b2f59174d064 MD5: 75c0cd3b15b1b67de14f4e97eafa3679 FILE: my_file.exe
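To make the idea concrete, here is a rough sketch of the hashing step only. It reflects my understanding of the common imphash convention (lower-case "library.function" names, library extension dropped, joined with commas in import order, then MD5); treat those details as assumptions and verify against your tool of choice. It does not parse the PE file and does not handle imports by ordinal. The function name and the import lists in the usage note are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple


def imphash_from_imports(imports: Iterable[Tuple[str, str]]) -> str:
    """MD5 over the ordered, lower-cased 'library.function' import list."""
    entries = []
    for library, function in imports:
        library = library.lower()
        # Assumption: the library's file extension is dropped before hashing.
        for ext in (".dll", ".ocx", ".sys"):
            if library.endswith(ext):
                library = library[: -len(ext)]
                break
        entries.append(f"{library}.{function.lower()}")
    # Order is preserved, so the same imports in a different order
    # produce a different hash.
    return hashlib.md5(",".join(entries).encode("ascii")).hexdigest()
```

For example, `imphash_from_imports([("KERNEL32.dll", "CreateFileW"), ("KERNEL32.dll", "ReadFile")])` and the same list reversed give two different hashes, which is exactly why the import order is useful for tracking related samples.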

Now that I had the data needed, the Yara rule had to be created. Fortunately, the latest versions of Yara come with a PE module that let me obtain these values directly, as well as a function that generates the import hash! The resulting Yara signature is shown below:

import "pe"

rule PM_Yara_CTF_2015_2
{
	meta:
		author = "thudak@korelogic.com"
		comment = "Solution 2"
	condition:
		pe.machine == 0x8664 and 
		pe.timestamp == 0x4F304133 and 
		pe.imphash() == "bb916724e1b87e3af628b2f59174d064"
}

Two things to note. Initially when I created this rule I was using Yara 3.3.0. For some reason, pe.imphash() would not run correctly. However, after upgrading to Yara 3.4.0 (the latest version at the time), things worked fine.

Also, the rule is checking for the actual value of pe.machine. The Yara PE module has a number of definitions available to make the rules more readable. Therefore, the rule could also have been written "pe.machine == pe.MACHINE_AMD64".

Puzzle #3

The third puzzle was a file named "take off every zig" whose contents were an encoded string that contained a key:

Bar bs gurz unf gb or rnfl gb znxr lbh xrrc tbvat.
Lbhe nafjre vf: ZneznynqrFrzncuberErpvqvivfgVyyvgrengrXhzdhngTbbsonyy

Because there were spaces in the string, I knew it was not likely to be base64 encoded. The spaces also told me it was probably not XOR encoded, another common encoding method we'll talk about later.

Why did the spaces tell me this? Both of these encoding techniques would have encoded the entire string, including spaces. It was quite possible that the attacker was being tricky and had written a custom algorithm to skip spaces, and I kept that in the back of my mind. In the meantime, I decided to go with my gut and try something else: ROT13.

ROT13 is a Caesar cipher, a substitution cipher in which each letter is rotated by 13 places (half the English alphabet), so applying it twice returns the original text. To test this out, I used the website www.rot13.com, pasted the string and pressed the decode button...

...and Voila! It worked! The result gave me the key I needed for the answer.
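If you would rather not paste puzzle data into a website, ROT13 is only a few lines of Python. This is a small sketch of the cipher, not what I used at the time; it decodes only the first line of the puzzle so the answer stays unspoiled.

```python
import string


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; leave everything else unchanged."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[13:] + lower[:13] + upper[13:] + upper[:13],
    )
    return text.translate(table)


# Prints: One of them has to be easy to make you keep going.
print(rot13("Bar bs gurz unf gb or rnfl gb znxr lbh xrrc tbvat."))
```

Spaces and punctuation pass through untouched, which is why the word boundaries in the puzzle file were still visible.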

Puzzle #4

Puzzle #4 contained three phishing emails, along with their mail headers, for which we were supposed to create one Yara rule to detect them all. To do this, I examined each email looking for items that were common to all three but unique to these emails. I came up with three things: the subject line, the attachment filename, and the message body.

There were other locations I could have used, such as the sender or the source MTA. However, these didn't have any common attributes between the three emails; the other items did.

The subject lines of all three emails were as follows:

1.eml:Subject: Resume
2.eml:Subject: resume
3.eml:Subject: =?utf-8?Q?Re=3AMy_resume?=

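The first two are just the word "resume", in different case, while the last is the MIME encoded-word (Q-encoded, UTF-8) form of "Re:My resume". The common word between all three is "resume" and thus I had my first string to search for. I decided to use a regular expression to search for "Subject: ", followed by one or more characters of any kind, and then the word resume where the 'r' could be upper or lower case. Note that [\S\s] also matches line breaks and the + needs at least one character, so this expression is not confined to the subject line; for the first two emails it can only match by reaching a later "resume" in the message. A pattern such as /Subject: [^\r\n]*[rR]esume/ would keep the match on the subject line.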

$subject = /Subject: [\S\s]+[rR]esume/

Next was the attachment file name. The three emails named their files as follows:

1.eml:Content-Disposition: attachment; filename="my_resume.zip"; size=462;
2.eml:Content-Disposition: attachment; filename="my_resume.zip"; size=460;
3.eml:Content-Disposition: attachment; filename="=?utf-8?B?bXlfcmVzdW1lLnppcA==?="

The first two emails give the name simply as "my_resume.zip". The last is also named my_resume.zip, but the filename is base64 encoded inside a MIME encoded-word. Since Yara does not have any base64 encoding or decoding functions, I would have to create two strings to search for both versions of the filename.

$filename = "my_resume.zip"
$file_b64 = "bXlfcmVzdW1lLnppcA=="
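To check what the encoded subject and filename actually say, Python's standard email package can decode these header values. The snippet below is just a convenience sketch for inspecting them; the helper name `decode_mime_header` is made up for this example.

```python
from email.header import decode_header, make_header


def decode_mime_header(value: str) -> str:
    """Decode MIME encoded-words (=?charset?Q?...?= or =?charset?B?...?=)."""
    # decode_header splits the value into (bytes-or-str, charset) parts;
    # make_header joins them back into one readable string.
    return str(make_header(decode_header(value)))


for raw in (
    "Resume",
    "=?utf-8?Q?Re=3AMy_resume?=",
    "=?utf-8?B?bXlfcmVzdW1lLnppcA==?=",
):
    print(decode_mime_header(raw))
```

The Q form is a quoted-printable variant ("=3A" is a colon and "_" stands for a space), while the B form is plain base64, which is why the second Yara string can simply be the base64 text itself.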

Finally, the message bodies of the emails. The first two email bodies were base64 encoded while the last was in plaintext. The plaintext version is below.

Hello my name is Ariana
 attach is my resume
I would appreciate your immediate attention to this matter

Sincerely
Ariana

Of course, all three were just slightly different. In each email, the name and the third line were completely different. Additionally, in one email the second line stated "attached" instead of "attach". Finally, the first word was "Hello" in two of the emails and "Hi" in the final email.

Looking for a common string between all three emails, I found that my best bet would be to search for "my name is". This would also mean I would have to search for the plaintext and base64 encoded versions of the string. The plaintext was easy, but the base64 was a little more difficult: base64 encodes three bytes at a time, so the encoded form of a phrase changes with its byte offset in the message, which here depends on the length of the first word. However, I got lucky and found that "bXkgbmFtZSBpc", the stable leading part of the base64 encoded version of "my name is", was present in both base64 encoded versions of the email bodies. This allowed me to create the final strings for the Yara rule.

$hello = "my name is"
$hello_b64 = "bXkgbmFtZSBpc"
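Why this worked can be shown with a short sketch. Because base64 works on three-byte groups, a phrase has three possible encoded forms depending on its offset modulo 3, and only the characters that depend solely on the phrase itself are safe to search for. The helper below, with a name I made up, computes those three fragments; the sample bodies in the usage note are hypothetical stand-ins for the real emails.

```python
import base64
from typing import List


def base64_search_strings(needle: bytes) -> List[str]:
    """Return the stable Base64 fragment of `needle` for each offset mod 3."""
    fragments = []
    for offset in range(3):
        encoded = base64.b64encode(b"\x00" * offset + needle).decode("ascii")
        # Skip leading characters that mix in bits of the preceding bytes.
        start = (0, 2, 3)[offset]
        # Drop trailing characters that mix in bits of the following bytes,
        # together with any '=' padding.
        end = len(encoded) - (0, 3, 2)[(offset + len(needle)) % 3]
        fragments.append(encoded[start:end])
    return fragments
```

`base64_search_strings(b"my name is")[0]` is "bXkgbmFtZSBpc", the string used in the rule. It matches when the phrase starts at an offset that is a multiple of three, which would be the case if the bodies begin with "Hello " (6 bytes) or "Hi " (3 bytes). A more general rule could search for all three fragments.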

The final Yara rule looked as follows:

rule PM_Yara_CTF_2015_4
{
	meta:
		author = "thudak@korelogic.com"
		comment = "Solution 4"
	strings:
		$subject = /Subject: [\S\s]+[rR]esume/
		$filename = "my_resume.zip"
		$file_b64 = "bXlfcmVzdW1lLnppcA=="
		$hello = "my name is"
		$hello_b64 = "bXkgbmFtZSBpc"
	condition:
		$subject and ($filename or $file_b64) and ($hello or $hello_b64)

}

More to come!

This is long enough for one post. In the next post, I'll reveal how I solved puzzles 5-8!

Posted by Tyler at: 08:00