To that end, let’s talk about how to leverage Hashcat when competing in these competitions. Full disclaimer: I’m going to go full spoiler in how I’m approaching my cracking. At this point, I’ve been running cracking sessions way longer than the competition would have lasted if I had competed. Also, I’ve been on the various Discord and Twitter conversations about the contest this year and know how the hashes were generated. Heck, KoreLogic even posted themselves how they created the challenges [Full Spoiler Link]. So I’m not going to even pretend that this post represents how I would have done. Instead I want to focus on “given what we know, how can someone use Hashcat to crack those hashes”.
One issue that pops up a lot for me when using both John the Ripper and Hashcat to crack hashes is that while their file formats are *mostly* the same, they are not directly compatible. This goes for how these tools expect hashes to be formatted when loading them, and for the .pot file formats they save their cracked passwords to.
Side note, while you can have usernames in your hash lists, Hashcat won’t load the hashes unless you include the “--username” flag on the command line telling Hashcat to strip/ignore those usernames. E.g.:
The commands to run Hashcat are very different from those to run John the Ripper. There are pros and cons to both methods. File autocomplete works much better with Hashcat’s command line and Hashcat does directory inclusion (such as use all wordlists in a directory) better. But John the Ripper’s command line is less position dependent, has a ton of super powerful features for different attack modes, and quite honestly I’m just more used to it.
One cool feature of Hashcat is that you can specify a directory instead of a wordlist though. So you can use the following command to run a quick set of mangling rules against all of your dictionaries:
Given the limited success of these attacks (a few raw-MD5 cracks aren’t going to give a lot of points), there are really three paths that I can take.
But what if your input dictionaries are the problem? That’s where brute-forcing small key lengths can be helpful using masks.
I’ll admit, I started to go into a long, long diversion about the mechanics behind Hashcat’s Masks and Markov optimizations. I really hate calling what Hashcat does a Markov attack and there’s a ton of optimizations that Hashcat developers can make to it. But that’s totally beside the point if you are trying to crack passwords RIGHT NOW. So I’ll save that side tangent for a different post and instead focus on cracking these contest hashes.
Masks are one area where having more computational power makes a huge difference. They let serious cracking rigs just chew through keyspace without requiring much skill or ability from their operators. Contest organizers know this and tend to create passwords that are resistant to un-optimized mask attacks. This means going through the entire key-space for 5/6/7/8 character passwords is unlikely to be very successful.
- hashcat -a 3 -m 0 -1 ?l?u -2 cmiyc_sales_end.hcchr uncracked_hashes.txt 2023?1?l?l?l?l?l?l?2
There’s a lot going on in the above command. Let’s break this command down by parts:
- hashcat -a 3 -m 0
- The standard hashcat command targeting raw-md5 hashes (-m 0), and using mask mode (-a 3)
- -1 ?l?u
- I’m setting a custom mask character set here that includes two built in character sets [?l = all lowercase letters, and ?u = ALL UPPERCASE LETTERS]
- In the actual mask you can refer to this custom character set as ?1 (that’s the number 1)
- You can specify up to 4 custom character sets for your mask mode [1 – 4]. This is a hard limit. I wish you could do more actually, but that’s how Hashcat is programmed.
- -2 cmiyc_sales_end.hcchr
- Rather than type out the characters for the mask on the command line, you can also save them to a *.hcchr file and read them in.
- This is super helpful when you are targeting special characters that just don’t play well on the command line and you don’t want to mess with escaping them. For example ‘!,$.
- The format for .hcchr files is just all the characters you want to target on the first line. E.g.:
- uncracked_hashes.txt
- Once again, just the hash-list of the hashes you are targeting
- 2023?1?l?l?l?l?l?l?2
- The actual mask to run. Breaking it down further
- 2023: Simply starts every guess with the string “2023”
- ?1: Use the first custom character set. I know, it’s hard to see the difference between the number 1 and the letter l. The above uses the number one. In this case it tries all lower and uppercase letters.
- ?l?l?l?l?l?l: Try 6 lower case characters
- ?2: Try the second custom character set. This appends common special characters I found when cracking other sales passwords.
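Before kicking off a mask like this, it helps to know how many guesses it represents. Here is a minimal sketch that counts the candidates for a mask. It assumes the cmiyc_sales_end.hcchr file holds the three characters `!,$` (the same set used in the .hcmask example later in this post), and it assumes custom sets contain no duplicate characters. The function names are made up for illustration and are not part of Hashcat. Note this is the raw number of candidates, which is not the same figure Hashcat reports with its `--keyspace` option.

```python
# Sizes of Hashcat's built-in mask character sets.
BUILTIN_SIZES = {"l": 26, "u": 26, "d": 10, "h": 16, "H": 16,
                 "s": 33, "a": 95, "b": 256}


def tokens(text):
    """Yield ('set', key) for ?x placeholders and ('lit', ch) for literals."""
    i = 0
    while i < len(text):
        if text[i] == "?" and i + 1 < len(text):
            key = text[i + 1]
            # '??' is how a literal question mark is written in a mask
            yield ("lit", "?") if key == "?" else ("set", key)
            i += 2
        else:
            yield ("lit", text[i])
            i += 1


def charset_size(definition):
    """Number of characters in a custom set such as '?l?u' or '!,$'."""
    return sum(BUILTIN_SIZES[value] if kind == "set" else 1
               for kind, value in tokens(definition))


def mask_candidates(mask, custom=None):
    """Multiply the size of every position in the mask."""
    custom = custom or {}
    total = 1
    for kind, value in tokens(mask):
        if kind == "lit":
            continue  # a literal character contributes exactly one choice
        if value in custom:
            total *= charset_size(custom[value])
        else:
            total *= BUILTIN_SIZES[value]
    return total


# Assumed charset contents: -1 is ?l?u, -2 is the three characters !,$
custom_sets = {"1": "?l?u", "2": "!,$"}
print(mask_candidates("2023?1?l?l?l?l?l?l?2", custom_sets))
```

That works out to 52 x 26^6 x 3 guesses, which is no problem for raw-MD5 but would be a very different story against a slow hash.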
That’s great, but what if you want to try 5 lower case characters vs. 6? Running these attacks by hand is a pain, so it’s nice to queue up a bunch of mask attacks at once using a saved mask file (e.g. a .hcmask file). Unfortunately, the format is a bit different so let’s look at how we can do that next. First, here is the hashcat command line to run a .hcmask file:
- hashcat -a 3 -m 0 uncracked_hashes.txt sales.hcmask
You’ll notice that all the mask info has been removed from the command line and instead I’m calling an external sales.hcmask file. Let’s take a look at what’s in that file:
- ?l?u,!\,$,2023?1?l?l?l?2
- ?l?u,!\,$,2023?1?l?l?l?l?2
- ?l?u,!\,$,2023?1?l?l?l?l?l?2
- ?l?u,!\,$,2023?1?l?l?l?l?l?l?2
- ?l?u,!\,$,2022?1?l?l?l?2
- ?l?u,!\,$,2022?1?l?l?l?l?2
- ?l?u,!\,$,2022?1?l?l?l?l?l?2
- ?l?u,!\,$,2022?1?l?l?l?l?l?l?2
Breaking this file format down:
- Each line defines a single mask to run. Lines starting with ‘#’ are comments.
- Each line will be run in order. Generally it helps to put the quick masks first so if you decide to cancel the job you have a better idea of how much key-space you checked.
- I know, I didn’t follow my own advice in this example…
- Each line must define any custom character sets, and unlike with the command line you can’t define them in external files.
- Each custom character set (up to 4) is specified by putting a comma ‘,‘ after it.
- In the above example this means the 2 custom character sets are ?1 = ?l?u (all lower and uppercase letters) and ?2 = !,$ (three special characters).
- For the second custom character set I wanted to include a comma, which is a problem because it’s a delimiter. So I needed to escape it with a backslash. Aka: ‘\,‘
- You can read more about the hcmask file format here.
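Typing these lines out by hand gets old fast, so here is a minimal sketch of a generator for them. It emits the same eight masks as the example file, but ordered shortest first to follow the "quick masks first" advice. The function and parameter names are hypothetical, and it only handles the comma escaping described above.

```python
def escape_commas(charset):
    """Commas separate fields in a .hcmask line, so escape them in charsets."""
    return charset.replace(",", "\\,")


def hcmask_lines(years, min_lower, max_lower, charset1="?l?u", charset2="!,$"):
    """Build lines of the form: charset1,charset2,<year>?1<n x ?l>?2"""
    lines = []
    # Loop over lengths first so the quickest masks land at the top of the file
    for count in range(min_lower, max_lower + 1):
        for year in years:
            mask = f"{year}?1" + "?l" * count + "?2"
            fields = [escape_commas(charset1), escape_commas(charset2), mask]
            lines.append(",".join(fields))
    return lines


for line in hcmask_lines(["2023", "2022"], 3, 6):
    print(line)
```

Redirecting that output to a file such as sales.hcmask gives you something you can pass straight to the `-a 3` command shown earlier.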
With all of that, I managed to identify a couple more base words to use targeting sales passwords. This in turn allowed me to target higher value hashes more easily. The same can be done by targeting known words to find the mangling rules. E.g.:
Yes you can also do that with a wordlist and mangling rules, but if you only have a couple of words you want to check it can sometimes be easier to do that with Masks instead. Now if you have a lot of words you want to try, then you can look into Hashcat’s “-a 6” (Wordlist + Mask) and “-a 7” (Mask + Wordlist) attack modes. John the Ripper doesn’t have this specifically because *cough cough* its rule preprocessor supports masks already in its normal mangling rules. But these attack modes can be very helpful if you are using Hashcat.
One thing you’ll notice though with the hybrid -a [6/7] attacks is that you can’t mangle or apply masks to both sides of a guess at the same time. Also, unlike with standard wordlist modes (-a 0) you can not pipe a wordlist in to -a [6/7] modes via stdin. This is a problem. The whole reason you are using Masks is probably because you don’t know what mangling rules have been applied to the base-word.
The key then is to create custom word-lists that contain one side of the mangling rules. I’d recommend picking the “shorter” of the mangling rules to limit how much you write to disk. This is super annoying, but it works. So for example if you want to prepend 2022 and 2023 to a word and then append a mask attack, you could first create a word-list containing all the words with 2022 and 2023 prepended to them (this only doubles the size of the original input dictionary). In this case I’m accomplishing this by using Hashcat’s rules and saving the results to disk. To do that, and then run the resulting full Mask attack, you can use the following commands:
Rule file: append_year.rule (Capitalize word and prepend 2022 and 2023).
Generate wordlist command:
- hashcat -a 0 --stdout ./sales_words.txt -r append_year.rule
Now that we have a wordlist containing words like 2023Sales, run the mask hybrid attack:
- hashcat -a 6 -m 0 -1 ?l?u uncracked_hashes.txt ./sales_words.txt '?1?1?1'
Is all of this a pain? Absolutely! But it can be very effective, so it’s usually worth creating these temporary wordlists for your attacks and then combining them with masks.
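If you would rather build the temporary wordlist outside of Hashcat, the same transformation is only a few lines of Python. This is a minimal sketch of the "capitalize, then prepend the year" step described above. The function name is hypothetical, and the capitalization mimics Hashcat's `c` rule (first letter upper, the rest lower).

```python
def prepend_years(words, years=("2022", "2023")):
    """Yield each word capitalized, once per year, with the year in front."""
    for word in words:
        base = word.strip()
        if not base:
            continue  # skip blank lines
        capitalized = base[:1].upper() + base[1:].lower()
        for year in years:
            yield f"{year}{capitalized}"


for candidate in prepend_years(["sales", "quota"]):
    print(candidate)
```

Write the output to a new file rather than over the original base-word list, so you can re-run the step later with different years.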
Hashcat Association Attacks (Getting Big Points with BCrypt)
As mentioned earlier, the whole reason to try different “spray and pray” attacks against fast hashes is to crack enough to identify how the passwords were created and develop highly targeted attacks against expensive and high value hashes like BCrypt. The mangling rule that received the most post-contest conversation among all of the teams was that several users’ passwords were their creation time (found in their metadata) converted to Unix epoch timestamps.
Creating a wordlist of all the various timestamps is certainly one way to go, but what we really want to do is crack bcrypt hashes. This is a perfect opportunity to talk about association (-a 9) attacks in Hashcat. Association attacks take one word per hash and target that hash with it. The word in association attacks can be combined with rules as well. This is a huge improvement when targeting a large number of salted hashes where you may have some idea what the plaintext for each account might be.
To perform an association attack you need to create a hashlist of the hashes you want to target, and then have a 1 to 1 mapping to a wordlist you want to target those hashes with. So for example you might have two files:
HashList.txt:
- user1:$2a$:<rest of the hash here>
- user2:$2a$:<rest of the hash here>
- user3:$2a$:<rest of the hash here>
Wordlist.txt:
For this particular challenge I created the wordlists + uncracked bcrypt hashlist using the following python script in Jupyter Notebook:
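As a minimal sketch of that idea (not the exact notebook code): the record layout, the timestamp format, and the assumption that creation times are in UTC are all hypothetical here, so adjust them to whatever the contest metadata actually looks like. The important part is that line N of the hash file and line N of the word file belong to the same account.

```python
from datetime import datetime, timezone


def to_epoch(created, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert a creation time string (assumed UTC) to a Unix epoch timestamp."""
    parsed = datetime.strptime(created, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def build_association_lists(records, keep_usernames=False):
    """records: iterable of (username, bcrypt_hash, created) tuples.

    Returns (hash_lines, word_lines) with a strict 1 to 1 line mapping.
    """
    hash_lines, word_lines = [], []
    for username, bcrypt_hash, created in records:
        # Usernames in the hash list require Hashcat's --username flag
        prefix = f"{username}:" if keep_usernames else ""
        hash_lines.append(f"{prefix}{bcrypt_hash}")
        word_lines.append(str(to_epoch(created)))
    return hash_lines, word_lines


# Placeholder records; real ones would come from the parsed contest data
sample = [("user1", "<bcrypt hash 1>", "2023-01-01 00:00:00"),
          ("user2", "<bcrypt hash 2>", "2023-01-01 00:00:01")]
hashes, words = build_association_lists(sample)
print(hashes)
print(words)
```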
Next, let’s run some attacks. First, let’s just do a quick naïve attack using (-a 0) and the timestamps as a normal wordlist.
Running this attack for an hour and a half isn’t the end of the world. But this is a contest. You are a busy hacker. You have hashes to crack and other wordlists to run. Let’s try Hashcat’s association attack. Here is the command I ran:
- hashcat -o cmiyc2023_hc.potfile -a 9 -m 3200 bcrypt_datetime.txt unix_timestamps_bcrypt.txt
ONE IMPORTANT THING TO KNOW: By default ‘-a 9’ mode will not save to your standard .potfile. So if you want to capture these cracks you MUST specify an output file on the command line using the ‘-o FILENAME’ option. I learned this fact the hard way when none of my cracks were showing up. I asked some Hashcat developers about this and they said there’s still some “weirdness” with ‘-a 9’ mode. For example, it will “recrack” hashes you have already cracked and post duplicate cracks/plaintexts to your potfile. So if you are running this attack it is probably good to run it on a new potfile vs. your global one, and then merge the new cracks back into your main potfile after the fact.
And here’s the results:
Over 100 Bcrypt hashes cracked in a couple of seconds! That’s super fun. As some backstory, association attacks are amazing if you have known passwords for users. Aka you obtained passwords from a different password dump and you are attacking the fact that users reuse passwords between multiple sites. Leveraging association attacks, you can run common mangling attacks against those known passwords to crack computationally expensive hashes for a subset of users.
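Since ‘-a 9’ can write duplicate cracks, merging its output file back into a main potfile is worth scripting. Here is a minimal sketch that assumes both files use plain hash:plaintext lines. The function name is hypothetical, and it works on lists of lines so you can wire up the file reading and writing however you like.

```python
def merge_potfiles(main_lines, new_lines):
    """Return main_lines followed by unseen entries from new_lines, deduplicated."""
    seen = set()
    merged = []
    for line in list(main_lines) + list(new_lines):
        entry = line.rstrip("\r\n")
        if entry and entry not in seen:
            seen.add(entry)  # remember the entry so repeats are dropped
            merged.append(entry)
    return merged


main = ["hash_a:plain1", "hash_b:plain2"]
association_output = ["hash_b:plain2", "hash_c:plain3", "hash_c:plain3"]
print(merge_potfiles(main, association_output))
```

Order is preserved, so the existing potfile entries stay where they were and only genuinely new cracks get appended.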
Cracking Multi-Words With Hashcat
The next area to focus on is multi-words and phrases. KoreLogic gave out a hint during the contest that several of the Engineering passwords were created from phrases taken from sci-fi books and movies, with the number ‘1’ appended on the end [Link]. This can be seen in some of the cracks I made earlier:
Going back to the hash breakdown by department, Engineering is also a huge department to target:
The approach here then is to crack as many hashes as possible with fast hashing algorithms to try and figure out the source materials. Then we need to target high-value hashes in the engineering department using phrases from those source materials. Basically dumb, untargeted attacks first, then smart attacks later. Let’s start with those dumb untargeted attacks!
At a high level this looks like a Correct Horse Battery Staple problem. To target that, let’s try all the common English words in two and three word phrases and add the number ‘1’ to the end. For a dictionary we can use the following corpus which contains various word-lists of 10k English words sorted in probability order [Link]. The first really “just get it to work” option I selected was to write a quick python program that loops through the word-list and outputs possible phrases while appending the number 1 to them. I then used the fact that if you do not specify a dictionary, Hashcat’s ‘-a 0’ mode will read in words from stdin. So I can run my attack using the following command:
- (Editor note: This option is bad. Keep reading for a better one) python3 word_combinator.py | hashcat -a 0 -m 0 uncracked_hashes.txt
This wasn’t pretty, but it did crack a number of hashes. Still, my guess generation was super slow, as it runs a slow python script and then pipes those guesses into hashcat (piping guesses is also slow). Raw-MD5 is fast to compute. Basically this option wastes a lot of time and limits the key-spaces I can search. How about we speed this up using Hashcat’s combinator attack?
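For reference, the slow stdin approach described above boils down to something like the following minimal sketch. This is an illustration rather than the original word_combinator.py, and it assumes the words are joined with spaces and that the trailing ‘1’ is the only suffix.

```python
import itertools


def phrases(words, min_words=2, max_words=3, separator=" ", suffix="1"):
    """Yield every ordered combination of words, with repeats, plus the suffix."""
    for count in range(min_words, max_words + 1):
        for combo in itertools.product(words, repeat=count):
            yield separator.join(combo) + suffix


# Tiny demo list; the real run would read the 10k English word-list instead
for guess in phrases(["watch", "your", "food"], 2, 2):
    print(guess)
```

The guess count grows as the word count raised to the phrase length, which is why generating these in Python and piping them in becomes the bottleneck so quickly.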
Hashcat’s combinator attack ‘-a 1’ allows you to combine two dictionaries together to target multi-word passwords. For example, let’s assume you have the following two word-lists:
dict1.txt
- fluffy
- scary
- cuddly
dict2.txt
- cat
- bat
- rat
If you run the following command:
- hashcat --stdout -a 1 dict1.txt dict2.txt
You’ll get the following output:
- fluffycat
- fluffybat
- fluffyrat
- scarycat
- scarybat
- scaryrat
- cuddlycat
- cuddlybat
- cuddlyrat
You can also apply one (AND ONLY ONE) rule to each dictionary if you want, using the ‘-j’ (applied to the left word list) and ‘-k’ (applied to the right word list) options. So for example if you use the following command:
- hashcat --stdout -a 1 -j '$ ' -k '$1' dict1.txt dict2.txt
It’ll create the following guesses
- fluffy cat1
- fluffy bat1
- fluffy rat1
- <you get the idea>
As reference the ‘$‘ rule appends a character to the end of a guess. So ‘$ ‘ appends a space, and ‘$1‘ appends a ‘1‘. I think you might see where this is going….
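If it helps to see the mechanics spelled out, here is a minimal Python sketch of what the combinator attack does with those two append rules. It only models the simple "append this text to each side" case, not Hashcat's full rule language, and the function name is made up.

```python
def combinator(left_words, right_words, left_suffix="", right_suffix=""):
    """Pair every left word with every right word, appending a suffix to each side."""
    for left in left_words:
        for right in right_words:
            yield f"{left}{left_suffix}{right}{right_suffix}"


dict1 = ["fluffy", "scary", "cuddly"]
dict2 = ["cat", "bat", "rat"]

# Mirrors -j '$ ' (append a space to the left word) and -k '$1' (append 1 to the right word)
for guess in combinator(dict1, dict2, left_suffix=" ", right_suffix="1"):
    print(guess)
```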
The problem is, this works great for two word phrases. But what about three and four word phrases? I wish I knew of a better solution, but the short answer is I hope your cracking system has some free hard-drive space! You can only use combinator with two input dictionaries, and you can’t pipe guesses into hashcat if you are using ‘-a 1’ mode. The fastest option then is to create a word-list of all two word phrases. If you don’t want to write a custom program to do this, you can always use hashcat and pipe the guesses to a file. For example:
- hashcat --stdout -a 1 -j '$ ' english_words.txt english_words.txt > two_words.txt
Then to try three words you can run
- hashcat -m 0 -a 1 -j '$ ' -k '$1' uncracked_hashes.txt two_words.txt english_words.txt
To try four words you can simply run
- hashcat -m 0 -a 1 -j '$ ' -k '$1' uncracked_hashes.txt two_words.txt two_words.txt
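To get a feel for the disk space and guess counts involved, here is a minimal sketch of the arithmetic. It assumes one phrase per line with a single newline and a one-character separator by default. The function names are hypothetical.

```python
def phrase_count(vocabulary_size, words_per_phrase):
    """Number of ordered phrases (repeats allowed) of the given length."""
    return vocabulary_size ** words_per_phrase


def two_word_list_bytes(words, separator=" "):
    """Exact size in bytes of a file holding every two-word phrase, one per line."""
    count = len(words)
    total_word_bytes = sum(len(word.encode("utf-8")) for word in words)
    # Each word appears 'count' times on the left and 'count' times on the right
    per_line_overhead = len(separator.encode("utf-8")) + 1  # separator + newline
    return 2 * count * total_word_bytes + count * count * per_line_overhead


for length in (2, 3, 4):
    print(length, "words:", phrase_count(10_000, length), "guesses")

print(two_word_list_bytes(["a", "bb"]), "bytes for a two-entry demo list")
```

With a 10k word-list that is 10^8 two-word phrases on disk, 10^12 three-word guesses, and 10^16 four-word guesses, so the four-word run is a much bigger commitment than the three-word one even against raw-MD5.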
Side note, I also had success by capitalizing the first letter by changing the -j rule to:
This attack yielded a ton of cracks. Looking through them I started trying to find “unique” and “odd” phrases to try and figure out where the source material came from. This is because while the above attack works great against fast hashes like raw-md5, it will not scale against slow hashes like Bcrypt. We need to further optimize our attacks. Given that, here is a subsection of my cracked passwords:
Most of these phrases were spectacularly unhelpful. But some of them stood out, such as ‘watch your food’. Running a quick Google search on that plus “scifi” highlighted Project Hail Mary [link]. That was a book I loved and hated in equal parts so it brought up a number of mixed feelings, but it certainly seems like a good candidate. The challenge is that the book isn’t in the public domain. Still, let’s try and create a dictionary of quotes copied from that article.
Next step was to create a janky Python program that would output all 2, 3, and 4 word phrases from the book paragraphs I had found. I know janky Python programs are slow, but so is cracking Bcrypt hashes. In this case it is better to minimize the number of guesses I make vs. focusing on how fast those guesses are generated.
Side note: I apologize for putting this as a screenshot. I really wish Google’s blogger had a code insert option…
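In text form, a minimal sketch of such a phrase extractor might look like the following. The tokenization (lowercased letters and apostrophes only), the de-duplication, and the trailing ‘1’ are assumptions for illustration rather than the exact logic from the screenshot.

```python
import re


def word_ngrams(text, sizes=(2, 3, 4), suffix="1"):
    """Yield each unique run of consecutive words of the given sizes, plus a suffix."""
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    for size in sizes:
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            if phrase not in seen:
                seen.add(phrase)  # avoid wasting slow-hash guesses on repeats
                yield phrase + suffix


# Neutral sample text; the real input would be the scraped paragraphs
for guess in word_ngrams("The quick brown fox jumps.", sizes=(2, 3)):
    print(guess)
```

Against Bcrypt every wasted guess hurts, so dropping duplicates up front matters more here than the speed of the Python itself.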
Running this through hashcat again yielded a new cracked hash!
That’s also a pretty unusual phrase, so I have high confidence that Project Hail Mary is one of the sources for the plain-texts. Let’s try this against the bcrypt hashes!!!!!
Annnnd nothing cracked…….
This was disappointing, but it’s probably because I was only using two paragraphs from the book. I need to find a better source to grab quotes from.
Let me take a step back and say, this workflow loop is one of the keys to this contest. If the cracked fast hashes (raw-md5, raw-sha1, etc) are any indication, around 1/3rd of the high value hashes are phrases taken from books and movies.
Key workflow for CMIYC 2023:
- Find the source material for passphrases by analyzing your cracks against fast hashes
- Create input dictionaries by scraping webpages of book and movie quotes and screenplays
- Run those input dictionaries against the slow high-value hashes.
- Repeat
The problem for me is that workflow is manually intensive, time consuming, and quite frankly boring as hell. During a competition it can be fun to get that dopamine hit as you crack new bcrypt hashes. After the contest, I’m simply wasting time while running up my power bill. So the question is, can I automate this at all? My power bill will still be high, but at least then I can watch new episodes of Ahsoka vs. staring at my computer screen! How about I train my PCFG guess generator on cracked passphrases and let it crunch away at generating guesses? I mean, it worked for the Hashcat team! [Link].
There are various ways to create the training set, but given how KoreLogic generated these passwords, and the plain-text values I was seeing, I just threw everything that had a ‘space’ into a training file using the following command line:
- cat cmiyc2023_.potfile | grep ' ' | awk -F ':' '{print $2}' > passphrase_cracked.txt
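One catch with the awk approach is that it truncates any plaintext that itself contains a colon. A minimal Python sketch that avoids this is below. It assumes the potfile only holds unsalted hash:plaintext lines (so the first colon ends the hash), and it decodes the `$HEX[...]` form Hashcat uses for plaintexts with awkward characters. The function names are hypothetical.

```python
def decode_plain(plain):
    """Decode Hashcat's $HEX[...] wrapper; return other plaintexts unchanged."""
    if plain.startswith("$HEX[") and plain.endswith("]"):
        return bytes.fromhex(plain[5:-1]).decode("utf-8", errors="replace")
    return plain


def passphrases(potfile_lines):
    """Yield plaintexts containing a space, splitting only on the first colon."""
    for line in potfile_lines:
        entry = line.rstrip("\r\n")
        _hash, separator, plain = entry.partition(":")
        if not separator:
            continue  # not a hash:plaintext line
        plain = decode_plain(plain)
        if " " in plain:
            yield plain


sample = ["hash1:watch your food1", "hash2:nospace", "hash3:$HEX[68692074686572653a]"]
print(list(passphrases(sample)))
```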
I know, I could have done the word-list generation much better as a short python script in my Jupyter Notebook, but I got places to be and Star Wars episodes to watch! Now that I had a good training set, I then trained a PCFG grammar on it using the following command:
- python3 ../../repos/pcfg_cracker/trainer.py -c 1 -r CMICY23_Passphrase -t passphrase_cracked.txt
I set coverage (-c) to be 1 so the PCFG guesser will not generate any brute force (OMEN) guesses. I then gave this attack a test run against raw-sha256 hashes using the following command:
- python3 ../../repos/pcfg_cracker/pcfg_guesser.py -r CMICY23_Passphrase | hashcat -m 1400 -a 0 uncracked_hashes.txt
And…. Yup this looks promising:
Let’s see how it does with Bcrypt using the following command:
- python3 ../../repos/pcfg_cracker/pcfg_guesser.py -r CMICY23_Passphrase | hashcat -m 3200 -a 0 uncracked_hashes.txt
Success! Limited Success!
There is still a ton of optimization I could do. You’ll notice I haven’t re-added / merged my potfiles in from the previous cracking of the Unix Epoch timestamp hashes. I also am targeting all of the Bcrypt hashes vs. just the ones in the engineering department. By reducing the target hashes I could easily double the speed of plain-text guesses I am making against the target hash list. I also don’t want to give the false impression that this is the best attack method for these hashes. It’s not. You would be much more successful by trying to find the source material and creating custom word-lists from that. What this attack workflow has going for it though is it is one of the most automatable options. You can let this run while trying to figure out better methods. Or… you can go do something else besides crack passwords. Call your parents maybe? I’m sure they would appreciate it!
I think this is a good spot to end this blog post. Looking back at it, I somehow managed to cover every attack mode in Hashcat. There’s still more techniques to dig into, and there’s a ton of uncracked hashes left in this contest. But I might leave that for a future post. If you have any tips, suggestions, or comments, feel free to leave them in the comments. Good luck, and I hope to see everyone at CMIYC 2024! Also thanks once again to the KoreLogic team for putting together such a great contest!