C-based hash matching cracker

Question 1

How can I improve this hash matching program using either abstraction or search/sort algorithms?

We briefly were introduced to searching methods such as bubble sort and merge sort, but I'm not sure how they can be implemented to improve my code.

The program takes a command-line hash input, compares it against generated hashes, and if it finds a match, prints out the key used to generate the matched hash.

(This is part of Harvard's CS50 on eDx Week 2 Problem Set)

#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <crypt.h>
#define _XOPEN_SOURCE
#include <unistd.h>
/*
 * Goal: Program that takes cmd ln arg of password hash
 * and compares it to program generated hashes until a
 * match is found. 
 * Assumptions: password is max 5 chars && all chars alpha
*/
bool crack(string hash);
int main(int argc, string argv[])
{
 // Check for 2 command line arguments
 if (argc != 2)
 {
 printf("[ERROR 1] Usage: ./crack hash\n");
 return 1;
 }
 string hash = argv[1];
 if (!crack(hash))
 {
 printf("[ERROR 2] Could not crack! \n");
 return 1;
 }
}
bool crack(string hash)
{
 string alphas = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
 char key[6];
 char salt[3];
 salt[0] = hash[0];
 salt[1] = hash[1];
 salt[2] = '0円';
 // 1 char pw test
 for (int i = 0; i < strlen(alphas); i++)
 {
 // Iterate over alpha chars to generate test key
 key[0] = alphas[i];
 key[1] = '0円';
 // Create new test hash
 string crackhash = crypt(key, salt);
 // Compare hashes
 if(strcmp(crackhash, hash) == 0)
 {
 printf("%s \n", key); // Array of chars is a string
 return true;
 }
 }
 // 2 char pw test
 for (int i = 0; i < strlen(alphas); i++)
 {
 key[0] = alphas[i];
 for (int j = 0; j < strlen(alphas); j++)
 {
 key[1] = alphas[j];
 key[2] = '0円';
 char* crackhash = crypt(key, salt);
 if(strcmp(crackhash, hash) == 0)
 {
 printf("%s \n", key);
 return true;
 }
 }
 }
 // 3 char pw test
 for (int i = 0; i < strlen(alphas); i++)
 {
 key[0] = alphas[i];
 for (int j = 0; j < strlen(alphas); j++)
 {
 key [1] = alphas[j];
 for (int k = 0; k < strlen(alphas); k++)
 {
 key[2] = alphas[k];
 key[3] = '0円';
 char* crackhash = crypt(key, salt);
 if(strcmp(crackhash, hash) == 0)
 {
 printf("%s \n", key);
 return true;
 }
 }
 }
 }
 // 4 char pw test
 for (int i = 0; i < strlen(alphas); i++)
 {
 key[0] = alphas[i];
 for (int j = 0; j < strlen(alphas); j++)
 {
 key [1] = alphas[j];
 for (int k = 0; k < strlen(alphas); k++)
 {
 key [2] = alphas[k];
 for (int l = 0; l < strlen(alphas); l++)
 {
 key[3] = alphas[l];
 key[4] = '0円';
 char* crackhash = crypt(key, salt);
 if(strcmp(crackhash, hash) == 0)
 {
 printf("%s \n", key);
 return true;
 }
 }
 }
 }
 }
 // 5 char pw test
 for (int i = 0; i < strlen(alphas); i++)
 {
 key[0] = alphas[i];
 for (int j = 0; j < strlen(alphas); j++)
 {
 key [1] = alphas[j];
 for (int k = 0; k < strlen(alphas); k++)
 {
 key [2] = alphas[k];
 for (int l = 0; l < strlen(alphas); l++)
 {
 key[3] = alphas[l];
 for (int m = 0; m < strlen(alphas); m++)
 {
 key[4] = alphas[m];
 key[5] = '0円';
 char* crackhash = crypt(key, salt);
 if(strcmp(crackhash, hash) == 0)
 {
 printf("%s \n", key);
 return true;
 }
 }
 }
 }
 }
 }
 return false;
}

Question 2

Do not edit your question once an answer has been posted. (See What should I do when someone answers my question?. If you have a new problem with updated code, ask a new question.

Question 3

Iterating over all possible keys

You have the problem that you need to nest more and more loops for longer passwords. Instead of having nested for-loops, where each loop has an iterator that represents one character of the key, you should treat the whole key as one single iterator. For example, with a key of length 5, the lowest value of that iterator is "aaaaa", and the highest is "ZZZZZ". Now think of how you write a regular for-loop, and instead of int i, use char key[] as the iterator variable. Then, you have to check whether the key is valid; let's assume we will set it to the empty string if we reached the end. At the end of the loop we have to increment our iterator. That's too much work to do inside a single for-statement, but we can write functions for it. So the for-loop should look like:

for(char key[] = "aaaaa"; key[0] != '0円'; increment(key)) {
 char *crackhash = crypt(key, salt);
 if(strcmp(crackhash, hash) == 0) {
 printf("%s\n", key);
 return true;
 } 
}

The function increment() should look like this:

void increment(char *key) {
 static const char *alphas = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
 /* Increment the first character of key.
 If it was 'Z', then reset the first character to 'a',
 and continue with the next character, otherwise exit.
 If every character is 'Z', make the key an empty string.
 */
 for (int i = 0; key[i] != '0円'; i++) {
 char *value = strchr(alphas, key[i]);
 ptrdiff_t pos = value - alphas;
 pos++;
 if (alphas[pos]) {
 key[i] = alphas[pos];
 break;
 } else {
 if (key[i + 1] == '0円') {
 key[0] = '0円';
 } else {
 key[i] = alphas[0];
 }
 }
 } 
}

The first time increment() is called with a key of length 5, key will be "aaaaa". The function will then look at the first character, and look up where in the string alphas this character appears. Since 'a' is the first character in alphas, pos will be 0. It then increments pos to 1, and will write back alphas[pos], which is 'b', to the first position in key. Thus, the contents of key will then be "baaaa". At this point, it breaks out of the for-loop and returns. The next time increment() will be called, it will set key to "caaaa", and so on, until it is "Zaaaa". When increment() is called with that value of the key, it will notice that after incrementing pos, alphas[pos] is the NUL character at the end of the the string alphas. It then sets the first character of key to the first character in alphas ('a'), and instead of breaking out of the for loop, it will continue with the second character of 'key'. Here it does exactly the same as it did for the first character, so it sees an 'a' and turns it into a 'b'. So after "Zaaaa", key will now become "abaaa". Next time it will again only increment the first character, until it becomes "Zbaaa", then the next key will be "acaaa", and so on until it reaches "ZZaaa". At that time, the for-loop goes all the way to the third character in key, and the next value will be "aabaa". This goes on and on until key is "ZZZZZ"; when increment() is called with this value, it will reach the point where it sees that there is no sixth character to increment (key[i + 1] == '0円' is true), and will then set the first character of key to the NUL byte, which means key will become an empty string. This is the signal that it has completed iterating over all possible keys.

Note how we now just have two for loops, one in increment() and one in the main function. Also note that nowhere did we actually have to specify the length of the key, so the above code works for any length of key! So we can write just one more, outer for-loop to check keys of all possible sizes:

char key[MAX_LEN + 1] = "";
for(int len = 1; len <= MAX_LEN; len++) {
 for(memset(key, 'a', len); key[0] != '0円'; increment(key)) {
 char *crackhash = crypt(key, salt);
 ...
 }
}

The call to memset() is used to set the initial value for the key. It sets the first len bytes of key to the value 'a'. So with len being 1, this results in key becoming the string "a". If len is 2, then key will become "aa", and so on. It uses the fact that all the other bytes have already been initialized to zero by char key[MAX_LEN + 1] = "".

Question 4

I don't know if this has a name. It's just for-loops. The only thing special is that the iterator variable is a string, and incrementing a string is just something you don't do often. Maybe it is analogous to en.wikipedia.org/wiki/Arbitrary-precision_arithmetic.

Question 5

never seen MAX_LEN before, can you explain the usage in this context? It's giving errors.

Question 6

It's just a placeholder for the maximum length of keys you want to test.

Question 7

Can you articulate what's going on in the increment() function? It works, but this seems cryptic for someone new to CS and C. Also, can you explain the use of memset(key, 'a', len) in this context?

Question 8

`stderr`

I don't know the requirements of the problem, but in a real program, error messages should be written to stderr:

fprintf(stderr, "[ERROR] Whatever\n");

`char *`

If CS50 allows it, use char * instead of string. It's the same (CS50 writes a typedef in <cs50.h>), but you will have it more present.

Thanks to @TobySpeight for noting that it's not only something visual, but also the typedef (string) has problems when mixed with const (the behaviour is non-obvious; const is applied to the pointer, and not to the variable type).

Don't leave whitespace in a line ending

printf("%s \n", key);

This code will write a space at the end of a line, which is useless, and sometimes not good at all.

DRY

Don't repeat yourself: the 5 loops are very similar. In fact the inner loop of each one is identical and could be put inside a function. Repetition usually leads to typos and bugs.

Compiler specific optimizations

AFAIK, CS50 uses Clang, which can apply optimizations beyond what you could write in standard C. To do that, you need to write code that only some compilers will understand (you can write that code inside #ifs to support other compilers, or decide that your code will only support a specific set of compilers). GCC, Clang, and possibly other compilers make use of function attributes to tell the compiler that a function is able to be optimized more than what it may think.

For example, @chux told in his answer that good compilers know that even if you write strlen many times, as long as the string isn't modified in between, the compiler can reuse the length from the last call. The compiler uses one of those attributes to know that (it isn't black magic). This is what the prototype for strlen might look like in Clang's libc:

__attribute__((pure))
size_t strlen(const char *s);

Your function crack has the same property: if you call it twice without modifying the string, the return value will be the same (and it has no side effects). You could write your prototype as:

__attribute__((pure))
bool crack(const char *hash);

You could write a macro to support compilers that don't understand __attribute__:

#if defined(__GNUC__)
#define pure__ __attribute__((pure))
#else
#define pure__
#endif

If you want other compilers to be able to optimize your functions as pure but use different constructions for that, you can extend this macro. The usage would be:

pure__
bool crack(const char *hash);

Note: there are other attributes that would be applicable to that function. I leave it to you to find them (read the link above).

`static`

Functions that are private to a .c file should be declared static. crack is one of those:

__attribute__((pure))
static
bool crack(const char *hash);
...
static
bool crack(const char *hash)
{
 ...
}

Question 9

Why do you say to prefer char*?

Question 10

@Josiah Because it's the reality. string doesn't exist in C.

Question 11

@Josiah The bool variable type also doesn't exist in C unless stdbool.h is included.

Question 12

@pacmaninbw But _Bool exists always; it's not the same (since C99, of course). It would have been a good comparison 21 years ago, but I never use C89, and I would recommend everyone to do the same. Even in Linux they are trying to move to C11 :)

Question 13

To clarify, I meant the Linux Kernel code, which now uses GNU C89 (gnu89), but they want to move to gnu11. In Linux, using GCC, C17 is almost fully supported. @pacmaninbw I only wrote one C program before 1999, and none before 1989 (I was born in '93), so I practically learnt C already with bool (lucky me); it's very sad that Windows has such bad support for C (not only C99, but C), many decades after.

Question 14

I agree with everything @CacahueteFrito mentioned in their answer. There is one observation there I would like to expand on, and one observation I would like to add.

Use the Type size_t When Indexing Arrays
Each of the for loops in the function bool crack(char *hash) uses a type int variable as the loop control variable. When indexing into arrays, it is better to use the type size_t which is defined as unsigned int in both C and C++. The C library functions strlen and sizeof return the type size_t.

One of the benefits of using an unsigned index is that a negative number can not be used to index an array. Since arrays in C start at zero this is important and prevent bugs. A second possible benefit of using size_t is that it may prevent warning messages about type mismatches from some compilers when using functions such as strlen or sizeof.

Complexity of Functions
When @CacahueteFrito mentions the DRY Programming Principle he is providing a fix for function complexity. There is another programming principle that is involved here as well, the Single Responsibility Principle. The single responsibility principle states

... every module, class, or function should have responsibility over a single part of the functionality provided by the software.

While the single responsibility principle primarily applies to object oriented programming, the function part of the statement refers to non object oriented programming as well. Well written functions will have a single goal to accomplish and will generally be short. In the code there are comments of the form // 2 char pw test, each of the code blocks following these comments could be a function. Well named functions here would reduce the need for these comments and the overall effect would be a much simpler version of the function `bool crack(char* hash).

Small simple functions are much easier to debug and maintain then functions that are over 100 lines long. This would also decrease the scope of variables such as char key[6].

Question 15

I agree in everything but the type. I agree that there is (actually, are) a specialized type for arrays, though. To avoid giving my subjective point of view, here are both types with their pros and cons: ptrdiff_t (signed) vs size_t (unsigned). But the point is there: don't use int for accessing arrays.

Question 16

@CacahueteFrito nice link! One gets both views from the answers.

Question 17

Appreciate the articulation. My goal was to have working code, and then see how I can optimize it from there. As for DRY principles, I am going to start with the easiest abstraction here which is the part where it generates a crackhash and tests versus the hash into a function. The harder part conceptually, for me as a novice programmer, is how I can write a function that generates another char in the key, and adds the next loop until the length of the key reaches a certain point.

Question 18

size_t (and std::size_t) is an unsigned type, but not necessarily unsigned int. That's kind of the point: it can be unsigned long or unsigned long long as appropriate for the process's address space.

Question 19

Below can lead to terrible performance.

A good compiler will recognize alphas does not change and so perform strlen(alphas) only once - which time complexity is the length of the string - say N.

A lesser compiler will execute strlen(alphas) each and every time, thus the 5 char pw test loops will cost N*N*N*N*N time!

// 5 char pw test
for (int i = 0; i < strlen(alphas); i++)
{
 key[0] = alphas[i];
 for (int j = 0; j < strlen(alphas); j++)
 {
 ....

Instead simply use

size_t n = strlen(alphas);
for (size_t i = 0; i < n; i++)
{
 key[0] = alphas[i];
 for (size_t j = 0; j < n; j++)
 {
 ....

Or idiomatic C and look for the null character.

for (size_t i = 0; alphas[i]; i++)
{
 key[0] = alphas[i];
 for (size_t j = 0; alphas[j]; j++)
 {
 ....

Code only needed once. Recommend to move key[N] = '0円'; to just before each set of loops.

 key[5] = '0円';

Minor: Use const

Some compiler will emit more efficient code when told that hash does not point to changeable data. Some compilers may deduce this already. Also see @Toby Speight

// bool crack(string hash)
bool crack(const char *hash)

Question 20

Awesome stuff here. These are optimizations I was hoping to learn. The 5 char password hashes were indeed taking a significant amount of time to crack. Going to implement these changes and note the difference in run time. Can you elaborate on moving the null character for the key array to outside the loops. Since each iteration is is a different length (example one password is OK another is YES), how would you tell it to be at the end regardless of the number of chars in the cracked key.

Question 21

@stiction Every iteration within // 5 char pw test for (int i = 0; i < strlen(alphas); i++) is 5 so key[5] = '0円'; can precede those loops. Every iteration within // 4 char pw test for (int i = 0; i < strlen(alphas); i++) is 4 so key[4] = '0円'; can precede those loops. ...

Question 22

Typo: for (size_t j = 0; alphas[i]; j++) -> for (size_t j = 0; alphas[j]; j++)

Question 23

@stiction CS50 uses Clang (AFAIK), which is a good compiler; you shouldn't notice any difference with the strlen calls. In fact, what Clang (probably) uses to know that it only needs to call strlen once is something special: __attribute__((pure)). I will add that to my answer, because you can apply it to your function.

Question 24

const string is really char *const, not char const* (thus demonstrating the problems with that typedef).

G. Sliepen G. Sliepen 68.7k3 gold badges74 silver badges179 bronze badges · Accepted Answer · 2019-08-06 20:45:16Z

Iterating over all possible keys

You have the problem that you need to nest more and more loops for longer passwords. Instead of having nested for-loops, where each loop has an iterator that represents one character of the key, you should treat the whole key as one single iterator. For example, with a key of length 5, the lowest value of that iterator is "aaaaa", and the highest is "ZZZZZ". Now think of how you write a regular for-loop, and instead of int i, use char key[] as the iterator variable. Then, you have to check whether the key is valid; let's assume we will set it to the empty string if we reached the end. At the end of the loop we have to increment our iterator. That's too much work to do inside a single for-statement, but we can write functions for it. So the for-loop should look like:

for(char key[] = "aaaaa"; key[0] != '0円'; increment(key)) {
 char *crackhash = crypt(key, salt);
 if(strcmp(crackhash, hash) == 0) {
 printf("%s\n", key);
 return true;
 } 
}

The function increment() should look like this:

void increment(char *key) {
 static const char *alphas = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
 /* Increment the first character of key.
 If it was 'Z', then reset the first character to 'a',
 and continue with the next character, otherwise exit.
 If every character is 'Z', make the key an empty string.
 */
 for (int i = 0; key[i] != '0円'; i++) {
 char *value = strchr(alphas, key[i]);
 ptrdiff_t pos = value - alphas;
 pos++;
 if (alphas[pos]) {
 key[i] = alphas[pos];
 break;
 } else {
 if (key[i + 1] == '0円') {
 key[0] = '0円';
 } else {
 key[i] = alphas[0];
 }
 }
 } 
}

The first time increment() is called with a key of length 5, key will be "aaaaa". The function will then look at the first character, and look up where in the string alphas this character appears. Since 'a' is the first character in alphas, pos will be 0. It then increments pos to 1, and will write back alphas[pos], which is 'b', to the first position in key. Thus, the contents of key will then be "baaaa". At this point, it breaks out of the for-loop and returns. The next time increment() will be called, it will set key to "caaaa", and so on, until it is "Zaaaa". When increment() is called with that value of the key, it will notice that after incrementing pos, alphas[pos] is the NUL character at the end of the the string alphas. It then sets the first character of key to the first character in alphas ('a'), and instead of breaking out of the for loop, it will continue with the second character of 'key'. Here it does exactly the same as it did for the first character, so it sees an 'a' and turns it into a 'b'. So after "Zaaaa", key will now become "abaaa". Next time it will again only increment the first character, until it becomes "Zbaaa", then the next key will be "acaaa", and so on until it reaches "ZZaaa". At that time, the for-loop goes all the way to the third character in key, and the next value will be "aabaa". This goes on and on until key is "ZZZZZ"; when increment() is called with this value, it will reach the point where it sees that there is no sixth character to increment (key[i + 1] == '0円' is true), and will then set the first character of key to the NUL byte, which means key will become an empty string. This is the signal that it has completed iterating over all possible keys.

Note how we now just have two for loops, one in increment() and one in the main function. Also note that nowhere did we actually have to specify the length of the key, so the above code works for any length of key! So we can write just one more, outer for-loop to check keys of all possible sizes:

char key[MAX_LEN + 1] = "";
for(int len = 1; len <= MAX_LEN; len++) {
 for(memset(key, 'a', len); key[0] != '0円'; increment(key)) {
 char *crackhash = crypt(key, salt);
 ...
 }
}

The call to memset() is used to set the initial value for the key. It sets the first len bytes of key to the value 'a'. So with len being 1, this results in key becoming the string "a". If len is 2, then key will become "aa", and so on. It uses the fact that all the other bytes have already been initialized to zero by char key[MAX_LEN + 1] = "".

I don't know if this has a name. It's just for-loops. The only thing special is that the iterator variable is a string, and incrementing a string is just something you don't do often. Maybe it is analogous to en.wikipedia.org/wiki/Arbitrary-precision_arithmetic.
never seen MAX_LEN before, can you explain the usage in this context? It's giving errors.
It's just a placeholder for the maximum length of keys you want to test.
Can you articulate what's going on in the increment() function? It works, but this seems cryptic for someone new to CS and C. Also, can you explain the use of memset(key, 'a', len) in this context?

Stack Exchange Network

C-based hash matching cracker

4 Answers 4

Iterating over all possible keys

`stderr`

`char *`

Don't leave whitespace in a line ending

DRY

Compiler specific optimizations

`static`

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

C-based hash matching cracker

4 Answers 4

Iterating over all possible keys

stderr

char *

Don't leave whitespace in a line ending

DRY

Compiler specific optimizations

static

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`stderr`

`char *`

`static`