Self-contained SHA-256 implementation in C

Question 1

I have been looking for a SHA-256 implementation in C with no dependencies (preferably self-contained in a single C-file) with a permissive license but found none to my liking.

Since the Wikipedia pseudo-code looked extremely well-specified, and since generating test vectors is so easy with sha256sum, I decided to make my own implementation and release it in the public domain on GitHub.

My design criteria are the following:

The implementation must be correct (obviously). To achieve this, I applied a simple strategy: assuming the Wikipedia pseudo-code was correct, I made a straightforward implementation, pretty much word for word. I then tested it with several test vectors of relevant lengths (see the test program).
Since I release the implementation on GitHub in the public domain, I want it to be portable: make no assumption of target platform word size or endianness, and be tolerant towards older compilers (i.e. ANSI C with as little specific C99 as possible).
The implementation should preferably be thread-safe (which I think I have achieved).

I have no particular performance criteria, but I am interested in hearing about any obvious optimizations (among other comments).

The C-file is:

#include <stdint.h>
#include <string.h>
#include "sha-256.h"
#define CHUNK_SIZE 64
#define TOTAL_LEN_LEN 8
/*
 * Comments from pseudo-code at https://en.wikipedia.org/wiki/SHA-2 are reproduced here.
 * When useful for clarification, portions of the pseudo-code are reproduced here too.
 */
/*
 * Initialize array of round constants:
 * (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):
 */
static const uint32_t k[] = {
 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};
struct buffer_state {
 const uint8_t * p;
 size_t len;
 size_t total_len;
 int single_one_delivered;
 int total_len_delivered;
};
static inline uint32_t right_rot(uint32_t value, unsigned int count)
{
 /*
 * Defined behaviour in standard C for all count where 0 < count < 32,
 * which is what we need here.
 */
 return value >> count | value << (32 - count);
}
static void init_buf_state(struct buffer_state * state, const void * input, size_t len)
{
 state->p = input;
 state->len = len;
 state->total_len = len;
 state->single_one_delivered = 0;
 state->total_len_delivered = 0;
}
static int calc_chunk(uint8_t chunk[CHUNK_SIZE], struct buffer_state * state)
{
 size_t space_in_chunk;
 if (state->total_len_delivered) {
 return 0;
 }
 if (state->len >= CHUNK_SIZE) {
 memcpy(chunk, state->p, CHUNK_SIZE);
 state->p += CHUNK_SIZE;
 state->len -= CHUNK_SIZE;
 return 1;
 }
 memcpy(chunk, state->p, state->len);
 chunk += state->len;
 space_in_chunk = CHUNK_SIZE - state->len;
 state->p += state->len;
 state->len = 0;
 /* If we are here, space_in_chunk is one at minimum. */
 if (!state->single_one_delivered) {
 *chunk++ = 0x80;
 space_in_chunk -= 1;
 state->single_one_delivered = 1;
 }
 /*
 * Now:
 * - either there is enough space left for the total length, and we can conclude,
 * - or there is too little space left, and we have to pad the rest of this chunk with zeroes.
 * In the latter case, we will conclude at the next invokation of this function.
 */
 if (space_in_chunk >= TOTAL_LEN_LEN) {
 const size_t left = space_in_chunk - TOTAL_LEN_LEN;
 const size_t len = state->total_len * 8;
 memset(chunk, 0x00, left);
 chunk += left;
#if SIZE_MAX > UINT32_MAX
 chunk[0] = (uint8_t) (len >> 56);
 chunk[1] = (uint8_t) (len >> 48);
 chunk[2] = (uint8_t) (len >> 40);
 chunk[3] = (uint8_t) (len >> 32);
#else
 chunk[0] = 0;
 chunk[1] = 0;
 chunk[2] = 0;
 chunk[3] = 0;
#endif
#if SIZE_MAX > UINT16_MAX
 chunk[4] = (uint8_t) (len >> 24);
 chunk[5] = (uint8_t) (len >> 16);
#else
 chunk[4] = 0;
 chunk[5] = 0;
#endif
#if SIZE_MAX > UINT8_MAX
 chunk[6] = (uint8_t) (len >> 8);
#else
 chunk[6] = 0;
#endif
 chunk[7] = (uint8_t) len;
 state->total_len_delivered = 1;
 } else {
 memset(chunk, 0x00, space_in_chunk);
 }
 return 1;
}
/*
 * Limitations:
 * - len must be small enough for (8 * len) to fit in len. Otherwise, the results are unpredictable.
 * - sizeof size_t is assumed to be either 8, 16, 32 or 64. Otherwise, the results are unpredictable.
 * - Since input is a pointer in RAM, the data to hash should be in RAM, which could be a problem
 * for large data sizes.
 * - SHA algorithms theoretically operate on bit strings. However, this implementation has no support
 * for bit string lengths that are not multiples of eight, and it really operates on arrays of bytes.
 * the len parameter is a number of bytes.
 */
void calc_sha_256(uint8_t hash[32], const void * input, size_t len)
{
 /*
 * Note 1: All variables are 32 bit unsigned integers and addition is calculated modulo 232
 * Note 2: For each round, there is one round constant k[i] and one entry in the message schedule array w[i], 0 = i = 63
 * Note 3: The compression function uses 8 working variables, a through h
 * Note 4: Big-endian convention is used when expressing the constants in this pseudocode,
 * and when parsing message block data from bytes to words, for example,
 * the first word of the input message "abc" after padding is 0x61626380
 */
 /*
 * Initialize hash values:
 * (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):
 */
 uint32_t h0 = 0x6a09e667;
 uint32_t h1 = 0xbb67ae85;
 uint32_t h2 = 0x3c6ef372;
 uint32_t h3 = 0xa54ff53a;
 uint32_t h4 = 0x510e527f;
 uint32_t h5 = 0x9b05688c;
 uint32_t h6 = 0x1f83d9ab;
 uint32_t h7 = 0x5be0cd19;
 /* 512-bit chunks is what we will operate on. */
 uint8_t chunk[64];
 struct buffer_state state;
 init_buf_state(&state, input, len);
 while (calc_chunk(chunk, &state)) {
 uint32_t a, b, c, d, e, f, g, h;
 /*
 * create a 64-entry message schedule array w[0..63] of 32-bit words
 * (The initial values in w[0..63] don't matter, so many implementations zero them here)
 * copy chunk into first 16 words w[0..15] of the message schedule array
 */
 uint32_t w[64];
 const uint8_t *p = chunk;
 int i;
 memset(w, 0x00, sizeof w);
 for (i = 0; i < 16; i++) {
 w[i] = (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16 |
 (uint32_t) p[2] << 8 | (uint32_t) p[3];
 p += 4;
 }
 /* Extend the first 16 words into the remaining 48 words w[16..63] of the message schedule array: */
 for (i = 16; i < 64; i++) {
 const uint32_t s0 = right_rot(w[i - 15], 7) ^ right_rot(w[i - 15], 18) ^ (w[i - 15] >> 3);
 const uint32_t s1 = right_rot(w[i - 2], 17) ^ right_rot(w[i - 2], 19) ^ (w[i - 2] >> 10);
 w[i] = w[i - 16] + s0 + w[i - 7] + s1;
 }
 /* Initialize working variables to current hash value: */
 a = h0;
 b = h1;
 c = h2;
 d = h3;
 e = h4;
 f = h5;
 g = h6;
 h = h7;
 /* Compression function main loop: */
 for (i = 0; i < 64; i++) {
 const uint32_t s1 = right_rot(e, 6) ^ right_rot(e, 11) ^ right_rot(e, 25);
 const uint32_t ch = (e & f) ^ (~e & g);
 const uint32_t temp1 = h + s1 + ch + k[i] + w[i];
 const uint32_t s0 = right_rot(a, 2) ^ right_rot(a, 13) ^ right_rot(a, 22);
 const uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
 const uint32_t temp2 = s0 + maj;
 h = g;
 g = f;
 f = e;
 e = d + temp1;
 d = c;
 c = b;
 b = a;
 a = temp1 + temp2;
 }
 /* Add the compressed chunk to the current hash value: */
 h0 = h0 + a;
 h1 = h1 + b;
 h2 = h2 + c;
 h3 = h3 + d;
 h4 = h4 + e;
 h5 = h5 + f;
 h6 = h6 + g;
 h7 = h7 + h;
 }
 /* Produce the final hash value (big-endian): */
 hash[0] = (uint8_t) (h0 >> 24);
 hash[1] = (uint8_t) (h0 >> 16);
 hash[2] = (uint8_t) (h0 >> 8);
 hash[3] = (uint8_t) h0;
 hash[4] = (uint8_t) (h1 >> 24);
 hash[5] = (uint8_t) (h1 >> 16);
 hash[6] = (uint8_t) (h1 >> 8);
 hash[7] = (uint8_t) h1;
 hash[8] = (uint8_t) (h2 >> 24);
 hash[9] = (uint8_t) (h2 >> 16);
 hash[10] = (uint8_t) (h2 >> 8);
 hash[11] = (uint8_t) h2;
 hash[12] = (uint8_t) (h3 >> 24);
 hash[13] = (uint8_t) (h3 >> 16);
 hash[14] = (uint8_t) (h3 >> 8);
 hash[15] = (uint8_t) h3;
 hash[16] = (uint8_t) (h4 >> 24);
 hash[17] = (uint8_t) (h4 >> 16);
 hash[18] = (uint8_t) (h4 >> 8);
 hash[19] = (uint8_t) h4;
 hash[20] = (uint8_t) (h5 >> 24);
 hash[21] = (uint8_t) (h5 >> 16);
 hash[22] = (uint8_t) (h5 >> 8);
 hash[23] = (uint8_t) h5;
 hash[24] = (uint8_t) (h6 >> 24);
 hash[25] = (uint8_t) (h6 >> 16);
 hash[26] = (uint8_t) (h6 >> 8);
 hash[27] = (uint8_t) h6;
 hash[28] = (uint8_t) (h7 >> 24);
 hash[29] = (uint8_t) (h7 >> 16);
 hash[30] = (uint8_t) (h7 >> 8);
 hash[31] = (uint8_t) h7;
}

The GitHub repository also contains the header file as well as a test program.

Update after review

The GitHub repository has been updated after review, and successful extensive testing with the NIST Computer Security Resource Center test vectors is included. Continuous build and testing is also provided.

Question 2

This doesn't answer your code review, but you may find what you're looking for here: create.stephan-brumme.com/hash-library - self-contained, no dependencies, portable, zlib license.

Question 3

@Greg Hewgill, thanks, but that is C++, right? I need a C implementation.

Question 4

Oh, I guess it is. Most of the guts of the implementation is pretty much plain C though, and would be straightforward to change the interface. Just thought it might help your original quest.

Question 5

Well, I guess porting create.stephan-brumme.com/hash-library would work, but I would rather not do that now that I have my own tested implementation, unless some clever reviewer can show me that it is broken.

Question 6

If you are re-inventing something. I would want it tested with well known set of test data before I could use it. di-mgt.com.au/sha_testvectors.html dlitz.net/crypto/shad256-test-vectors

Question 7

Nice implementation. Good attention to shift types.

Use bool. Various members, functions are used only in a boolean fashion. Use bool to add clarity to that use. Yet I suppose this goes against "ANSI C with as little specific C99 as possible".

// int single_one_delivered;
// int total_len_delivered;
// int calc_chunk(uint8_t chunk[CHUNK_SIZE], struct buffer_state * state)
// state->single_one_delivered = 1;
#include <stdbool.h>
bool single_one_delivered;
bool total_len_delivered;
bool calc_chunk(uint8_t chunk[CHUNK_SIZE], struct buffer_state * state)
state->single_one_delivered = true;
...

#if SIZE_MAX > UINT8_MAX is always true, so not needed.
Unnecessary limitation. Requirement that state->total_len * 8; will not overflow is not needed for 32-bit size_t. Suggest simply using 64-bit math for bit width and skip all the #if SIZE_MAX > UINT32_MAX as they impose an unnecessary limit on len.
With uint64_t bit_length, the limits ares now len <= UINT64_MAX/8 and len < SIZE_MAX." - hardly a limitation.
For pre-C99 code, 64-bit math needs a little emulation or reversion to code original limit.
```
uint64_t bit_length = state->total_len;
bit_length *= 8;
...
// or better, use a loop
chunk[0] = (uint8_t) (bit_length >> 56);
chunk[1] = (uint8_t) (bit_length >> 48);
...
chunk[7] = (uint8_t) (bit_length >> 0);
```
Consider function or macro vs. straight line code. As for me I'd replaced h0,h1,... h7 with uint32_t h[8] and uint32_t a, b, c, d, e, f, g, h with uint32_t ah[8] and use loops.
```
//hash[0] = (uint8_t) (h0 >> 24);
//hash[1] = (uint8_t) (h0 >> 16);
//....
//hash[31] = (uint8_t) h7;
foo(&hash[0], h0);
foo(&hash[4], h1);
...
foo(&hash[28], h7);
```

Comments need edit

// All variables are 32 bit unsigned integers and addition is calculated modulo 232

to

// All integers (expect indexes) are 32 bit unsigned integers ... 
// and addition is calculated modulo 2^32.

C99 Optimization that could be conditionally added. restrict tells the complier that hash and input do not overlap allowing various optimization. Although I doubt it will help much here as hash[] is not used until the end.
```
// void calc_sha_256(uint8_t hash[32], const void * input, size_t len)
void calc_sha_256(uint8_t * restrict hash, const void * restrict input, size_t len)
```

[edit] To be clear: speed optimization is primarily of the loop for (i = 16; i < 64; i++) { of which there is little to improve. Code could resort to uint_fast32_t and register, yet these hints really only benefit a weaker compiler.

A simple way to use the entire size_t total_len byte range in saving the bit length without resorting to wider types:

 size_t len = state->total_len;
 chunk[7] = (uint8_t) (len << 3);
 len >>= 5;
 for (i = 7; i-- > 0; ) {
 chunk[i] = (uint8_t) len;
 len >>= 8;
 }

Question 8

Thanks a lot. I also prefer bool, but as mentioned (by you and by me in the repo comments), not all pre-C99 implementations have it. I will add some comments in the code though.

Question 9

Thanks for reminding me that SIZE_MAX is at least 65535. My C reference manual does not say that, but the standard does. I will update the code accordingly.

Question 10

About uint64_t: I know at least one real life compiler that has uint32_t but not uint64_t. No emulation, unfortunately. Your removal of limitation on len would not work for that compiler.

Question 11

Your other proposition to avoid len overflow that does not use wider types is clever, thanks.

chux chux 36.1k2 gold badges43 silver badges96 bronze badges · Accepted Answer · 2017-12-14 22:48:11Z

Nice implementation. Good attention to shift types.

Use bool. Various members, functions are used only in a boolean fashion. Use bool to add clarity to that use. Yet I suppose this goes against "ANSI C with as little specific C99 as possible".

// int single_one_delivered;
// int total_len_delivered;
// int calc_chunk(uint8_t chunk[CHUNK_SIZE], struct buffer_state * state)
// state->single_one_delivered = 1;
#include <stdbool.h>
bool single_one_delivered;
bool total_len_delivered;
bool calc_chunk(uint8_t chunk[CHUNK_SIZE], struct buffer_state * state)
state->single_one_delivered = true;
...

#if SIZE_MAX > UINT8_MAX is always true, so not needed.
Unnecessary limitation. Requirement that state->total_len * 8; will not overflow is not needed for 32-bit size_t. Suggest simply using 64-bit math for bit width and skip all the #if SIZE_MAX > UINT32_MAX as they impose an unnecessary limit on len.
With uint64_t bit_length, the limits ares now len <= UINT64_MAX/8 and len < SIZE_MAX." - hardly a limitation.
For pre-C99 code, 64-bit math needs a little emulation or reversion to code original limit.
```
uint64_t bit_length = state->total_len;
bit_length *= 8;
...
// or better, use a loop
chunk[0] = (uint8_t) (bit_length >> 56);
chunk[1] = (uint8_t) (bit_length >> 48);
...
chunk[7] = (uint8_t) (bit_length >> 0);
```
Consider function or macro vs. straight line code. As for me I'd replaced h0,h1,... h7 with uint32_t h[8] and uint32_t a, b, c, d, e, f, g, h with uint32_t ah[8] and use loops.
```
//hash[0] = (uint8_t) (h0 >> 24);
//hash[1] = (uint8_t) (h0 >> 16);
//....
//hash[31] = (uint8_t) h7;
foo(&hash[0], h0);
foo(&hash[4], h1);
...
foo(&hash[28], h7);
```

Comments need edit

// All variables are 32 bit unsigned integers and addition is calculated modulo 232

to

// All integers (expect indexes) are 32 bit unsigned integers ... 
// and addition is calculated modulo 2^32.

C99 Optimization that could be conditionally added. restrict tells the complier that hash and input do not overlap allowing various optimization. Although I doubt it will help much here as hash[] is not used until the end.
```
// void calc_sha_256(uint8_t hash[32], const void * input, size_t len)
void calc_sha_256(uint8_t * restrict hash, const void * restrict input, size_t len)
```

[edit] To be clear: speed optimization is primarily of the loop for (i = 16; i < 64; i++) { of which there is little to improve. Code could resort to uint_fast32_t and register, yet these hints really only benefit a weaker compiler.

A simple way to use the entire size_t total_len byte range in saving the bit length without resorting to wider types:

 size_t len = state->total_len;
 chunk[7] = (uint8_t) (len << 3);
 len >>= 5;
 for (i = 7; i-- > 0; ) {
 chunk[i] = (uint8_t) len;
 len >>= 8;
 }

Thanks a lot. I also prefer bool, but as mentioned (by you and by me in the repo comments), not all pre-C99 implementations have it. I will add some comments in the code though.
Thanks for reminding me that SIZE_MAX is at least 65535. My C reference manual does not say that, but the standard does. I will update the code accordingly.
About uint64_t: I know at least one real life compiler that has uint32_t but not uint64_t. No emulation, unfortunately. Your removal of limitation on len would not work for that compiler.
Your other proposition to avoid len overflow that does not use wider types is clever, thanks.

Stack Exchange Network

Self-contained SHA-256 implementation in C

Update after review

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Self-contained SHA-256 implementation in C

Update after review

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions