Bare bones base 64 encoding/decoding

Question 1

I wrote a bare-bones, no fluff, base 64 library in C. My goal was to make it as small as possible. I'm wondering if there are any further size optimizations I can make (without sacrificing too much readability).

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define ALPHABET "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
#define INVALID_SEXTET 64
static char
convert_sextet(unsigned char sextet)
{
 return ALPHABET[sextet];
}
static unsigned char
deconvert_sextet(char c)
{
 if (c >= 'A' && c <= 'Z') {
 return c - 'A';
 }
 else if (c >= 'a' && c <= 'z') {
 return 26 + (c - 'a');
 }
 else if (c >= '0' && c <= '9') {
 return 52 + (c - '0');
 }
 else if (c == '+') {
 return 62;
 }
 else if (c == '/') {
 return 63;
 }
 else {
 return INVALID_SEXTET;
 }
}
void
mini64_encode(const unsigned char *data, size_t len, char *dst)
{
 if (!data || len == 0 || !dst) {
 return;
 }
 for (size_t k = 0; k < len; k += 3) {
 unsigned int group_size = MIN(3, len - k);
 uint32_t group = 0;
 char word[5] = "====";
 memcpy((unsigned char *)&group + 1, data + k, group_size);
 group = ntohl(group);
 for (unsigned int j = 0; j <= group_size; j++) {
 unsigned char sextet = (group >> 18) & 0x3f;
 word[j] = convert_sextet(sextet);
 group <<= 6;
 }
 sprintf(dst, "%s", word);
 dst += 4;
 }
}
int
mini64_decode(const char *string, size_t len, unsigned char *dst, size_t *data_len)
{
 unsigned char *orig_dst = dst;
 if (!string || len == 0 || !dst || !data_len) {
 return -1;
 }
 if (len % 4 != 0) {
 return -1;
 }
 for (size_t k = 0; k < len; k += 4) {
 unsigned int group_size = 3;
 uint32_t group = 0;
 for (unsigned int j = 0; j < 4; j++) {
 char c = string[k + j];
 unsigned char sextet;
 group <<= 6;
 if (c == '=') {
 unsigned char check_mask;
 if (k != len - 4 || j < 2 || (j == 2 && string[k + 3] != '=')) {
 return -1;
 }
 if (j == 2) {
 check_mask = 0x0f;
 group <<= 6;
 }
 else {
 check_mask = 0x03;
 }
 if (deconvert_sextet(string[k + j - 1]) & check_mask) {
 return -1;
 }
 group_size = j - 1;
 break;
 }
 sextet = deconvert_sextet(c);
 if (sextet == INVALID_SEXTET) {
 return -1;
 }
 group |= sextet;
 }
 group = htonl(group);
 memcpy(dst, (unsigned char *)&group + 1, group_size);
 dst += group_size;
 }
 *data_len = dst - orig_dst;
 return 0;
};

Question 2

I'd not name the functions mini64_encode and mini64_decode. In the code the user doesn't care if this is a minimal implementation, and base64 should definitely be in there. Don't care about name clashes, if a base 64 encoding is already available then these methods should not be required (be happy to be corrected on this).

It's nice that the code should be correct independent on the systems' byte order (ntohl call).

if (!data || len == 0 || !dst) {
 return;

I'd return an error code if one of the pointers aren't set.

A function to calculate the required dst size is missing. Without it the user is prone to create buffer overruns.

One trick is to have a positive return value or a separate ref. parameter set to the actual output size. If dst is NULL then only the required output size could be returned. That way the user knows the data size (in case the buffer is larger) and they can also create the exact required size if needed (trick taken from the PKCS#11 / Cryptoki interface spec).

sprintf is a function call. Function calls should be avoided for this kind of functionality.

 char word[5] = "====";

That's only 4 characters if you would not assign it a zero-terminated string, and only up to 2 characters are ever used (a single byte will still occupy 2 base 64 characters after all). I'd just add one or two = characters programmatically.

uint32_t group = 0;
...
group = ntohl(group);

The initial assignment is unnecessary if the variable will be assigned a value anyway.

k and j don't really indicate what the variables are for. This is a localized problem though, so not very important.

For faster operation I would unroll the for loop (with j as the variable) and then perform the last operation and the padding separately. That removes a simple branch in the inner loop. So just shift by 18, 12 and 6 during encoding for instance.

One trick to speed up encoding / decoding is to perform bitwise trickery instead of using an alphabet. This is quite a lot harder to do, but implementations should exist.

I'd always reference the base 64 encoding / decoding that is performed. I would e.g. reference the precise encoding in RFC 4648.

During decoding I would definitely use different error codes for having:

an incorrectly sized input
a wrong base 64 character
possibly a = padding character at the wrong spot (but read onto the next comment)

The check for the padding character is only required at the last part of the base 64 encoding. It is not required to check for padding characters for any block but the last. Again, the loop can be unrolled and the last block can be handled separately.

Question 3

Wouldn't loop unrolling increase the binary size (though perhaps negligibly)?

Question 4

parallel construction

convert_sextet(unsigned char sextet)
...
deconvert_sextet(char c)

We have some private helpers. The first name is fine. The second, less so.

Call it convert_char?

Or maybe call the pair

to_char() and
to_sextet()

table driven

We have half a dozen if clauses, and we execute them a lot, one per input character. The Intel branch predictor doesn't have much to work with. Fortunately the "big" ranges are handled up front, so we typically get a match early-ish.

I recommend running a few loops once to populate a 256-element char_to_sextet table, and then each call is just a table lookup. It would be same as how returning ALPHABET[sextet] works.

strcpy vs memcpy

 sprintf(dst, "%s", word);

That seems an odd way to spell strcpy(dst, word).

And I guess writing N/3 NULs that immediately get overwritten isn't the end of the world.

What would be too bad is if the encoded data looked like 41 00 42, that is, "A0円B". The NUL in the middle prevents the 0x42 from being written. Are you sure you wrote automated unit tests that roundtrip arbitrary data back and forth, verifying the bits come back alive out the other side?

Question 5

You could be aiming for Brevity, Terseness, compactness, smallest code, fewest branches, fastest code, most readable, etc.. Example.

// decode: ascii to data. decode 4 chars to 3 data bytes.
sixbitint sextets[4];
size_t doffset = 0;
for(i=0; i<len; i+=4) {
 sextets[0] = ascii_to_sextet(string[i+0]);
 sextets[1] = ascii_to_sextet(string[i+1]);
 sextets[2] = ascii_to_sextet(string[i+2]);
 sextets[3] = ascii_to_sextet(string[i+3]);
 dst[doffset++] = ((sextets[0] << 2) & BITS_111111oo ) | ((sextets[1] >> 4) & BITS_oooooo11 );
 dst[doffset++] = ((sextets[1] << 4) & BITS_1111oooo ) | ((sextets[2] >> 2) & BITS_oooo1111 );
 dst[doffset++] = ((sextets[2] << 6) & BITS_11oooooo ) | ((sextets[3] ) & BITS_oo111111 );
 }
if('=' == string[i-1]) doffset--;
if('=' == string[i-2]) doffset--;
*data_len = doffset;
// encode : data to ascii
size_t writeindex = 0, readindex = 0;
for (; readindex < datalen; readindex+=3, writeindex+=4) {
 dst[writeindex+0] = sextet_to_ascii( (data[readindex+0] >> 2));
 dst[writeindex+1] = sextet_to_ascii((((data[readindex+0] << 4) & BITS_oo11oooo)|((data[readindex+1] >> 4) & BITS_oooo1111)));
 dst[writeindex+2] = sextet_to_ascii((((data[readindex+1] << 2) & BITS_oo1111oo)|((data[readindex+2] >> 6) & BITS_oooooo11)));
 dst[writeindex+3] = sextet_to_ascii( data[readindex+2]);
 }
// any partial final group
switch(datalen % 3) {
 case 1:
 dst[writeindex-3] = sextet_to_ascii(((data[readindex-3] << 4) & BITS_oo11oooo));
 dst[writeindex-2] = '='; dst[writeindex-1] = '=';
 break;
 case 2:
 dst[writeindex-2] = sextet_to_ascii(((data[readindex-2] << 2) & BITS_oo1111oo));
 dst[writeindex-1] = '=';
 break;
 }
dst[writeindex] = 0;
}

Question 6

(A good thing, probably, that this isn't answer review. The question states goal [was] as small as possible, not specifying source code, machine code+static data or what else. Seeing 6 bits shifted 4 left&right don't overlap, two shifts can be avoided at the expense of one more 256 byte table, same goes for 2 left&6 right.)

Maarten Bodewes Maarten Bodewes 6,59920 silver badges53 bronze badges · Accepted Answer · 2024-02-17 06:54:12Z

I'd not name the functions mini64_encode and mini64_decode. In the code the user doesn't care if this is a minimal implementation, and base64 should definitely be in there. Don't care about name clashes, if a base 64 encoding is already available then these methods should not be required (be happy to be corrected on this).

It's nice that the code should be correct independent on the systems' byte order (ntohl call).

if (!data || len == 0 || !dst) {
 return;

I'd return an error code if one of the pointers aren't set.

A function to calculate the required dst size is missing. Without it the user is prone to create buffer overruns.

One trick is to have a positive return value or a separate ref. parameter set to the actual output size. If dst is NULL then only the required output size could be returned. That way the user knows the data size (in case the buffer is larger) and they can also create the exact required size if needed (trick taken from the PKCS#11 / Cryptoki interface spec).

sprintf is a function call. Function calls should be avoided for this kind of functionality.

 char word[5] = "====";

That's only 4 characters if you would not assign it a zero-terminated string, and only up to 2 characters are ever used (a single byte will still occupy 2 base 64 characters after all). I'd just add one or two = characters programmatically.

uint32_t group = 0;
...
group = ntohl(group);

The initial assignment is unnecessary if the variable will be assigned a value anyway.

k and j don't really indicate what the variables are for. This is a localized problem though, so not very important.

For faster operation I would unroll the for loop (with j as the variable) and then perform the last operation and the padding separately. That removes a simple branch in the inner loop. So just shift by 18, 12 and 6 during encoding for instance.

One trick to speed up encoding / decoding is to perform bitwise trickery instead of using an alphabet. This is quite a lot harder to do, but implementations should exist.

I'd always reference the base 64 encoding / decoding that is performed. I would e.g. reference the precise encoding in RFC 4648.

During decoding I would definitely use different error codes for having:

an incorrectly sized input
a wrong base 64 character
possibly a = padding character at the wrong spot (but read onto the next comment)

The check for the padding character is only required at the last part of the base 64 encoding. It is not required to check for padding characters for any block but the last. Again, the loop can be unrolled and the last block can be handled separately.

Wouldn't loop unrolling increase the binary size (though perhaps negligibly)?

Stack Exchange Network

Bare bones base 64 encoding/decoding

3 Answers 3

parallel construction

table driven

strcpy vs memcpy

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Bare bones base 64 encoding/decoding

3 Answers 3

parallel construction

table driven

strcpy vs memcpy

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions