I have written a program that outputs the hexadecimal representation of an array of bytes. The program works just fine, and I am confident (thanks to valgrind) that I don't have any memory leaks. However, I am not so confident I am doing this the simplest, nor most readable way. It seems to me I have written a lot of code for a small task.
I am looking for general feedback, including code style, types, variable naming, usage of malloc, and simpler ways of appending a char array to a string, possibly avoiding sprintf
.
The program was written to exercise storing a hexadecimal representation of a byte array in a string, not printing it out directly using printf()
.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
const unsigned char bytearr[] = { 0x0b, 0x34, 0x33, 0x32 };
const int hexlen = 2; // outputting hex value with leading zero
const int outstrlen = sizeof(bytearr) * sizeof(char) * sizeof(bytearr[0]) * hexlen;
char *outstr;
if ((outstr = malloc(outstrlen + 1)) == NULL) {
printf("Failed to allocate memory");
exit(1);
}
bzero(outstr, outstrlen + 1);
int i;
char tmp[hexlen + 1];
for (i = 0; i < (sizeof(bytearr) / sizeof(bytearr[0])); i++) {
sprintf(tmp, "%.2x", bytearr[i]);
strcat(outstr, tmp);
}
printf("Hexadecimal string:\n%s\n", outstr);
free(outstr);
return 0;
}
For what it's worth, after this review, my program ended up like this.
2 Answers 2
Don't mix signed and unsigned
GCC warns about
192329.c:8:79: warning: conversion to ‘long unsigned int’ from ‘int’ may change the sign of the result [-Wsign-conversion]
const int outstrlen = sizeof(bytearr) * sizeof(char) * sizeof(bytearr[0]) * hexlen;
^
192329.c:8:27: warning: conversion from ‘long unsigned int’ to ‘int’ may change value [-Wconversion]
const int outstrlen = sizeof(bytearr) * sizeof(char) * sizeof(bytearr[0]) * hexlen;
^~~~~~
192329.c:11:36: warning: conversion to ‘size_t’ {aka ‘long unsigned int’} from ‘int’ may change the sign of the result [-Wsign-conversion]
if ((outstr = malloc(outstrlen + 1)) == NULL) {
~~~~~~~~~~^~~
192329.c:19:19: warning: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Wsign-compare]
for (i = 0; i < (sizeof(bytearr) / sizeof(bytearr[0])); i++) {
^
These are easily fixed by changing outstrlen
, hexlen
and i
to unsigned types (ideally size_t
in all cases).
Prefer memset()
over bzero()
The standard library function to initialize memory is memset()
; bzero()
is a common extension, but less portable. It turns out we don't need it anyway, as we will overwrite that initialization with our output - no need to fill with zeros first (if we really do want to do that, consider using calloc()
instead of malloc()
). We just need outstr[0] = 0;
so we start with an empty string.
Size computation
sizeof (char)
is always 1, because sizeof
measures in units of char
. And there seems to be more storage computed than we need - we actually just require twice the input length plus the terminating NUL.
Error checking
It's good that you remembered to check your allocation didn't return NULL. I'd tidy that up to assign then test, like this:
char *outstr = malloc(outstrlen);
if (!outstr) {
fprintf(stderr, "Failed to allocate memory\n");
return 1;
}
Notice that the error message now goes to the standard error stream, and ends in a newline. And I prefer to return
from main()
rather than calling exit()
.
Print directly into the output
Instead of printing into a tmp
array, write the characters directly into the output buffer. We know the position for index i
is outstr + hexlen * i
, so we can simply
for (size_t i = 0; i < sizeof bytearr / sizeof bytearr[0]; ++i) {
sprintf(outstr + hexlen * i, "%.2x", bytearr[i]);
}
Note that each sprintf
overwrites the final NULL of the previous.
One technique that works is to declare a pointer for the current write position, then at each iteration add the number of characters printed:
char *p = outstr;
for (size_t i = 0; i < sizeof bytearr / sizeof bytearr[0]; ++i) {
p += sprintf(p, "%0*hhx", (int)hexlen, bytearr[i]);
}
Consider writing a function
It's good to separate the reusable part of your code from the setup and output instructions.
Modified code
#include <stdio.h>
#include <stdlib.h>
/* Returns a string representation of _length_ bytes beginning at
_array_. The caller has responsibility to release the string,
using _free()_. If the string could not be allocated, returns a
null pointer instead. */
char *to_hex_string(const unsigned char *array, size_t length)
{
char *outstr = malloc(2*length + 1);
if (!outstr) return outstr;
char *p = outstr;
for (size_t i = 0; i < length; ++i) {
p += sprintf(p, "%02hhx", array[i]);
}
return outstr;
}
int main() {
const unsigned char bytearr[] = { 0x0b, 0x34, 0x33, 0x32 };
char *outstr = to_hex_string(bytearr, sizeof bytearr);
if (!outstr) {
fprintf(stderr, "Failed to allocate memory\n");
return 1;
}
printf("Hexadecimal string:\n%s\n", outstr);
free(outstr);
}
-
\$\begingroup\$ Thank you very much for the feedback! Since we've concluded we already know the size of the byte arrays elements, is it also okay to drop
sizeof bytearr / sizeof bytearr[0]
in thefor
loop and simply usei < sizeof bytearr
? Correct me if I'm wrong as I am now interpreting the line ofchar *p = outstr;
as exactly the way to avoid being Shlemiel The Painter in thesprintf
example. \$\endgroup\$sshow– sshow2018年04月18日 11:22:07 +00:00Commented Apr 18, 2018 at 11:22 -
\$\begingroup\$ I am compiling with
gcc -fverbose-asm -Wall -g -O0 -o hexstr hexstr.c
, and I did not get any such errors about the lack ofsize_t
. Care to share some more parameters I could use? \$\endgroup\$sshow– sshow2018年04月18日 11:22:57 +00:00Commented Apr 18, 2018 at 11:22 -
\$\begingroup\$ I recommend adding
-Wextra
as a minimum. Other warnings I usually enable are-Wwrite-strings -Wpedantic -Warray-bounds -Wconversion
, but there's one I usually disable:-Wno-parentheses
\$\endgroup\$Toby Speight– Toby Speight2018年04月18日 11:37:20 +00:00Commented Apr 18, 2018 at 11:37 -
\$\begingroup\$ Yes, it makes sense to drop the division by
sizeof bytearr[0]
because that must be 1 (though if you're writing code where the type of array might change, that's a good safety feature to keep; in this case, changing the type would also necessitate a change to theprintf
and tohex_len
, so it's probably so tightly embedded that we needn't do so). And yes,char *p
is exactly the mechanism to take the paint bucket with us as the work moves. \$\endgroup\$Toby Speight– Toby Speight2018年04月18日 11:41:59 +00:00Commented Apr 18, 2018 at 11:41
Well, aside from the fact that you could just reduce the program to outputting a single string-literal, let's look at it:
Excellent code-formatting. That's far too rare.
The calculation for
outstrlen
is all kinds of bent. Both middle factors don't make any sense.sizeof(bytearr) * sizeof(char) * sizeof(bytearr[0]) * hexlen size_in_bytes * bytes_per_byte * bytes_per_byte * digits_per_byte
Just to be "safe", you zero
outstr
before overwriting it...
That isn't safe, that's redundant.There's no good reason to use
bzero()
and make your program unportable. Just usememset()
if you must.Why do you print into a temporary buffer, only to append it to the output-string?
- Print directly into the final destination. No need for an intermediate.
- If you build a string bit by bit, don't be Shlemiel The Painter. No need to make a linear algorithm quadratic.
As you immediately exit the program, freeing memory is just busy-work. Still, as I'm not sure what you are trying to learn there, it might be excused.
As an aside,
return 0;
is implicit formain()
in C99+ and C++. Not sure you that's really relevant, but here you go.
Equivalent program (C99+ or C++):
#include <stdio.h>
int main() {
printf("Hexadecimal string:\n0b343332\n");
}
-
\$\begingroup\$ Thank you very much for the feedback. My primary goal here was to save the hexadecimals to a string. Not to print them out. The
malloc()
is there because I thought I should. Thefree()
is there because themalloc()
is there. \$\endgroup\$sshow– sshow2018年04月17日 23:02:01 +00:00Commented Apr 17, 2018 at 23:02 -
\$\begingroup\$ And now I understand that the heap is normally freed upon exit by the operating system, hence making the
free()
redundant, too, in this specific example. \$\endgroup\$sshow– sshow2018年04月17日 23:18:23 +00:00Commented Apr 17, 2018 at 23:18