I'm new to C and just finished K&R. This is a code that I've written to parse headers of a HTTP request. I want to know if my code:
- Is safe? (after all this is C)
- Does proper exit on failure? Are the ways that
exitLog
andexitNullPointer
functions exit considered good practice? - Has good implementation structure? Is returning a pointer to a struct and having a helper function to free the allocated memory common? Is having NULL in the header name to indicate end of headers a bad practice?
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#define DEFAULT_HEADER_COUNT 30
#define MAX_HEADER_COUNT 1000
#define DEFAULT_HEADER_FIELD_SIZE 1000
#define MAX_HEADER_FIELD_SIZE 20e6 // 20MB
void exitLog(char *log)
{
printf("\n%s\n", log);
exit(1);
}
void exitNullPointer(void *ptr)
{
if (ptr == NULL)
exitLog("Malloc/Realloc failed.");
}
struct header
{
char *headerName;
char *headerValue;
};
typedef struct header Header;
Header *createHeaderParser(char *buff)
{
/*
* NOTE: Caller must call freeHeaderParser after they
* are done with it.
*
* Gets the buffer from client, passes the first line
* and parse all the `name: value\r\n` format into the
* type Header and returns the pointer to first one.
*/
int headerCount = DEFAULT_HEADER_COUNT;
Header *headers = malloc(headerCount * sizeof(Header));
exitNullPointer(headers);
int i = 0;
while (buff[i++] != '\n')
; // to pass first line
int currHeaderPos = 0;
while (1)
{
if (currHeaderPos >= headerCount)
{
headers = realloc(headers, sizeof(Header) * (headerCount + DEFAULT_HEADER_COUNT));
exitNullPointer(headers);
headerCount += DEFAULT_HEADER_COUNT;
if (headerCount > MAX_HEADER_COUNT)
exitLog("MAX_HEADER_COUNT exceed");
}
char **currHeaderName = &headers[currHeaderPos].headerName; // Just for convenience.
char **currHeaderValue = &headers[currHeaderPos].headerValue;
long int nameSize = DEFAULT_HEADER_FIELD_SIZE;
long int valueSize = DEFAULT_HEADER_FIELD_SIZE;
*currHeaderName = malloc(nameSize);
*currHeaderValue = malloc(valueSize);
exitNullPointer(*currHeaderName);
exitNullPointer(*currHeaderValue);
int pos = 0;
while (buff[i] != ':')
{
if (nameSize >= MAX_HEADER_FIELD_SIZE)
exitLog("MAX_HEADER_FIELD_SIZE exceed");
if (pos >= nameSize)
{
nameSize *= 2; // double the size of field
*currHeaderName = realloc(*currHeaderName, nameSize + 1);
exitNullPointer(*currHeaderName);
}
(*currHeaderName)[pos++] = buff[i++];
}
(*currHeaderName)[pos] = '0円';
i += 2; // pass space and :
pos = 0;
while (buff[i] != '\r')
{
if (valueSize >= MAX_HEADER_FIELD_SIZE)
exitLog("MAX_HEADER_FIELD_SIZE exceed");
if (pos >= valueSize)
{
valueSize *= 2; // double the size of field
*currHeaderValue = realloc(*currHeaderValue, valueSize + 1);
exitNullPointer(*currHeaderValue);
}
(*currHeaderValue)[pos++] = buff[i++];
}
(*currHeaderValue)[pos] = '0円';
printf("%s,%s\n", *currHeaderName, *currHeaderValue); // to check if it has correctly parsed - won't be in final code.
currHeaderPos++;
i += 2; // pass \r\n
if (buff[i] == '\r')
{
headers[currHeaderPos].headerName = NULL; // end
break;
}
}
return headers;
}
void freeHeaderParser(Header *headers)
{
for (int i = 0; headers[i].headerName != NULL; i++)
{
free(headers[i].headerName);
free(headers[i].headerValue);
}
free(headers);
}
2 Answers 2
Answers to your questions
Is safe? (after all this is C)
No. Consider the very first while
-loop in your code:
while (buff[i++] != '\n')
; // to pass first line`
What if there is no '\n'
in the buffer? Make as little assumptions as possible about the input. Since you pass a string via a char *
, the only thing you can assume is that it will be NUL-terminated, as there is no other way to deduce the length of the string.
Similarly, things like i += 2; // pass space and :
are unsafe; don't assume there really is a space after the colon.
Also, "safe" is not very well defined. Is it safe that your function calls exit()
on error? It will take down the whole program with it, which might not be desirable.
Does proper exit on failure? Are the ways that
exitLog
andexitNullPointer
functions exit considered good practice?
Make sure you print errors to stderr
, as stdout
might already be used for other things, and it is less likely that stderr
is redirected.
Prefer exit(EXIT_FAILURE)
over a hardcoded number.
Has good implementation structure? Is returning a pointer to a struct and having a helper function to free the allocated memory common?
This is common. If you do have one function create memory, then it is very good practice to have another function for freeing that memory.
A completely different option would be to just return one header at a time, and keep track of where you are in buff
. That way, your code would not have to deal with memory allocation at all.
Is having NULL in the header name to indicate end of headers a bad practice?
That's one way of doing it. Another option would be to return the number of elements in the array somehow.
Use strdup()
and strndup()
Use strdup()
and strndup()
to copy strings instead of writing your own allocation routines. They are standard in c23, but even way before that it was part of almost all standard libraries.
Use const
where appropriate
Make parameters const
where possible. This will prevent errors where you accidentally write to input parameters, and it will sometimes also allow the compiler to better optimize code.
Reduce the amount of memory allocations
I already mentioned that you could probably solve this issue without any form of memory allocation. If you still want to parse all headers in one go, then consider that almost all of buff
is going to be header names and header values, and just a tiny bit of whitespace and colons in between. So instead of copying each element into a separately allocated string, why not just duplicate buff
, replace colons and the end of values with NUL bytes in that duplicate, and then just return an array pointers into that duplicate buffer?
Consider other ways to solve your actual problem
Do you really need a list of all the headers? If you are only want to search for a few specific headers, like Content-Length
for example, then maybe you could make a function that search for a specific header and returns just the value?
char *findHeader(const char *buff, const char *header_name)
{
...
}
Use empty lines to separate functions and code blocks
You don't have a single empty line in your program. That makes it harder to read. Try adding an empty line between every function, and also add empty lines inside functions to help separate different sections visually.
-
1\$\begingroup\$ "then just return an array [of] pointers into that duplicate buffer?" Tricky... Allocated duplicate buffer AND 'array' of ptr-pairs to be returned from one function? (And, possible error condition if not terminating.) Getting to be a bit complicated for a beginner... \$\endgroup\$Fe2O3– Fe2O32025年07月17日 20:05:23 +00:00Commented Jul 17 at 20:05
-
1\$\begingroup\$ Don't use
strdup()
for pre-C23 code (POSIX dependency). But more importantly for HTTP parsing: strdup() wastes memory copying entire lines. Instead, usestrchr
to find boundaries, thenmalloc
andmemcpy
only the exact parts you need. Saves significant memory on large requests. \$\endgroup\$Łukasz D. Tulikowski– Łukasz D. Tulikowski2025年07月18日 05:19:26 +00:00Commented Jul 18 at 5:19 -
1\$\begingroup\$ Thanks a lot for going in detail answering all my questions! One thing I really noticed from your review and others is my lack of experience working with standard string functions. I really need to use them more. I completely agree on use of empty lines, const and stderr. But about replacing : and \r\n with NUL bytes, I'm truly mind-blown with this idea, but Isn't a bit 'unsafe' to have multiple 0円 in a string? If it is not I think that is a brilliant way to parse the buffer. Again, I appreciate the time you've given to review my code. \$\endgroup\$Mehan Alavi– Mehan Alavi2025年07月18日 17:43:01 +00:00Commented Jul 18 at 17:43
-
1\$\begingroup\$ @MehanAlavi When you have multiple
0円
s, you have to see it as multiple strings in a single buffer. I'm not sure it is brilliant, but in C it definitely is expedient to do it that way. If you just finished K&R then there still is a lot of things to learn about C. \$\endgroup\$G. Sliepen– G. Sliepen2025年07月19日 08:00:45 +00:00Commented Jul 19 at 8:00 -
2\$\begingroup\$ @MehanAlavi: Using
0円
as a separator instead of newline is normal in modern Unix shell programs, likefind -print0
orgrip -l --null
pipe intoxargs -0
orsort -0
to sort filenames or other records that might contain newlines. (It's usually a bad idea to have a newline in filenames, but it is allowed and can happen by accident with copy/pasting into save-as dialog boxes, or can happen maliciously by users trying to break an admin's scripts.) Packing multiple strings head-to-tail with just implicit separators into achar[]
buffer is an option vs. ptrs to separate allocations. \$\endgroup\$Peter Cordes– Peter Cordes2025年07月21日 04:20:48 +00:00Commented Jul 21 at 4:20
Strongly suggest you spend some time reading code before attempting to write your own.
I don't want to discourage you, but this code does not demonstrate a firm grasp of how data is processed efficiently.
Exercise: Kudos to you for correctly using "pointer to pointer" variables.
Now, revise the code to NOT use that extra level of indirection.
(Hint: the names in headers[currHeaderPos].headerValue
are all pretty redundant and verbose. Try to use meaningful, but shorter, names for variables, especially those only visible within a function.)
Eg. hdrs[hdrPos].value[pos++] = buff[i++];
maybe??
Code must be correct, but should be lean and expressive; it's not "War and Peace".
Extra variables are rarely beneficial, and frequently homes for bugs.
Reinventing wheels
Hint: strtok( buff, ":\r\n" );
will chop the original string into the segments that you want without a lot of fuss. Store the returned pointers as 'name/value' pairs of pointers (odd/even using only one loop). Leave the strings in buff
, or use pointer arithmetic to determine how much to allocate for each segment (and reassign the pointers to each piece...)
It's REALLY hard to see that every name
or value
allocation will be between 1000
(minimum!) and 20Mb
in size... This, to me, seems outlandish...
Edit:
Other answers warn about strtok()
and the possibility of const
buffers.
One should always be aware of potential pitfalls of dealing with input.
FWIW: a very useful Standard Library function that is, too often, overlooked is strcspn()
(and its sibling strspn()
). You may wish to familiarise yourself with this pair and, perhaps, utilise one of them in your code.
Code is always better when it incorporates standard functions, rather than DIY constructs. Those library functions do what they're meant to do and are often faster and usually more robust than DIY code.
There's very little 'protection' against corrupt data, and every expectation of high quality data. The detection of reaching the end of the buffer to stop processing is extremely dodgy.
if (currHeaderPos >= headerCount)
{
headers = realloc(headers, sizeof(Header) * (headerCount + DEFAULT_HEADER_COUNT));
exitNullPointer(headers);
headerCount += DEFAULT_HEADER_COUNT;
if (headerCount > MAX_HEADER_COUNT)
exitLog("MAX_HEADER_COUNT exceed");
}
Why error exit AFTER the reallocation succeeded?
It appears you've not really worked out just how much/many items you want to peg as the arbitrary maximum your function will deal with. Lots of arbitrary values used, instead of measuring and adapting to what's been provided in the source data.
Hint: One pass over buff
counting :
characters would tell you how many name/value pairs to expect... No realloc()
required...
Error messages go to stderr
, not stdout
.
currHeaderPos++;
i += 2; // pass \r\n
if (buff[i] == '\r')
{
headers[currHeaderPos].headerName = NULL; // end
break;
}
Think VERY carefully about these lines...
The code allocated room for 30 name/value pairs. Say there were exactly 30 name/value pairs in the buff
string... Where will that NULL
be placed to mark the end of the used pointers?
The same "filled to capacity, but tried to sneak in a '0円'
at the end" applies to both the string copying regions of this code.
Exercise: Revise the outer loop to use do/while()
instead of while(1)
.
What almost trivial changes are needed?
Does this simplify the logic a tiny bit?
DRY: Don't Repeat Yourself
There are two large-ish while()
loops in the middle of the big function.
These lines should have been factored out as a single 'worker function' that allocates and fills the buffer until the 'magic' marking character is reached.
Strive to avoid duplication of functionality with copy/paste/adapt coding. Write capable functions that work from a few parameters. Don't repeat yourself.
Reflecting on those pointer-to-pointer variables, using a well considered function and its parameters, there would be less code and the 'contained' code that remains could use much simpler indirections.
KISS means "Keep It Simple Simon" where 'Simon' is usually another word...
You're right. This is the C language, meaning the coder can't just be slack and relaxed believing that the language has lots of guardrails and padded corners that protect sloppy coders from themselves. Attention and discipline is a prerequisite.
Peter Parker's uncle Ben said it: "With great power comes great responsibility."
Consider:
void *allocOrDie( void *ptr, size_t size )
{
ptr = realloc( ptr, size );
if( ptr )
return ptr;
fprintf( stderr, "Malloc/Realloc failed.\n" );
exit( EXIT_FAILURE );
}
Header *createHeaderParser( char *buff )
{
...
Header *headers = allocOrDie( NULL, sizeof *headers * headerCount );
...
if( currHeaderPos >= headerCount )
{
headerCount += DEFAULT_HEADER_COUNT;
if( headerCount > MAX_HEADER_COUNT )
exitLog("MAX_HEADER_COUNT exceed");
headers = allocOrDie( headers, sizeof *headers * headerCount );
...
The above is quite crude and not optimal. Intended to show the value of READING over WRITING for beginners. In the documentation you'll find that realloc()
passed a NULL
pointer acts just like malloc()
. Yes, it's fun to code some running programs as exercises. When starting out, it's more advantageous to study examples to see how experienced people have solved different issues.
As said, shown here is a crude partial transition toward less code. Plan it out and think it through... Delay allocation until AFTER the code has determined there's something it needs to store in heap. Done well, instead of malloc()
plus realloc()
, only ONE call to realloc()
is required (inside the loop) that grows from a start of 0
elements.
And don't overlook the requirement for the extra "end of list" NULL
marker.
Exercise: Write allocOrBounce()
that does not terminate, but somehow signals the caller that allocation failed and the caller can try to recover somehow...
To ponder...
Thunderbird email client stores emails as plain text, then (re-)generates and uses a 'companion' file of its own indexing for speed. (It's a simple, flexible form of a database full of 'blobs'.) I grew tired of waiting for backups when I could see that a one word email (e.g. a reply payload of "Okay!") was prefaced by 5-20Kb of 'transport layer logging statements', so I wrote a filter that thinned those down to what was useful to Thunderbird and to me.
This HTTP problem feels similar.
Just wanted to note that
char *endOfHdr = strstr( blob, "\r\n\r\n" );
gave me the 'pivot point' between what was header (preceding) and what was payload (following). I could then go on to deal with both portions as two completely separate sections.
Advice to OP: Find and read the man pages for all of the functions available in C's Standard Library. Some you'll use frequently, some only on rare occasions. Re-visit those man pages every few months to refresh your memory of what's tried-and-true and available.
One of the reasons C has survived for over 50 years is that, with its library functions, it lives at the Goldilocks point on the spectrum of languages; not too much like a 'kitchen sink' and not too sparse or barren or wanting. IOW "just right".
-
1\$\begingroup\$ Thanks for your honest feedback, Really appreciated. Now I think one of the biggest problems in my code was not using standard string functions. strcspn seems really useful not only in this code but in general. I'm absolutely shocked how I didn't noticed the header NULL and '0円' problem in case the capacity is full before your review. Just one question: Do you mean doing realloc one by one after checking if next exists? I know as you said we can count the number of colons but wanted to check what you meant from that. \$\endgroup\$Mehan Alavi– Mehan Alavi2025年07月18日 18:14:56 +00:00Commented Jul 18 at 18:14
-
1\$\begingroup\$ @MehanAlavi "batch" vs "one by one"... I skimmed a Wikipedia page about HTTP where I saw that Apache limits the name/value pairs to 100. There may be a maximum character count for the names and values... There are always many different ways to do things. Eventually, speed will be a requirement (fast-fast), but for now you're experimenting and learning... Try different approaches... Enjoy learning. Eventually it will all be Assembler for speed!! \$\endgroup\$Fe2O3– Fe2O32025年07月18日 21:13:43 +00:00Commented Jul 18 at 21:13
-
1\$\begingroup\$ @MehanAlavi PS: my "colon count" is wrong, wrong... Wikipedia page showed "value" payload could be datetime ("25 Dec 2025 02:24:18") with embedded colons... It's good to be wrong. Keeps us humble... Enjoy the journey!! :-) \$\endgroup\$Fe2O3– Fe2O32025年07月18日 21:16:46 +00:00Commented Jul 18 at 21:16
-
\$\begingroup\$ @MehanAlavi Y'know... In some cases, the UNSORTED collection of name/value pairs might be just as convenient if collected into a linked-list data model... Depends on what's to be done with the 'units' downstream... There's always another approach, and there's always some kind of trade-off involved... You do your best and cross your fingers... :-) \$\endgroup\$Fe2O3– Fe2O32025年07月18日 21:40:32 +00:00Commented Jul 18 at 21:40
-
1\$\begingroup\$ Thank you for your valuable insights. And I can't hide that how much I'm disappointed after understanding the colon counting won't work :) \$\endgroup\$Mehan Alavi– Mehan Alavi2025年07月22日 09:42:26 +00:00Commented Jul 22 at 9:42
realloc
into the same variable, you have lost the original pointer on failure and leaked memory. This is, probably, OK when the action is "exit if NULL", your OS will reclaim the memory, but still not the cleanest approach - better use another variable, check it for NULL (andfree
the old pointer if NULL), then set the variable back. \$\endgroup\$