Note: This is the first time I've written anything whatsoever in straight C. In other words: I have no idea what I'm doing.
I recently had a task that involved temporarily relaying responses from one server, through a web-facing server, and then on to the client. Along the way, the relay server had to dynamically add an HTTP header of its own to the relayed response. Came up with a quick and dirty scripted solution, but it worked for the duration it had to.
Still, I thought it might be a good exercise for getting to know a little C, just for fun, since something like this would benefit from being written closer to the metal.
So below is a program (I call it headshape
for lack of a better name) where you pipe in (say, from a curl command) an HTTP response including headers, and the program will add/remove/modify one of those headers, and pass on the rest on as-is. Given no arguments, stdin will just pass to stdout.
E.g.
$ curl -i example.com | headshape # pass through
$ curl -i example.com | headshape Server # remove the Server header
$ curl -i example.com | headshape Via 'foobar' # add/modify the Via header
So it does a bit of HTTP parsing to find the right header - or add it if it's not there - and to figure out what the response code is. Only responses with a 2xx status will be altered, 1xx responses will be ignored, and 3xx and above will cause it to just forward all remaining data unaltered (this partly an arbitrary choice; it's just a code exercise after all). It also defaults to forwarding everything as-is if it sees something it can't make sense of.
Below is the code. Much to my surprise it seems to work exactly as intended but, again, I've never written C code, so it's no doubt horrible. Any and all tips are welcome.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define CRLF "\r\n" // HTTP line ending
#define BLOCK_SIZE 1024 // Any reason to make this larger/smaller?
#define HEADER_DELIM ": " // Used to spot header lines
#define MAX_HEADER_LENGTH 100 // max length of a header *name*; not the whole header line
static int parseStatus(char* line);
static char* getHeaderName(char* line);
static int isDelimited(char* line, int continuation);
static int isBoundary(char* line);
void writeLine(char* line);
void passthru();
int main(int argc, char const *argv[]) {
// states
int should_parse = 0;
int effective_status = 0;
// options/
char* target_header = NULL;
char* payload_line = NULL;
// loop variables
int line_continued = 0;
int current_status = 0;
char* current_header = NULL;
char* line_out = NULL;
char line[BLOCK_SIZE];
// exit with >0 if nothing's being piped
if(isatty(fileno(stdin))) {
return 1; // TODO: is there a better code for this?
}
// Get the target header argument, if any
if(argc > 1) {
should_parse = 1; // could just check target_header, but this seems more descriptive
target_header = (char*) argv[1];
}
// Get the target value argument, if any
if(argc > 2) {
// build the payload header line (i.e. "Header: value")
size_t payload_length = 1;
payload_length += strlen(target_header);
payload_length += strlen(HEADER_DELIM);
payload_length += strlen(argv[2]);
payload_length += strlen(CRLF);
payload_line = (char *) malloc(payload_length);
strcpy(payload_line, target_header);
strcat(payload_line, HEADER_DELIM);
strcat(payload_line, argv[2]);
strcat(payload_line, CRLF);
}
// heading parsing loop
while(should_parse && fgets(line, BLOCK_SIZE, stdin)) {
// if previous fgets didn't get a complete, CRLF-delimited line
// (i.e. a line exceeded BLOCK_SIZE), assume the current chunk
// is the continuation, and output it without parsing
if(line_continued) {
writeLine(line_out);
line_continued = !isDelimited(line, line_continued);
continue;
}
// check if this is a "complete" CRLF-delimited line
line_continued = !isDelimited(line, line_continued);
// reference our input for later
line_out = line;
if((current_status = parseStatus(line))) {
// set the status code
effective_status = current_status;
// we're not interested in rewriting 3xx and higher responses
if(effective_status >= 300) {
// pass the remaining data directly to stdout
should_parse = 0;
}
} else if((current_header = getHeaderName(line))) {
// check if this is the header we're looking for
if(strcmp(current_header, target_header) == 0) {
// replace or remove the header
line_out = payload_line;
// pass the remaining data directly to stdout
should_parse = 0;
}
} else if(isBoundary(line)) {
// We've reached the end of the HTTP header section.
// If effective_status is 1xx, parsing continues but
// 2xx will stop the parsing and insert the header
// if need be
if(effective_status >= 200) {
// add the header, if this is a 2xx response
if(payload_line && effective_status < 300) {
writeLine(payload_line);
}
// pass the remaining data directly to stdout
should_parse = 0;
}
} else {
// stop parsing if the line isn't recognized and
// pass the remaining data directly to stdout
should_parse = 0;
}
// output the line (unless it's NULL)
if(line_out) writeLine(line_out);
}
// pass the remaining data directly to stdout
passthru();
// free the payload line, if necessary
if(payload_line) free(payload_line);
return 0;
}
// Gets the header name in the line, if any.
static char* getHeaderName(char* line) {
static char header[MAX_HEADER_LENGTH];
char* delim = strstr(line, HEADER_DELIM);
if(delim) {
size_t count = delim - line;
count = count >= MAX_HEADER_LENGTH ? MAX_HEADER_LENGTH - 1 : count;
strncpy(header, line, count);
header[count] = '0円';
return header;
}
return NULL;
}
// Match a line like "HTTP/1.x xxx..." and extract the status code
// Not exactly regex-like precision here, though. Maybe use strtok()?
static int parseStatus(char* line) {
size_t code_length = 4;
char* http = strstr(line, "HTTP/1.");
char* delim = strchr(line, ' ');
size_t remaining = strlen(line) - (delim - line) + 1;
if(http && delim && remaining > code_length) {
char code[remaining];
strncpy(code, delim, remaining);
return atoi(code);
}
return 0;
}
// Check whether a string ends with CRLF
// Note: This function keeps the last char of the line from its previous
// invocation in memory in order to spot the 2-character CRLF even if it's
// been split into separate fgets lines.
static int isDelimited(char* line, int continuation) {
static char prev = 0;
char tail[3] = {'0円', '0円', '0円'};
int offset = strlen(line) - 2;
if(offset >= 0) {
// grab last 2 chars
tail[0] = line[offset];
tail[1] = line[offset + 1];
} else if(continuation && offset == -1) {
// grab char carried over from earlier, and the line's single char
tail[0] = prev;
tail[1] = line[0];
}
prev = tail[1];
return !strcmp(tail, CRLF);
}
// is this line a "blank" CRLF line?
static int isBoundary(char* line) {
return strlen(line) == strlen(CRLF) && strcmp(line, CRLF) == 0;
}
// write to stdout & flush
void writeLine(char* line) {
fwrite(line, sizeof(char), strlen(line), stdout);
fflush(stdout);
}
// pass stdin through to stdout
void passthru() {
char buffer[BLOCK_SIZE];
while(1) {
size_t bytes = fread(buffer, sizeof(char), BLOCK_SIZE, stdin);
fwrite(buffer, sizeof(char), bytes, stdout);
fflush(stdout);
if(bytes < BLOCK_SIZE && feof(stdin)) break;
}
}
1 Answer 1
A few items:
Use STDIN_FILENO
Instead of fileno(stdin)
you can use the preprocessor symbol STDIN_FILENO
, defined in <unistd.h>
which you have already included.
should_parse
should be declared bool
Assuming you're using a compiler that isn't decades old (bool
was added in the c99 standard), you can include <stdbool.h>
and use bool
for the type of should_parse
. It also allows you to use the values of true
and false
within the code to make the intention of that flag a little more explicit.
The same is true for line_continued
, the return value of isBoundary
, and isDelimited
and the second argument of isDelimited
.
target_header
should be const
and argv
should not
Your target_header
variable should be declared const
because its contents are not altered by the program. If you make that change, you can also remove the cast from the line
target_header = (const char *)argv[1];
However, you wouldn't actually need that cast anyway if you had declared main
as:
int main(int argc, char *argv[])
It does seem like that ought to be const *argv
but that's counter to what the C standard says. In section 5.1.2.2.1, it says that:
the strings pointed to by the
argv
array shall be modifiable by the program
which implies that they're not const
.
Don't abuse fflush
You probably don't need to call fflush
after every fwrite
. The stream will automatically flush when the file is closed, which also happens automatically when main
ends.
Make loop exits explicit
In passthru()
, instead of using while(1)
and then using a break
it would be more clear to write it like this:
size_t bytes;
do {
bytes = fread(buffer, sizeof(char), BLOCK_SIZE, stdin);
fwrite(buffer, sizeof(char), bytes, stdout);
} while (bytes == BLOCK_SIZE && !feof(stdin));
Simplify isBoundary
The isBoundary
routine can be simplified as this:
// is this line a "blank" CRLF line?
static bool isBoundary(char* line) {
return strcmp(line, CRLF) == 0;
}
There is no need to also compare their lengths, since that's already implicit in strcmp
.
Calculation of payload_line
is complex
The payload_line
variable size is calculated, and then malloc
'd and then copied, but it's only used a few other places. What you've got isn't wrong, but it might be worthwhile instead allocating a fixed size and then using snprintf
to populate the string. That would collapse the dozen lines used for that calculation to the much simpler single line:
snprintf(payload_line, BLOCK_SIZE, "%s" HEADER_DELIM "%s" CRLF, target_header, argv[2]);
I'm sure there's more but I lack the time at the moment.
Update
I found some time and a few other things in the code.
Refactor parseStatus
The current parseStatus
routine does more work than needed to extract the status code. You don't need to copy the string to pass it to atoi
so it can be greatly simplified like so:
static int parseStatus(char* line) {
char* http = strstr(line, "HTTP/1.");
if (http) {
http = strchr(http, ' ');
if (http) {
return atoi(http);
}
}
return 0;
}
Refactor getHeaderName
Right now, the getHeaderName
function is outlined something like this:
static char* getHeaderName(char* line) {
static char header[MAX_HEADER_LENGTH];
/* ... */
return header;
}
This works (and even some ancient library functions were done this way) but it's not thread-safe and the design can easily be improved. Instead, it's better to have the calling function allocate a buffer and pass it in, along with the length rather than pointing to an internal static buffer.
Refactor isDelimited
In a similar vein, your isDelimited
function is really only checking for two particular bytes, so it would be more clear and slightly faster to check for them directly instead of constructing a copy in memory and then using strcmp
.
-
\$\begingroup\$ Great tips, thanks! Re
bool
: was trying some things out using ints, and it stuck; will definitely fix that.argv
: I used my editor's C template and didn't ask questions. Good to know theconst
isn't required.fflush
: I smelled that myself, but I also figured I'd have many Mbits/s streaming through, so maybe better to get stuff flushed asap? Loop exit: Guilty of googling and copy/pasting without thinking!isBoundary
: Ah, of course! Should've seen that myself.payload_line
: I could smell that code from a mile away, but I didn't know a better way to do it. Now I do - thanks! \$\endgroup\$Flambino– Flambino2014年05月04日 23:55:48 +00:00Commented May 4, 2014 at 23:55 -
1\$\begingroup\$ It's a pretty good attempt for a first foray into C. You did well, despite all the nit picking! \$\endgroup\$Edward– Edward2014年05月04日 23:58:06 +00:00Commented May 4, 2014 at 23:58
-
1\$\begingroup\$ Oh, and in my defense, using
malloc
andfree
made me feel like I was really writing C code after all these years of languages with GC ;) \$\endgroup\$Flambino– Flambino2014年05月04日 23:58:39 +00:00Commented May 4, 2014 at 23:58 -
\$\begingroup\$ Thanks! And hey, nit-picking is what I want in a review! I mean, it'd be pretty cool if my very first stab at C was a gleaming example of perfection, but let's just say I didn't expect that to be the case :) \$\endgroup\$Flambino– Flambino2014年05月05日 00:00:37 +00:00Commented May 5, 2014 at 0:00
-
\$\begingroup\$ Just noticed your update - thanks again! Good point about the construction/signature of
getHeaderName
in particular. I definitely see your point (and I recognize the pass-in-a-buffer approach, now that you mention it). Great stuff. \$\endgroup\$Flambino– Flambino2014年05月05日 15:01:21 +00:00Commented May 5, 2014 at 15:01