This is my first C program and I'm just looking for some constructive criticism before I start trying to do more with it. Am in quite unfamiliar territory here and am uncertain if I'm releasing all my resources, handling errors, and whatnot "properly".
#include <errno.h>
#include <share.h>
#include <stdbool.h>
#include <stdio.h>
#define BUFFER_SIZE 4096
#define DATA_FILE_NAME "X"
static inline bool fillBuffer(FILE *, char [], size_t *, size_t *);
static inline bool nextCharEquals(FILE *, char [], size_t *, size_t *, char);
int main(void) {
char buffer[BUFFER_SIZE];
FILE *dataFile = NULL;
size_t numBytesRead = 0;
size_t numBytesUsed = 0;
size_t statusCode = 0;
dataFile = _fsopen(DATA_FILE_NAME, "rbS", _SH_DENYWR);
if (!dataFile) { goto error; }
while ((numBytesUsed < numBytesRead) || fillBuffer(dataFile, buffer, &numBytesRead, &numBytesUsed)) {
char c = buffer[numBytesUsed++];
if (('\n' == c) || ('\r' == c)) {
if ((c == '\r') && nextCharEquals(dataFile, buffer, &numBytesRead, &numBytesUsed, '\n')) {
numBytesUsed++;
}
continue; // TODO: something useful...
}
}
error:
if (errno) {
strerror_s(buffer, BUFFER_SIZE, errno);
fprintf(stderr, "%s\n", buffer);
statusCode = -1;
}
exit:
if (dataFile) {
fclose(dataFile);
}
return statusCode;
}
static inline bool fillBuffer(FILE *file, char buffer[], size_t *numBytesRead, size_t *numBytesUsed) {
*numBytesRead = fread(buffer, 1, BUFFER_SIZE, file);
*numBytesUsed = 0;
return (0 < *numBytesRead);
}
static inline bool nextCharEquals(FILE *file, char buffer[], size_t *numBytesRead, size_t *numBytesUsed, char value) {
return (((*numBytesUsed < *numBytesRead) || fillBuffer(file, buffer, numBytesRead, numBytesUsed)) && (buffer[*numBytesUsed] == value));
}
-
\$\begingroup\$ It's generally helpful to have at least some description of what the code is intended to do. \$\endgroup\$Jerry Coffin– Jerry Coffin2018年08月10日 16:11:08 +00:00Commented Aug 10, 2018 at 16:11
-
\$\begingroup\$ @JerryCoffin It reads lines from a file sequentially without actually doing anything else; I'll eventually add in some parsing logic but for now that seemed like just noise. \$\endgroup\$Kittoes0124– Kittoes01242018年08月10日 16:24:50 +00:00Commented Aug 10, 2018 at 16:24
1 Answer 1
static inline bool fillBuffer(FILE *, char [], size_t *, size_t *);
It'd be nice to provide names for these function parameters, especially since two of them have the same type (size_t *
).
It's fairly unusual to see a pointer parameter declared with []
instead of *
— I infer your rationale, something like "This pointer is expected to point to an array of contiguous objects, not just one object," but the inconsistency (and risk of some maintainer forgetting and writing sizeof buffer
just once) is too costly IMHO. The fact that char *buffer
is immediately followed by size_t bufferlen
should be a dead giveaway that buffer
points to an array.
...Oh, what do you mean char *buffer
isn't followed by size_t bufferlen
?!
static inline bool fillBuffer(FILE *file, char buffer[], size_t *numBytesRead, size_t *numBytesUsed) {
*numBytesRead = fread(buffer, 1, BUFFER_SIZE, file);
Yep, that's a buffer overflow waiting to happen. Your fillBuffer
function takes a pointer (buffer
) and unconditionally writes up to BUFFER_SIZE
(4096) bytes into it. If the caller passes in a smaller buffer... boom.
So what we want to see here is more like the API to fread
itself:
static inline bool fillBuffer(FILE *file, char *buffer, size_t bufferlen, size_t *numBytesRead, size_t *numBytesUsed) {
*numBytesRead = fread(buffer, 1, bufferlen, file);
*numBytesUsed = 0;
return (0 < *numBytesRead);
}
I know that fread
returns numBytesRead
as its return value, so I'm suspicious of the way fillBuffer
returns numBytesRead
via an out-parameter. Out-parameters are a code smell. So I look at what we're using our return value for.
return (0 < *numBytesRead);
Comparing a size_t
against 0
should generally be done with !=
, not <
, since the only value of type size_t
that is not > 0
is 0
itself.
return (0 != *numBytesRead);
But this function returns bool
, and size_t
converts to bool
in the obvious way, so:
return *numBytesRead;
Oh look! We just eliminated that out-parameter!
static inline size_t fillBuffer(FILE *fp, char *buffer, size_t bufferlen, size_t *numBytesUsed) {
size_t numBytesRead = fread(buffer, 1, bufferlen, fp);
*numBytesUsed = 0;
return numBytesRead;
}
Now simplify:
static inline size_t fillBuffer(FILE *fp, char *buffer, size_t bufferlen, size_t *numBytesUsed) {
*numBytesUsed = 0;
return fread(buffer, 1, bufferlen, fp);
}
Now pause. Notice that numBytesUsed
is not actually used by this function at all, except to unconditionally set it to 0
. The Single Responsibility Principle suggests that perhaps setting this arbitrary size_t
to 0
could be done by the caller, so as not to bother us with an essentially unrelated task. But then our whole function becomes a tail-call to fread(...)
, which seems ridiculous. Let's eliminate this function. Find its caller...
static inline bool nextCharEquals(FILE *file, char buffer[], size_t *numBytesRead, size_t *numBytesUsed, char value) {
return (((*numBytesUsed < *numBytesRead) || fillBuffer(file, buffer, numBytesRead, numBytesUsed)) && (buffer[*numBytesUsed] == value));
}
Oh dear me. Let's rewrite this mess using if
and else
:
static inline bool nextCharEquals(FILE *file, char buffer[], size_t *numBytesRead, size_t *numBytesUsed, char value) {
if (*numBytesUsed >= *numBytesRead) {
if (fillBuffer(file, buffer, numBytesRead, numBytesUsed) == 0) {
return false;
}
}
return buffer[*numBytesUsed] == value;
}
Okay, now for the inlining step!
static inline bool nextCharEquals(FILE *fp, char *buffer, size_t bufferlen, size_t *numBytesUsed, char value) {
if (*numBytesUsed >= *numBytesRead) {
*numBytesRead = fread(buffer, 1, bufferlen, fp);
*numBytesUsed = 0;
if (*numBytesRead == 0) {
return false;
}
}
return buffer[*numBytesUsed] == value;
}
Now we can start to make out where the real problems lie. For example, it is tempting to observe that if (*numBytesRead == 0)
is equivalent to if (*numBytesUsed >= *numBytesRead)
(since we just set *numBytesUsed
to 0
on the previous line). So we are tempted to rewrite this function as
static inline bool nextCharEquals(FILE *fp, char *buffer, size_t bufferlen, size_t *numBytesUsed, char value) {
while (*numBytesUsed >= *numBytesRead) {
*numBytesRead = fread(buffer, 1, bufferlen, fp);
*numBytesUsed = 0;
// TODO: check errno? check feof(fp)?
}
return buffer[*numBytesUsed] == value;
}
Of course as written that would be an infinite loop on EOF; but it makes us ponder in what situations we expect to see fread(...) == 0
... which starts us thinking about error handling.
Meanwhile, we can apply the Single Responsibility Principle again to figure out that perhaps char value
doesn't belong in this function. Why would our caller want to write
nextCharEquals(dataFile, buffer, &numBytesRead, &numBytesUsed, '\n')
when they could write
nextChar(dataFile, buffer, &numBytesRead, &numBytesUsed) == '\n'
instead?
Any time you're spelling out the name of a built-in operator (==
) in your function name, it's a good sign that your function is doing more than it should be. Let the built-in operator pull its own weight!
One more random tip: Instead of hard-coding BUFFER_SIZE
all over your program, try to use sizeof buffer
.
char buffer[BUFFER_SIZE];
foo(buffer, BUFFER_SIZE); // bad, repetitive, easy to typo
char buffer[4096];
foo(buffer, sizeof buffer); // good, clean, hard to typo
It'll also train you to pass your buffer sizes around properly, since you won't have all your buffers' sizes so promiscuously available in "global variables." And that will help you prepare for the world of multi-source-file programs.
-
\$\begingroup\$ Lots of good advice in here but ya appear to have gone a bit overboard. For example, you eliminated an out variable in the name of SRP which then forces me to maintain that variable in three separate places; how is that possibly any less "smelly"? \$\endgroup\$Kittoes0124– Kittoes01242018年08月12日 14:56:59 +00:00Commented Aug 12, 2018 at 14:56
-
\$\begingroup\$ I've made alterations to the file I'm working on that implement most of what you recommended, see here: gist.github.com/Kittoes0124/606ba934caf14a50133a78ebc666ce67. \$\endgroup\$Kittoes0124– Kittoes01242018年08月12日 15:58:30 +00:00Commented Aug 12, 2018 at 15:58
-
1\$\begingroup\$ @Kittoes0124: "forces me to maintain that variable in three separate places"... By my count, you're already modifying
numBytesUsed
in three places (two increments and one set-to-zero), and all I recommended was to move the set-to-zero up a couple of levels. That doesn't increase the amount of duplication. However, if you want to really clean up this code, you should look into a feature of C I didn't even mention:struct
. Wrapping those four variables (buf
,buflen
,available
andused
) into a singlestruct Buffer *
would drastically simplify a lot of your function signatures. \$\endgroup\$Quuxplusone– Quuxplusone2018年08月12日 19:30:12 +00:00Commented Aug 12, 2018 at 19:30