This is part of a two-part post (Part 1).
Here, I have two recent projects that parse a file. The first uses a loop that's kind of hard to follow, and the second uses "modes" to decide what to do.
Project 2: A working map renderer that I plan to implement the game of Picross with.
This one uses "modes" to control the flow of the loop. I feel it is much more elegant, but wonder if this is the best way to do it.
The project is nowhere near finished, but I think this is in-line with the posting guidelines, since the part I care about is fully functional.
The data structures:
struct picross_data {
char *name,
*map;
int width,
height;
};
The parser:
enum pfmode {
mode_find_name,
mode_get_name,
mode_find_width,
mode_get_width,
mode_find_height,
mode_get_height,
mode_get_map,
mode_finish,
};
void
parse_file(char *fname, struct picross_data *pd)
{
FILE *f = fopen(fname, "r");
char fbuf[10000] = "",
*fbp = fbuf,
*nbuf = calloc(sizeof(char), 100),
*nbp = nbuf,
numbuf[10] = "",
*nmbp = numbuf,
*mapbuf,
*mbp,
c;
enum pfmode m = mode_find_name;
/* This function uses four buffers for parsing the file.
* fbuf[10000] for the file contents,
* nbuf[100] for the name,
* numbuf[10] for numbers,
* mapbuf[w*h] for the map
* The modes are self-explanatory. They make it easy to focus on one section at a time without an over-complicated hacked-together while loop
*/
while((c = fgetc(f)) != EOF) *(fbp++) = c;
fclose(f);
fbp = fbuf;
while(m != mode_finish){
switch(m){
case mode_find_name:
if(*fbp == '"'){
m = mode_get_name;
}
fbp++;
break;
case mode_get_name:
if(*fbp == '"'){
m = mode_find_width;
pd->name = nbuf;
} else {
*nbp = *fbp;
nbp++;
}
fbp++;
break;
case mode_find_width:
if(*fbp == '['){
m = mode_get_width;
}
fbp++;
break;
case mode_get_width:
if(*fbp == ','){
pd->width = atoi(numbuf);
nmbp = numbuf;
m = mode_find_height;
} else {
*nmbp = *fbp;
nmbp++;
}
fbp++;
break;
case mode_find_height:
while(*fbp == ' ') fbp++;
m = mode_get_height;
break;
case mode_get_height:
if(*fbp == ']'){
pd->height = atoi(numbuf);
mapbuf = calloc(sizeof(char), pd->width * pd->height);
mbp = mapbuf;
m = mode_get_map;
} else {
*nmbp = *fbp;
nmbp++;
}
fbp++;
break;
case mode_get_map:
if(mbp-mapbuf >= pd->width * pd->height){
pd->map = mapbuf;
m = mode_finish;
}
if(*fbp == '.'){
*mbp = 0;
mbp++;
}
if(*fbp == '*'){
*mbp = 1;
mbp++;
}
fbp++;
break;
}
}
}
I feel that while this is definitely more verbose, it is far easier to follow.
Example File:
$ cat puppy.pc "Puppy" [10, 10] * * . . . . . . * * * . * * . . * * . * . . * * . . * * . . . . . . . . . . . . . . . . * * . . . . * . . * * * * . . * * . . . * * . . . * . * . . * * . . * . . . * * . . * * . . . . . . * * . . . .
Result
$ ./a.out puppy.pc Puppy (10 x 10) * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Another example:
$ cat test "Smiley"[5,5]......*.*......*...******
Result:
$ ./a.out test Smiley (5 x 5) * * * * * * * * *
-
1\$\begingroup\$ Thanks for the edit, @Jamal, looks much better without the erroneous syntax highlighting. \$\endgroup\$Braden Best– Braden Best2015年12月01日 19:35:18 +00:00Commented Dec 1, 2015 at 19:35
3 Answers 3
Strictly speaking,
sscanf
will do most of the job in one line of code. I assume you want to avoid it for learning purposes.As in the part 1, the malformed input may result in the infinite loop.
It doesn't look like the state machine is warranted here: the flow is strictly linear. A set of functions (
get_name(), get_int(), get_map()
) returning (along with the actual result) the breaking point is much more readable:fbp = get_name(fbp, nbuf); while (*fbp++ != '[') fbp++; fbp = get_int(fbp, &pd->width); while (*fbp++ != ',') fbp++;
etc.
-
\$\begingroup\$ Good idea on the functions! And thank you for addressing the
mode
system. In practice, though, I don't think a malformed file will result in an infinite loop, as eventually,fbp
will have to reach the end of the buffer, and the eventual attempt to access a char that's out-of-bounds will (should) result in aSegmentation fault
. Tested it with a file missing a]
, and confirmed it crashes. Though the error is*** stack smashing detected *** ... Aborted (core dumped)
. I suppose I could protect against that by adding a function to validate that the file is correctly formatted. \$\endgroup\$Braden Best– Braden Best2015年12月01日 20:09:20 +00:00Commented Dec 1, 2015 at 20:09
Abstain from implementing the Loop-Switch-Antipattern.
No, seriously.
If you want to add comments, you can do so.Test all functions which can fail for error, and handle that, for example by notifying the caller:
malloc
/calloc
,fopen
,fgets
.Avoid large stack-buffers. Also, use dynamic memory-allocation to avoid arbitrary limitations.
Actually check the above-mentioned limitations.
Don't zero memory just for the heck of it. There are more amusing ways to waste time.
Dito for reading a file: Use
fread
to read whole blocks instead offgetc
to read a single byte, thus incurring unavoidable overhead multiple times.As an aside, a
char
is the wrong data-type for saving the return-value offgetc
, use anint
soEOF
can be differentiated from(char)-1
.I would read the file in binary mode, meaning you can use
ftell
andfseek
to get the size of the file, and restart from the beginning.
-
\$\begingroup\$ See? I knew there was something wrong with that. Something about it felt dirty, but I couldn't quite place it. Thank you for the source. On
#5
, is there a specific problem withcalloc
? I've learned to use it overmalloc
because it ensures that the data will be zero'd out, avoiding unexpected garbage. Whilemalloc
does tend to zero out memory (rather, the OS kernel zeros out the memory for security reasons), the C standard doesn't explicitly state that malloc has to zero out memory. \$\endgroup\$Braden Best– Braden Best2015年12月01日 20:48:54 +00:00Commented Dec 1, 2015 at 20:48 -
\$\begingroup\$ As for the aside...oops! I could have sworn I declared that as an int. I know exactly what you mean, because I've tried to use
char
for reading a binary file (e.g. implementingxxd
for learning purposes), and for certain character values, the results are...rather unexpected. \$\endgroup\$Braden Best– Braden Best2015年12月01日 20:51:31 +00:00Commented Dec 1, 2015 at 20:51 -
\$\begingroup\$ Is there anything better about expected garbage? No (though debug-runtimes like to fill with a pattern likely making errors obvious, thus not zero). The rule is, don't initialize to some dummy value, that just defeats any chance for automatic tools to catch logic errors, assuming it can be optimized out. If it cannot, it also costs performance. \$\endgroup\$Deduplicator– Deduplicator2015年12月01日 20:55:24 +00:00Commented Dec 1, 2015 at 20:55
Naming
Please, use adequate names. (e.g: who reads your code doesn't know what fbuf
means).
Parser
- Any reason why you can't use
fgets()
? If you need every char in order to inspect it or modify or whatever else then usefgets()
. - You're risking buffer overflow if the user inputs
10000
characters or more. You need to dynamically allocate memory, usingmalloc()
andrealloc()
. - Use
sscanf
. Checking the return result ofsscanf
would be a good idea as well. The *scanf() functions return the number of items read. Don't compare the return value with EOF. - You can use
fseek
andftell
to calculate the file size for you without having to read every character.
Fair example:
long fsize(FILE *fp) {
fseek(fp, 0, SEEK_END);
long bytes = ftell(fp);
rewind(fp);
return bytes;
}
These functions are part of standard C (stdio.h), so this is portable.
- Take care of
pre
/post
fixes for variables. - In the last
case
, you don't need to usebreak;
.
-
1\$\begingroup\$ As for the
break;
, it's a habit of mine to be explicit and leave room for additions. The last time I aggressively byte-counted like that, I spent the next three hours debugging what ended up being a missing comma. So thanks, but I'm not too concerned about the size of my source code. \$\endgroup\$Braden Best– Braden Best2015年12月01日 19:47:14 +00:00Commented Dec 1, 2015 at 19:47 -
\$\begingroup\$ As for
fgets
, I considered it, but when I realized I'd have to calculate the size of the file tofgets
it into a buffer, it kind of seemed redundant, as fgetc will do the job without needing to read the file twice. \$\endgroup\$Braden Best– Braden Best2015年12月01日 19:50:55 +00:00Commented Dec 1, 2015 at 19:50 -
\$\begingroup\$ As for the limited file size, I totally forgot about that. I'll definitely
calloc/free
it with a larger size. Thanks for pointing that out! \$\endgroup\$Braden Best– Braden Best2015年12月01日 19:52:43 +00:00Commented Dec 1, 2015 at 19:52 -
\$\begingroup\$ @B1KMusic what I mean is that it's good to use
break
for thecase
s, but not in the last one: there won't be any necessity. Thefgets
function reads at most one less than the number of characters specified byn
from the stream pointed to by stream into the array pointed to bys
. No additional characters are read after a new-line character (which is retained) or after EOF. \$\endgroup\$user91060– user910602015年12月01日 20:02:51 +00:00Commented Dec 1, 2015 at 20:02 -
1\$\begingroup\$ Yes, I am aware that
break
isn't needed by the compiler, and, as you pointed out at the end, one can put thebreak
anyway (and the compiler will optimize it out) if they prefer. And, I prefer to remain as explicit as possible to avoid bug-prone code, which is why I leave thebreak
in there. That's just defensive coding practice that I've learned over the years. \$\endgroup\$Braden Best– Braden Best2015年12月01日 20:30:13 +00:00Commented Dec 1, 2015 at 20:30
Explore related questions
See similar questions with these tags.