This is part of a two-part post (Part 2).
Here, I have two recent projects that parse a file. The first uses a loop that's kind of hard to follow, and the second uses "modes" to decide what to do.
Project 1: A "bookmark manager" for managing URLs easier than a text editor.
The data structures:
typedef struct {
char *text;
int length;
} string;
typedef struct {
string *path;
string *title;
} url;
typedef struct {
url **urls; // An array of pointers to urls
int length;
int sel_url;
int modified;
char *filename;
} urllist;
For the data, I am considering using a linked list instead of an array.
The parser:
void parse_file(char *fname, urllist *ul){
/* Dumb char-by-char parser that follows the syntax "URL[Title]"
* [Title] is optional. If left unspecified, then the url is used as
* a title instead.
*
* There's some basic idiot-proofing:
*
* 1. Empty lines are ignored (lazy-ignore)
* Rather, '\n' and ' ' are skipped if they begin a line.
* " http://google.com" is still valid; it'll just take two
* iterations to extract anything from it
*
* 2. lines beginning in '[' or '#' are fully ignored (full-ignore).
* So you can write comments like "#this" or "[this]". The
* 'comment char' has to be the first character on the line. If
* there is a space before it, for example, the parser will think
* it's a url.
*/
int c;
url *u;
FILE *f = fopen(fname, "r");
while(1){
c = fgetc(f);
if(c == -1){ /* Check EOF */
break;
}
if(c != '\n' && c != ' ' && c != '[' && c != '#'){
/* Thanks to the EOF check, I have to add the first char
* NOW, while I still have it. This is also an opportunity
* for a sanity check */
u = new_url();
string_push_char(u->path, c);
} else if(c == '#' || c == '['){ /* full-ignore */
while((c = fgetc(f)) != '\n');
continue;
} else { /* lazy-ignore */
continue;
}
while(1){ /* Step through individual line */
c = fgetc(f);
if(c == '\n'){
/* Line ends without title, copy path to title */
string_copy(u->title, u->path);
break;
}
if(c == '['){
/* Title specified. */
while(1){ /* Get it. */
c = fgetc(f);
if(c == ']'){
while(fgetc(f) != '\n');
break;
}
string_push_char(u->title, c);
}
break;
}
string_push_char(u->path, c);
}
url_push(ul, u);
}
fclose(f);
}
It works, but I feel like it could be done better. As I said earlier, it's hard to follow. You can tell because it's littered with comments, which I personally try to avoid unless describing a block of code as a whole.
An example file
# Cool sites http://google.com/search?q=help+with+bananas[Google: banana help] http://stackoverflow.com[Stack Overflow] # YouTube stuff http://youtube.com
Result
[Google: banana help ] Stack Overflow http://youtube.com http://google.com/search?q=help with bananas
1 Answer 1
The code sure must be streamlined. I didn't find any better way to review but to rearrange the code and put my notes as comments.
// First, you test the character against the same condition twice.
// A more direct way would be to do it once
if (c == '#' || c == '[') {
while (fgetc(f) != '\n')
;
continue;
}
if (c == ' ' || c == '\n')
continue;
// Notice that here you still have `c` intact (so the comment about
// NOW is misleading). Time to initialize the `url`:
u = new_url();
// Since you have the first character of the `u->path`, a correct
// idiom is a `do {} while ();` loop:
do {
string_push_char(u->path, c);
c = fgetc(f);
} while(c != '\n' && c != '[');
// Here `c` is the reason of loop termination. Test it and proceed accordingly:
if (c == '[') {
while ((c = fgetc(f)) != ']') {
string_push_char(u->title, c);
}
while(fgetc(f) != '\n')
;
} else {
string_copy(u->title, u->path);
}
This almost it. The code parses the well-formed file, but will fall into an infinite loop if, for example, the title-closing bracket is missing.
As a side note, isspace(c)
is strongly preferred to a manual testing.
Explore related questions
See similar questions with these tags.