File parsing and data management

Question 1

This is part of a two-part post (Part 2).

Here, I have two recent projects that parse a file. The first uses a loop that's kind of hard to follow, and the second uses "modes" to decide what to do.

Project 1: A "bookmark manager" for managing URLs easier than a text editor.

The data structures:

typedef struct {
 char *text;
 int length;
} string;
typedef struct {
 string *path;
 string *title;
} url;
typedef struct {
 url **urls; // An array of pointers to urls
 int length;
 int sel_url;
 int modified;
 char *filename;
} urllist;

For the data, I am considering using a linked list instead of an array.

The parser:

void parse_file(char *fname, urllist *ul){
 /* Dumb char-by-char parser that follows the syntax "URL[Title]"
 * [Title] is optional. If left unspecified, then the url is used as
 * a title instead.
 *
 * There's some basic idiot-proofing:
 * 
 * 1. Empty lines are ignored (lazy-ignore)
 * Rather, '\n' and ' ' are skipped if they begin a line.
 * " http://google.com" is still valid; it'll just take two
 * iterations to extract anything from it
 *
 * 2. lines beginning in '[' or '#' are fully ignored (full-ignore).
 * So you can write comments like "#this" or "[this]". The
 * 'comment char' has to be the first character on the line. If
 * there is a space before it, for example, the parser will think
 * it's a url.
 */
 int c;
 url *u;
 FILE *f = fopen(fname, "r");
 while(1){
 c = fgetc(f);
 if(c == -1){ /* Check EOF */
 break;
 }
 if(c != '\n' && c != ' ' && c != '[' && c != '#'){
 /* Thanks to the EOF check, I have to add the first char
 * NOW, while I still have it. This is also an opportunity
 * for a sanity check */
 u = new_url();
 string_push_char(u->path, c);
 } else if(c == '#' || c == '['){ /* full-ignore */
 while((c = fgetc(f)) != '\n');
 continue;
 } else { /* lazy-ignore */
 continue;
 }
 while(1){ /* Step through individual line */
 c = fgetc(f);
 if(c == '\n'){
 /* Line ends without title, copy path to title */
 string_copy(u->title, u->path);
 break;
 }
 if(c == '['){
 /* Title specified. */
 while(1){ /* Get it. */
 c = fgetc(f);
 if(c == ']'){
 while(fgetc(f) != '\n');
 break;
 }
 string_push_char(u->title, c);
 }
 break;
 }
 string_push_char(u->path, c);
 }
 url_push(ul, u);
 }
 fclose(f);
}

It works, but I feel like it could be done better. As I said earlier, it's hard to follow. You can tell because it's littered with comments, which I personally try to avoid unless describing a block of code as a whole.

An example file

# Cool sites
http://google.com/search?q=help+with+bananas[Google: banana help]
http://stackoverflow.com[Stack Overflow]
# YouTube stuff
http://youtube.com

Result

[Google: banana help ]
 Stack Overflow
 http://youtube.com
http://google.com/search?q=help with bananas

Question 2

The code sure must be streamlined. I didn't find any better way to review but to rearrange the code and put my notes as comments.

 // First, you test the character against the same condition twice.
 // A more direct way would be to do it once
 if (c == '#' || c == '[') {
 while (fgetc(f) != '\n')
 ;
 continue;
 }
 if (c == ' ' || c == '\n')
 continue;
 // Notice that here you still have `c` intact (so the comment about
 // NOW is misleading). Time to initialize the `url`:
 u = new_url();
 // Since you have the first character of the `u->path`, a correct
 // idiom is a `do {} while ();` loop:
 do {
 string_push_char(u->path, c);
 c = fgetc(f);
 } while(c != '\n' && c != '[');
 // Here `c` is the reason of loop termination. Test it and proceed accordingly:
 if (c == '[') {
 while ((c = fgetc(f)) != ']') {
 string_push_char(u->title, c);
 }
 while(fgetc(f) != '\n')
 ;
 } else {
 string_copy(u->title, u->path);
 }

This almost it. The code parses the well-formed file, but will fall into an infinite loop if, for example, the title-closing bracket is missing.

As a side note, isspace(c) is strongly preferred to a manual testing.

vnp vnp 58.7k4 gold badges55 silver badges144 bronze badges · Answer 1 · 2015-12-01 19:35:55Z

The code sure must be streamlined. I didn't find any better way to review but to rearrange the code and put my notes as comments.

 // First, you test the character against the same condition twice.
 // A more direct way would be to do it once
 if (c == '#' || c == '[') {
 while (fgetc(f) != '\n')
 ;
 continue;
 }
 if (c == ' ' || c == '\n')
 continue;
 // Notice that here you still have `c` intact (so the comment about
 // NOW is misleading). Time to initialize the `url`:
 u = new_url();
 // Since you have the first character of the `u->path`, a correct
 // idiom is a `do {} while ();` loop:
 do {
 string_push_char(u->path, c);
 c = fgetc(f);
 } while(c != '\n' && c != '[');
 // Here `c` is the reason of loop termination. Test it and proceed accordingly:
 if (c == '[') {
 while ((c = fgetc(f)) != ']') {
 string_push_char(u->title, c);
 }
 while(fgetc(f) != '\n')
 ;
 } else {
 string_copy(u->title, u->path);
 }

This almost it. The code parses the well-formed file, but will fall into an infinite loop if, for example, the title-closing bracket is missing.

As a side note, isspace(c) is strongly preferred to a manual testing.

Stack Exchange Network

File parsing and data management

Project 1: A "bookmark manager" for managing URLs easier than a text editor.

The data structures:

The parser:

An example file

Result

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

File parsing and data management

Project 1: A "bookmark manager" for managing URLs easier than a text editor.

The data structures:

The parser:

An example file

Result

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions