I've written a function to parse configuration files. It doesn't handle file opening, but instead takes an array with the file contents and tries to find fields based on an array containing the fields' names. If found, it puts the content into its respective place.
The only rules it has so far are:
- each field is delimited by a newline
'\n'
- case doesn't matter for the field's name
- after a field's name has been found, leading spaces and tabs are irrelevant and discarded
- lines containing unrecognized fields are ignored
- the character separating the field's name and its respective content must be specified in the field's name
Example file:
FIELD 1: CONTENT 1 FIELD 2: CONTENT 2 FIELD 3: CONTENT 3 FIELD 4: CONTENT 4 FIELD 5: CONTENT 5
Here's an example call:
char *configuration = load_file(file_name);
char *field1;
char *field2;
char *field3;
char *field4;
char *field5;
static const char *fields[] = {
"FIELD 1:",
"FIELD 2:",
"FIELD 3:",
"FIELD 4:",
"FIELD 5:",
0
};
char **destinations[] = {
&field1,
&field2,
&field3,
&field4,
&field5,
0
};
//Parse and load fields
parse(destinations, fields, configuration);
And the function:
//strings allocated must be freed by the caller
static void parse(char ***destinations, const char **fields, const char *source)
{
char buffer[BUFFER_SIZE];
int i, j, buffer_position;
while(*source != '0円')
{
for(i = 0; fields[i] != 0; ++i)
{
//check if first character match, case insensitive
if(*fields[i] == *source || inverse_case(*fields[i]) == *source)
{
//check if subsequent characters match
for(j = 0; fields[i][j] != '0円'; ++j)
{
if(fields[i][j] != source[j] && inverse_case(fields[i][j]) != source[j]) break;
}
//it's not the right field, look for another
if(fields[i][j] != '0円') continue;
//found, update pointer position
source += j;
//skip leading spaces and tabs
while(*source == ' ' || *source == '\t') ++source;
//collect content
for(buffer_position = 0; *source != '\n' && buffer_position < BUFFER_SIZE - 1; ++buffer_position, ++source)
{
buffer[buffer_position] = *source;
}
buffer[buffer_position] = '0円';
//if it's an empty line, nothing to copy
if(buffer_position > 0)
{
//allocate memory for destination and copy
if((*destinations[i] = malloc(buffer_position + 1)) != 0)
{
COPY(*destinations[i], buffer);
}
//if there's an error on malloc(), we just skip the field and notify the user
else
{
printf("Error: malloc(%d)\n", buffer_position + 1);
}
}
//since we found a field that matches, no need to keep looking
//break the for loop, skip the while and process next line
goto next_iteration;
}
}
//no matches for the current line just skip it entirely
while(*++source && *source != '\n');
next_iteration:
++source;
}
}
My questions:
- Would it be considered better if the line delimiter was defined by the caller?
- By adding the options to not ignore spaces and tabs, and set case sensitivity on/off, would the parameter list be considered too big?
-
\$\begingroup\$ IT would be a better idea to use an existing file format (such as json) and use that. It would be better to read from the stream and match dynamically rather than have to read the whole file into memory. SAX verses DOM model. \$\endgroup\$Loki Astari– Loki Astari2013年10月21日 10:16:34 +00:00Commented Oct 21, 2013 at 10:16
1 Answer 1
I'll only address C programming issues in this answer, not the algorithm itself.
Program design and programming practice
As a rule of thumb, there is never a reason to use more than two levels of indirection in C programming. More levels than that always originates from flawed program design. In your case, you are allocating an array of pointers to char. So it should be
char* destinations[]
.Because of the above reason, the function should be changed to
static void parse (char **destinations, const char **fields, const char *source)
or if you would prefer:
static void parse (char** destinations, const char** fields, const char* source)
It is bad practice to allocate memory in a different module than the one doing the allocation. If
parse()
does memory allocation, then a function located in the same module should handle the cleanup, not the caller. I would advise to create a function calledparse_cleanup()
or similar, which does this.The function
parse
is performing a parsing algorithm. The function shouldn't concern itself with anything else but that. Dynamic memory allocation is completely unrelated to the task "parsing". Consider splitting the code in two sub-functions: one which handles the algorithm and one for the memory allocation.Your loop control is made needlessly complex. You control the iteration from no less than 5 places: outer while loop, first for loop, continue statement, goto statement. This is spaghetti code and needs to be rewritten from scratch.
You don't need nested loops at all. Throughout the iteration, you will have to constantly check for
0円
, as any string can end anywhere. You can merge the while and for into a single while. That will also mean that you can remove the goto.Instead of having one big super loop that does a lot of things, consider writing a simple loop that calls upon various functions. Try to avoid
break
if possible, and always avoidcontinue
andgoto
, as the latter two are completely superfluous language features that many programmers consider hard to read.
Consider to add some error handling to the function. Parsers in particular usually have many things that can go wrong. Then it is nice for the caller to have detailed error information.
Investigate if you can't use the standard functions
isblank()
,isspace()
andtoupper()
from ctype.h, rather than your own custom ones.
Coding style (these remarks are more subjective and not everyone will agree)
Always write statements on a line of their own and always use braces. This makes the code easier to read and prevent bugs during code maintenance. For example:
while(*source == ' ' || *source == '\t') { ++source; }
The above is especially important in the case of
while(*++source && *source != '\n');
, because there you can't tell whether the programmer intentionally wrote an empty loop, or if they accidentally wrote a;
at the end of the line, which is a very common, classic bug. It is far better to write:while(x) {} // this means: "yes, I did intend this loop to be empty"
Never use the ++ operator in the same expression as other operators, it can easily lead to undefined behavior bugs such as the classic
i = i + i++;
bug. But it is also harder to read. Mixing ++ with other operators is pointless anyhow, since it doesn't affect performance or the final machine code in any way. One example of how to rewrite it:do { source++; } while (*source != '0円' && *source != '\n');
(And of course the check for
0円
should be explicit like in this example - that is good practice and you do it everywhere else in your code.)Avoid assignment inside conditions. Just as with the ++ operator, it can lead to bugs, and there is never a reason to have assignment inside conditions, it is another 100% superfluous feature of the language.
Most compilers will warn against assignment inside conditions, so it is annoying to have it there. You should not disable such warnings, because they will find the bug
x=y
where you intended to havex==y
.Furthermore, in the case of
if((*destinations[i] = malloc(buffer_position + 1)) != 0)
, did you actually intend to set the contents to NULL if allocation fails? It may or may not be better to leave the pointer alone in that case. Your function call documentation must mention which applies.It is good practice to compare against NULL rather than 0 when dealing with pointers, to show the reader that you did intend to compare a pointer and not an integer value.
If you heed my advice to never use ++ together with other operators, then the need for either prefix ++ or postfix ++ is gone: they turn completely equivalent. Ask yourself if you are using prefix ++ for a particular reason. Because people who do think they have a reason, tend to either name something based on superstition or something about how compilers were designed in the 80s.
-
\$\begingroup\$ Thanks for the comprehensive answer. The pointer is set to
NULL
on purpose whenmalloc()
fails. It's not meant to be a general parser, I guess the function name should be changed. I have a functionload_settings()
that uses this one to fill a structure and another functionclose_settings()
that cleans up. I'm checking if the structure is correctly filled onload_settings()
and notifying the caller from there. Do you mind posting what you consider to be the correct solution for this task so I can understand better your points? \$\endgroup\$2013Asker– 2013Asker2013年10月22日 05:08:57 +00:00Commented Oct 22, 2013 at 5:08