I have developed a readline
function in C and I want to know your opinion on it:
// read a the whole line, stop if encouter eof
static char *readline(FILE *in, sid_error *error)
{
#define MAX_SIZE 64
size_t size = MAX_SIZE;
char *str;
if((str = malloc(sizeof *str * size)) == NULL) {
*error = ERROR_MALLOC;
return NULL;
}
char *pos;
char read = 0;
while(fgets(str + size - MAX_SIZE, MAX_SIZE, in)) {
read = 1;
if(pos = strchr(str, '\n')) {
*pos = '0円';
break;
}
size = size + MAX_SIZE - 1;
str = realloc(str, sizeof *str * size);
}
if(!read) {
free(str);
return NULL;
}
return str;
}
-
\$\begingroup\$ what happens to the part of the next line that gets read? just dropped? \$\endgroup\$ratchet freak– ratchet freak2015年01月25日 21:06:57 +00:00Commented Jan 25, 2015 at 21:06
-
\$\begingroup\$ only the current line is read, no part of the next line is been read. \$\endgroup\$KarimS– KarimS2015年01月25日 21:44:30 +00:00Commented Jan 25, 2015 at 21:44
-
1\$\begingroup\$ usage of a #define is dangerous! the names should be explicit (ie, can't have another part of the code with same variable name for something different), and beware that the C precompiler is NOT bounded by the { ... }, see stackoverflow.com/questions/6379489/… ) \$\endgroup\$Olivier Dulac– Olivier Dulac2015年01月26日 15:39:34 +00:00Commented Jan 26, 2015 at 15:39
4 Answers 4
Inconsistent memory allocation. Code checks for a
NULL
return onmalloc()
, yet no checks onrealloc()
. Recommend adding aNULL
check onrealloc()
.Reliance on no embedded
'0円'
. The following code searches up to'0円'
for a'\n'
to determine if reading the line is done. In unusual situations, a'0円'
will occur before the final'\n'
+'0円'
pair. So ifMAX_SIZE
was a larger value, the same file input (with an embedded'0円'
) could read more of the line than this code. This is a weakness offgets()
and thisreadline()
inherits it. To get around this, some function other thatfgets()
needs to be used for primary input.if(pos = strchr(str, '\n'))
Minor performance issue. In searching for the final
'\n'
, trimming fgets() is usually faster withstrlen()
thanstrchr()
. But more importantly, this code re-searches the same buffer unnecessarily. Consider instead the following.// if(pos = strchr(str, '\n')) { if(pos = strchr(str + size - MAX_SIZE, '\n')) {
Reallocation increments the buffer size in a linear fashion. Increasing exponentially (example: 2x) has advantages with unusually long lines. Yet for typically applications without insanely long lines, either approach is OK.
Nice correct use of
size_t
for buffer size.
[Edit]
- If there is the unusual embedded
'0円'
in input such as"abc0円xyx\n0円"
, then the while loop will unfortunately read the rest of the file. The while loop will not exit withif(pos = strchr(str, '\n'))
and will continue reading untilfgets()
returnsNULL
. As in point #3, subsequent searches for\n
should begin with the new data read and not at&str[0]
. Further, looping should only occur if'\n'
was not found and buffer is full.
Overall looks good, except that
MAX_SIZE
is misleading. It is reallyMIN_SIZE
(orINITIAL_SIZE
)sizeof *str
is guaranteed to be 1.read
is not a good choice for a name. There is already aread
system call, and the name clash makes a reviewer uncomfortable.Failing
realloc
results in memory leak and data loss. Typically you'd adhere to the patternfoo = malloc(size); bar = realloc(foo, newsize); if (bar == 0) { deal with it, you still have a good foo } else { foo = bar; }
strchr(size, '\n')
may lead to quadratic complexity.strchr(str + size, '\n')
does the same linearly.
-
\$\begingroup\$ realloc fixed before seen the response :), thanks for other points to fix \$\endgroup\$KarimS– KarimS2015年01月25日 23:01:55 +00:00Commented Jan 25, 2015 at 23:01
In no particular order:
#define
should not be placed in a function like that, define at file scope. In this case it seems unnecessary since you could declare it as a constant instead since it is used for dynamic allocation.
const size_t maxSize = 64;
always initialize all variables, just in case.
char* str = NULL;
...
char* pos = NULL;
you malloc should just take size
as argument i.e. malloc(size)
the rest makes it just look difficult to read and in this case is wrong, same applies to realloc
.
in general it is better to write functions that do not force the user of them to look at the implementation in order to figure out how to use them.
your function allocates memory and returns a buffer that the user needs to free. this fact is not obvious from the function name (it could have returned a ptr to a static buffer for instance, or a pointer to a global array whatever).
you should check the return value of realloc
, it is not always successful. you need to deal with that event as well. e.g. reading many lines and with a memory that is fragmented may cause it to fail to find a large enough continuous memory block.
realloc
also is an expensive function to call, it is better to allocate larger blocks than try to allocate several small blocks.
your function can fail in a few ways but you only set the *error
when malloc fails, you may want to add some more error codes like when nothing is read.
EDIT: removed fgets comment.
-
\$\begingroup\$ what i meant wass that if the file contains several lines ending with
\n
then fgets will read several lines or rather the while loop will. \$\endgroup\$AndersK– AndersK2015年01月26日 20:54:37 +00:00Commented Jan 26, 2015 at 20:54 -
\$\begingroup\$ @chux it is unnecessary to do a break since fgets will stop after reading up to '\n' but ok. gotcha \$\endgroup\$AndersK– AndersK2015年01月27日 09:01:42 +00:00Commented Jan 27, 2015 at 9:01
Well, basically, everything looks good. But, this function could treat a line abstractly of the line length. You may have two approaches:
- Counting symbols in the line in order to allocate enough memory to store that line in an array;
- Passing MAX_SIZE as a function parameter.