I was looking for a C function that reads lines of arbitrary length from a file. I didn't find anything absolutely portable and safe from buffer overflows, so I tried writing my own.
- Does it look like I made any memory management mistakes?
- Does it make sense to have this function return
NULL
everywhere that it does? - Have I made any other mistakes?
char *getaline (FILE *file)
{
size_t buffsize = 120; /* The minimum size for a line buffer */
size_t this_char = 0;
if (file == NULL) {
return NULL;
}
char *line = (char *) malloc(buffsize);
char **linebuff = &line;
if (*linebuff == NULL) {
return NULL;
}
char c = getc (file);
if (c == EOF) {
return NULL;
}
while (1) {
if (this_char + 1 >= buffsize) {
buffsize = 2 * buffsize;
char *next_linebuff;
next_linebuff = (char *) realloc (*linebuff, buffsize);
if (next_linebuff == NULL) {
return NULL;
}
*linebuff = next_linebuff;
}
(*linebuff)[this_char] = c;
this_char++;
if (c == '\n' || c == EOF) {
break;
}
c = getc (file);
}
(*linebuff)[this_char] = '0円';
line = *linebuff;
return line;
}
int main (int argc, const char *argv[]) {
FILE *file = fopen(argv[1], "r");
char *line = "";
while ((line = getaline(file)) != NULL) {
printf("%s", line);
}
return 0;
}
-
1\$\begingroup\$ Might write a full response later if someone else hasn't already, but the first thing that jumps at me is that there are two possibilities for memory leaks in the function. \$\endgroup\$Corbin– Corbin2012年10月31日 02:58:01 +00:00Commented Oct 31, 2012 at 2:58
-
\$\begingroup\$ Also, have you considered just using fgets? Even if you don't use it directly, your function could be built on it. \$\endgroup\$Corbin– Corbin2012年10月31日 04:50:59 +00:00Commented Oct 31, 2012 at 4:50
4 Answers 4
As for existing solutions, check out readline
. It's a pretty common library on *nix and even has a Windows port.
As for a critique of what you posted:
char *line = (char *) malloc(buffsize);
Don't bother casting a void*
to another pointer type. This is a C++ thing. In C you don't have to do it, the conversion can be done implicitly.
char c = getc (file);
This is a common portability pitfall. A char
cannot represent the full range of characters and also EOF
. Depending on the environment you may not be able to distinguish between EOF
and byte 255, or you may not be able to detect EOF
at all. Store the result of getc
in an int
, because that's what it returns.
char *line = (char *) malloc(buffsize);
// (snip)
char c = getc (file);
if (c == EOF) {
return NULL;
}
If you hit that return
you will leak the allocation of line
.
Now, you could introduce a free
inside the if
block... But many C programmers (myself included) prefer to avoid early return
statements, because what happens is you end up with repetitive "free all the buffers" cleanup blocks. Instead of doing an early return
, I like to let the scope own the allocation, meaning I insert the free
whenever something like line
goes out of scope, having the same cleanup block run in a success or failure case. (This sort of imitates C++'s RAII pattern, but with C and more manual.)
if (this_char + 1 >= buffsize) {
buffsize = 2 * buffsize;
These integer operations can overflow. Maybe that's not a huge deal (I'll admit, I myself write a lot of code that doesn't handle overflow), but something to be aware of, should you be allocating sizes around the boundaries of the type of the size variables.
next_linebuff = (char *) realloc (*linebuff, buffsize);
if (next_linebuff == NULL) {
return NULL;
}
*linebuff = next_linebuff;
(削除) Looks like you've successfully avoided the "leak on (Update: Sorry, must have suffered temporary blindness, you do leak here.) But I think this is too verbose. Why do you need to maintain realloc
failure" issue that traps many newbies. (削除ここまで)linebuff
as a double pointer? You already have line
.
(*linebuff)[this_char] = '0円';
line = *linebuff;
Seems like you missed a potential realloc
here. If the buffer is exactly full when you hit that line, you'll write past the allocation.
while ((line = getaline(file)) != NULL) {
printf("%s", line);
}
You never call free
on line
here.
My improved version follows, taking most of these suggestions. I've tried to keep your code mostly the same and follow the same style.
char *getaline (FILE *file)
{
size_t buffsize = 120; /* The minimum size for a line buffer */
size_t this_char = 0;
if (file == NULL) {
return NULL;
}
char *line = malloc(buffsize);
if (line == NULL) {
return NULL;
}
int c;
do {
c = fgetc(file);
if (this_char + 1 >= buffsize) {
buffsize = 2 * buffsize;
char *next_linebuff;
next_linebuff = realloc (line, buffsize);
if (next_linebuff == NULL) {
free(line);
line = NULL;
break;
}
line = next_linebuff;
}
if (c == EOF || c == '\n') {
c = 0;
}
line[this_char++] = c;
} while (c);
return line;
}
int main()
{
char *p = NULL;
while ((p = getaline(stdin)) && *p)
{
puts(p);
free(p);
}
free(p);
return 0;
}
-
\$\begingroup\$ Why set
line = NULL
if it's always set tonext_linebuf
before the next iteration? Is it just-in-case programming practice? \$\endgroup\$Reinstate Monica– Reinstate Monica2018年11月17日 16:48:30 +00:00Commented Nov 17, 2018 at 16:48 -
\$\begingroup\$ @afuna There is no next iteration in the case you're describing.
line = NULL
appears afterfree
and beforebreak
, then gets returned to the caller. We can't return a stale pointer. In general though, I do like to null out pointers after a call to deallocate. \$\endgroup\$asveikau– asveikau2018年11月29日 22:37:26 +00:00Commented Nov 29, 2018 at 22:37 -
\$\begingroup\$ oops... I wasn't paying attention to the scope re. the
break
. my bad. \$\endgroup\$Reinstate Monica– Reinstate Monica2018年11月29日 22:44:56 +00:00Commented Nov 29, 2018 at 22:44
I see no need for the variable linebuff
. In every location it is used you just de-reference it. Which makes it the same as line
anyway. So wherever you use *linebuff
just use line
instead.
char **linebuff = &line;
Here you are potentially leaking line
.
At every return point you need to make sure you clean up any allocated memoty.
if (c == EOF) {
return NULL;
}
I think your loop is the wrong way around.
Normally when reading a line I would not add the new line character to the buffer (but that is a debatable point). Which you seem to be trying to do with the condition at the end. But if the first character is a new line then you add it to the buffer (because you don't test c
that was read outside the loop.
while (1) {
You could simplify this by doing the read (and all the tests at the top).
int c; // note `c` should be an int.
while((c = getc(file)) != EOF)
{
if (c == '\n')
{ break;
}
// Now do all the work to add the character to the buffer.
Another potential leak.
If realloc() fails then the original buffer is not affected. So you will need to release it if you return NULL.
char *next_linebuff;
next_linebuff = (char *) realloc (*linebuff, buffsize);
if (next_linebuff == NULL) {
return NULL;
}
Also I don't like the two line call to realloc(). I would declare and do the assignment in a single line:
char *next_linebuff = realloc (line, buffsize);
In main() you get a dynamically allocated string from a function. Documentation is scetchy but it looks like you should be taking ownership. Which means you are also responsable for clean up (ie freeing the string).
while ((line = getaline(file)) != NULL) {
Where each call to getaline() means the result of the last call is leaked.
Your line doe not contain a new line.
So when you print it, it may be a good idea to add the new line back
printf("%s", line);
It should look like this:
char* line;
while ((line = getaline(file)) != NULL)
{
printf("%s\n", line);
free(line);
}
Others have dissected your function thoroughly. In case you are interested, an alternative method of accessing a normal file (but not a socket, pipe, etc) is to use mmap
. This maps the whole file into memory without the need to malloc
anything. The allocation occurs in the background as you access the mapped file - which for a text file now looks just like a string.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
static void fail(const char *msg)
{
perror(msg);
exit(EXIT_FAILURE);
}
int main (int argc, const char *argv[])
{
(void) argc;
int fd = open(argv[1], O_RDONLY);
if (fd < 0) {
fail(argv[1]);
}
struct stat st;
if (fstat(fd, &st) < 0) { /* to get the file size */
fail(argv[1]);
}
void *const map = mmap(0, st.st_size,
PROT_READ|PROT_WRITE,
MAP_FILE|MAP_PRIVATE, fd, 0);
if (map == NULL) {
fail(argv[1]);
}
char *term;
char *s = map;
while ((term = strchr(s, '\n')) != NULL) {
char ch = *term;
*term ='0円'; /* replace the '\n' so we can print the line */
printf("%s\n", s);
*term = ch;
s = term + 1;
}
if (*s) { /* if last line has no '\n' */
printf("%s\n", s);
}
munmap(map, st.st_size);
return EXIT_SUCCESS;
}
You definately have memory leaks.
==92867== Memcheck, a memory error detector
==92867== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==92867== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==92867== Command: ./a.out test1
=わ=わ92867=わ=わ
-ひく-ひく92867-- ./a.out:
--92867-- dSYM directory is missing; consider using --dsymutil=yes
abc
def
ghi
==92867==
==92867== HEAP SUMMARY:
==92867== in use at exit: 10,815 bytes in 38 blocks
==92867== total heap usage: 38 allocs, 0 frees, 10,815 bytes allocated
==92867==
==92867== LEAK SUMMARY:
==92867== definitely lost: 480 bytes in 4 blocks
==92867== indirectly lost: 0 bytes in 0 blocks
==92867== possibly lost: 0 bytes in 0 blocks
==92867== still reachable: 10,335 bytes in 34 blocks
==92867== suppressed: 0 bytes in 0 blocks
==92867== Rerun with --leak-check=full to see details of leaked memory
==92867==
==92867== For counts of detected and suppressed errors, rerun with: -v
==92867== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Furthermore, It segfaults on me when I give it lines longer than 119 characters. (But this is probably system dependent).
$ ./a.out test3 ; cat test3
abc
this is a test
this is a line with 40 characters in it
this is a line with 119 characters in it..............................................................................
Segmentation fault: 11
abc
this is a test
this is a line with 40 characters in it
this is a line with 119 characters in it..............................................................................
this is a line with 120 characters in it...............................................................................