I wrote this function for my next learning project (cfg parser). What do you think about it?
char readline(FILE *fp, char **line, size_t *size) {
*size = 0;
size_t cpos = ftell(fp);
int c;
while ((c = fgetc(fp)) != '\n' && c != EOF) (*size)++;
if (*size == 0 && c == EOF) {
*line = NULL;
return 1;
}
*line = calloc(*size + 1, sizeof(char));
if (*line == NULL)
return -1;
fseek(fp, cpos, SEEK_SET);
if (fread(*line, 1, *size, fp) != *size)
return -2;
fgetc(fp); // Skip that newline
return 0;
}
Can be used as follows:
FILE *fp = fopen("file.txt", "rb");
char *line = NULL;
size_t size = 0;
readline(fp, &line, &size);
4 Answers 4
sizeof(char)
is guaranteed to be 1.If you think that in future you might want to use another representation of characters (say,
wchar_t
), you'd have to modify code in more than one place. It is much safer to infer size from the variable, rather than from type:*line = calloc(..., sizeof(**line));
This way there is a single modification point.
Similarly, you may want to use
sizeof(**line)
instead of literal 1 infread
.fseek
(andftell
)may fail withEBADF
if the stream is not seekable. Since your function has no control over what kind of stream it is handled, it is advisable to test their return codes.One could argue that reading 0 characters is in fact a success (a client can tell the difference by examining
size
), so differentiating return codes (0 and 1) is redundant.It is highly recommended to use named macros (instead of magic numbers -1 and -2) for failure codes.
-
\$\begingroup\$ For what it's worth, I myself like using
sizeof(char)
despite it always being1
--- it clarifies intent to the reader and makes the requirement to change more obvious if it ever gets changed to a different type (say,wchar_t
). This is especially relevant inmalloc
, where you have a single argument:malloc(len)
becomesmalloc(len * sizeof(wchar_t))
. \$\endgroup\$Tim Čas– Tim Čas2015年02月10日 13:16:34 +00:00Commented Feb 10, 2015 at 13:16
Profiling will tell, yet I doubt reading a file twice is faster than reading a file sequentially once and performing
realloc()
as needed.ftell(fp)
returnslong
. Code should belong cpos
.long
is a signed integer whose positive value might not fit in asize_t
. But more importantly,long
may be insufficient for a file's size. Considerint fgetpos(FILE * restrict stream, fpos_t * restrict pos);
andfsetpos()
instead.It is important to check the returns of
ftell()
,fseek()
for errors.Rather than
calloc(*size + 1, sizeof(char));
, considercalloc(*size + 1, 1)
orcalloc(*size + 1, sizeof *(*line))
.As the intention is to return a pointer to a string,
*line
needs termination.calloc()
negates the need for(*line)[*size] = '0円'
, but that is subtle and likely deserves a comment that the string is properly terminated. I could see an update replacingcalloc()
withmalloc()
and not then explicitly terminating the string.Curious that code does not save the
'\n'
. With this code running in binary mode on various machines that use an alternate line ending like"\r"
, maybe the entire file would then be 1 line as the is no'\n'
. Or in Windows with"\r\n"
, each line ends with'\r'
. Recommend amending this code to work with more line ending than'\n'
or insure only text mode is used.
[edit]
Nice use of
fseek(fp, cpos, SEEK_SET)
to go back to a position derived fromftell()
. This is better thanfseek(fp, cpos - *size, SEEK_SET);
which is UB for text files.The return type
char
is weak.char
may be unsigned and returning-1
could become255
. Suggestint
.About the sample usage: This
readline()
allocates memory, and should demo proper usage includingfree()
:FILE *fp = fopen("file.txt", "rb"); char *line = NULL; size_t size = 0; while (0 == readline(fp, &line, &size)) { puts(line); free(line); }
You appear to be unknowingly reinventing-the-wheel. This function is similar to the getline()
function specified in POSIX 1003.1-2008 — yet incompatible. The order of the parameters is different, and your return value is less informative. If you are implementing this function for your own amusement, or because your system lacks POSIX 2008 support, then you should mimic the getline()
interface.
It is misleading to use char
as the return type for a status code that is conceptually a number. (getline()
returns a ssize_t
to indicate the length of the resulting string, or -1 for failure, with additional information in errno
.)
Reading one character at a time with fgetc()
is bad for performance. I recommend using fgets()
instead. Then, your function would just act as a convenient memory reallocation wrapper for fgets()
. You would also avoid the need to seek, which lets your function work with unseekable streams. It may take several tries to arrive at an overestimate of the required buffer size, but I think that it would still be better than reading the input twice.
-
\$\begingroup\$ I think the OP may be reinventing the whole car :-) When doing a scanner/parser, it's generally simpler and much more efficient to use flex/bison to do the work. \$\endgroup\$jamesqf– jamesqf2015年01月30日 04:56:41 +00:00Commented Jan 30, 2015 at 4:56
-
\$\begingroup\$ Yet the implementations of
getline()
I find available on github et cetera all read the file one character at a time. :) \$\endgroup\$Madagascar– Madagascar2024年02月24日 12:49:44 +00:00Commented Feb 24, 2024 at 12:49
Few things:
Return
int
instead ofchar
, it is easily extensible for future and you may be able to match with standard POSIXerrno
. Else have explicit enum values to indicate return status.Seems you are reading contents twice, once to determine size and then to actually read the entire line. Even though these are buffered fetches, measure and consider optimizing (also see 3.)
Mimic
scanf
. Read strings assuming caller has allocated enough memory for the line. That way you only need to do "append" as you read characters. In general, a bad idea to have memory allocation and de-allocation in separate functions. It is easy to miss afree
and cause memory leaks. API should be designed so as to minimize user errors.fscanf(fp,"%[^\n]",buffer);
is usually a good substitute to read a single line
-
\$\begingroup\$
Read strings assuming caller has allocated enough memory for the line.
scanf
's%s
option is widely seen as broken unless it's being used for input that's known to be trusted and well formed (or a length limit is specified in the format). A general function (and really a non-broken) cannot make assumptions about buffer sizes. You're right that it's uncomfortable to callfree()
on something you didn't get back directly frommalloc()
, but this is essentially a function made for handling the reallocation for you (in the same way some people wrapfgets
to truly read a line). \$\endgroup\$Corbin– Corbin2015年01月30日 04:49:43 +00:00Commented Jan 30, 2015 at 4:49 -
\$\begingroup\$ (Continued) If it were a more complex thing being allocated than a c-string, I'd be all for creating a higher level data type, but at the end of the day, the usage of this function seems completely reasonable (I would have it return a pointer instead of an error code, and I would have the error code be an optional out param, but different strokes), and doing anything else either cripples the functionality or introduces unnecessary complexity. This is one of the rare exceptions where
free()
ing some other function's doing seems sane. \$\endgroup\$Corbin– Corbin2015年01月30日 04:52:01 +00:00Commented Jan 30, 2015 at 4:52