As an exercise to improve my coding ability in C, I have attempted to write a simple and safe file reader. I believe this should be portable as well to major platforms (correct me if I'm wrong). For simplicity's sake I assume the file will fit in memory which is mostly true for my use case.
Please let me know where I can improve.
(Note: I've used goto
because I think it makes the code flow better)
int read_file(const char* filename, char** buffer) {
FILE* fp = fopen(filename, "r");
if (!fp) {
goto f_read_error;
}
ssize_t file_len;
if (fseek(fp, 0L, SEEK_END) == 0) {
file_len = ftell(fp);
if (file_len == -1) {
goto f_read_error;
}
if (fseek(fp, 0L, SEEK_SET) != 0) {
goto f_read_error;
}
*buffer = (char*) malloc(sizeof(char) * (file_len + 1));
int output = fread(*buffer, sizeof(char), file_len, fp);
if (output != file_len) {
goto f_read_error;
};
(*buffer)[file_len] = '0円';
} else {
f_read_error: ;
char error_msg[1024];
// let's trunc filenames larger than 256, to not overflow error_msg buffer
char safe_filename[256];
if (strlen(filename) > 256) {
memcpy(safe_filename, filename, 256);
}
sprintf(error_msg, "Unable to read file : %s", safe_filename);
perror(error_msg);
if (buffer) {
free(buffer);
}
if (fp) {
fclose(fp);
}
return -1;
}
if (fp) {
fclose(fp);
}
return 1;
}
int main(int argc, char** argv) {
char* source = NULL;
int success = read_file("test.txt", &source);
if (success) {
//read_file("test.txt", &source); - a copy paste error, edited as a comment
printf("%s\n", source);
free(source);
}
}
3 Answers 3
There are a number of things I see that may help you improve your code.
Avoid goto
Every programmer should understand when and when not to use goto
. For useful discussion, history and guidance, see this SO question and answers. Generally the problem with goto
is that it obfuscates program flow and can often be more clearly expressed using other control structures. In this particular case, one might consider inverting the logic -- that is, only do the buffer read if previous operations were successful, avoiding the goto
s altogether and making the control flow easier to follow.
Check for malloc
failure
Calls to malloc
can fail, and if they do, a NULL
pointer is returned. Dereferencing a NULL
pointer usually means a program crash, so a well behaved program should always check to make sure that the call to malloc
did not fail before doing anything with the returned pointer.
Use NULL
pointer to indicate failure
Instead of taking a pointer to a pointer as a parameter to read_file
it would be safer, shorter and clearer to instead use the pointer value as a return value. In this case, a NULL
pointer could represent an error. This would also eliminate the problem that occurs if a non-NULL
value is passed in and then a seek
operation fails; currently the program will try to free memory that may never have been allocated.
Separate error reporting from error detection
The read_file
routine might be called from a non-console application, but the error reporting built into it would attempt to write to the (possibly non-existant) console anyway using sprintf
. Better would be to simply report the error to the calling function and allow it to decide whether and how to report the error.
In particular consider the following:
char *read_file(const char *filename) { ... }
int main() {
char *buffer = read_file("myfile.txt");
if (buffer == NULL) { // oops, an error!
perror("read_file()");
return 1;
} else {
// do useful things
}
}
The advantage is that if I want the file name as part of the error message or not, if I want a dialog box or a command line message, if I want a log file entry or an error indicating sound played, or if I want to display the error message in Chinese or German, I can control all of these things from the calling function. If it's buried within a function I call, I haven't got that kind of control. Those are just a few reasons it's useful to separate error reporting from error detection.
Use all required #include
s
This program requires the following lines which should be part of the source code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
Use snprintf
instead of sprintf
By using snprintf
instead of sprintf
, these lines
char error_msg[1024];
// let's trunc filenames larger than 256, to not overflow error_msg buffer
char safe_filename[256];
if (strlen(filename) > 256) {
memcpy(safe_filename, filename, 256);
}
sprintf(error_msg, "Unable to read file : %s", safe_filename);
perror(error_msg);
can be simplified to this:
char error_msg[1024];
snprintf(error_msg, 1024, "Unable to read file [%s]", filename);
perror(error_msg);
Don't make assumptions about the values of constants
If ftell
fails, it returns EOF
which may or may not be mathematically identical to -1
which is currently used in the code. Instead you should compare it to EOF
so that it is both clear to the reader of your code and perfectly portable.
Use the appropriate variable types
The ftell
routine returns a long
and not a ssize_t
. They may be convertible (or even identical) types, but there is no real reason not to simply declare file_len
as a long
.
Avoid the use of "magic numbers"
The numbers 1024, 256, 1 and -1 appear in the program, and we might be able to guess their signficance, but better practice would be to use a const
or #define
and name those constants with some meaningful name.
-
\$\begingroup\$ Thanks a lot. This is really helpful. In "Separate error reporting from error detection", would taking an extra char** parameter, malloc that and setting that with an error message be the right way to go? \$\endgroup\$dev_nut– dev_nut2015年05月07日 17:27:01 +00:00Commented May 7, 2015 at 17:27
-
1\$\begingroup\$ @dev_nut: It wouldn't need to take another parameter. The signature could be
char *read_file(const char *filename)
. If the function returnsNULL
, you can generate the error from the caller sinceerrno
is still set and the string is only composed of things already known to the caller. \$\endgroup\$Edward– Edward2015年05月07日 18:15:30 +00:00Commented May 7, 2015 at 18:15 -
\$\begingroup\$ Hmmm, wouldn't it be easier for the user of the function to just know what the error is rather than checking errno himself? I would like to directly get any error and any context about the error. \$\endgroup\$dev_nut– dev_nut2015年05月07日 18:39:36 +00:00Commented May 7, 2015 at 18:39
-
\$\begingroup\$ You don't lose context because
errno
is set. I've updated my answer to address that question. \$\endgroup\$Edward– Edward2015年05月07日 18:53:30 +00:00Commented May 7, 2015 at 18:53 -
\$\begingroup\$ I understood more about how to use perror now. Thanks! \$\endgroup\$dev_nut– dev_nut2015年05月08日 04:17:23 +00:00Commented May 8, 2015 at 4:17
This code is not safe, and exhibits undefined behavior. I don't think you've actually tested it.
char safe_filename[256];
if (strlen(filename) > 256) {
memcpy(safe_filename, filename, 256);
}
sprintf(error_msg, "Unable to read file : %s", safe_filename);
Let's see what happens if filename
is short. The test is false, so execution proceeds to the sprintf
. It is constructing a string and is supposed to interpolate safe_filename
... but that hasn't been initialized yet. Whatever garbage is in safe_filename
will be copied into error_msg
until it happens to hit a '0円'
, unless it gets a segfault first. It might even overflow the bounds of error_msg
.
And what happens if filename
is long? The test is true, so 256 characters of filename
get copied into safe_filename
. No terminating '0円'
because memcpy()
doesn't do that (and there wouldn't be room if it did). Execution proceeds to the sprintf
. Again, it starts interpolating from safe_filename
into the string, but the copying continues without stopping until it happens to hit a '0円'
or segfaults. Same risk to error_msg
.
How did this problem arise? You wrote code without testing it. Let me take a minute to pontificate about testing. You need to test your code. You need to test all your code. You need to be able to do the tests in an automated manner. That means your automatic testing harness needs to be able to cause each branch to be either taken or not taken.
-
\$\begingroup\$ Thanks. You are right, this was a last minute adjustment since I noticed a possible overflow in the error string. It was not tested. \$\endgroup\$dev_nut– dev_nut2015年05月07日 03:19:28 +00:00Commented May 7, 2015 at 3:19
Not portable
First of all, I feel that it is nonstandard to use fread()
after you've opened the file in text mode. Usually, you use fread()
when you open a file in binary mode.
That being said, what can now happen is that you can come across a version of fread()
that does something unexpected. For example, Microsoft says this about using their fread()
:
The fread function reads up to count items of size bytes from the input stream and stores them in buffer. The file pointer associated with stream (if there is one) is increased by the number of bytes actually read. If the given stream is opened in text mode, carriage return–linefeed pairs are replaced with single linefeed characters. The replacement has no effect on the file pointer or the return value. The file-pointer position is indeterminate if an error occurs. The value of a partially read item cannot be determined.
(Source: MSDN Library)
Note the part where it says it replaces CRLF pairs with LF only. So after you read in the file contents, when your code does this:
(*buffer)[file_len] = '0円';
You are terminating the string at file_len
, but the actual string could be shorter because fread()
converted all the CRLF pairs to a single linefeed character. So you could end up with a bunch of junk characters past the end of the file.
Use calloc
I discovered a long time ago that using calloc
instead of malloc
saved me from many headaches. In your program, if you used calloc
instead of malloc
, you wouldn't even need to terminate the string. And then it would just work with the Microsoft version of fread()
.
-
\$\begingroup\$ If I use calloc, and still stick to fread() it would solve the issue you pointed out, right? Also, could you give a code snippet for an alternative to fread()? \$\endgroup\$dev_nut– dev_nut2015年05月08日 04:08:15 +00:00Commented May 8, 2015 at 4:08
-
\$\begingroup\$ @dev_nut You can use
fread()
. Just open the file in binary mode. Like this:fp = fopen(filename, "rb");
. Yes calloc would zero out the entire buffer so you don't even have to worry about terminating your string. \$\endgroup\$JS1– JS12015年05月08日 04:10:38 +00:00Commented May 8, 2015 at 4:10 -
\$\begingroup\$ Cool. Writing portable C code does seem to be possible if you try hard. Thanks! \$\endgroup\$dev_nut– dev_nut2015年05月08日 04:18:32 +00:00Commented May 8, 2015 at 4:18