Simple and safe file reader for C

Question 1

As an exercise to improve my coding ability in C, I have attempted to write a simple and safe file reader. I believe this should be portable as well to major platforms (correct me if I'm wrong). For simplicity's sake I assume the file will fit in memory which is mostly true for my use case.

Please let me know where I can improve.

(Note: I've used goto because I think it makes the code flow better)

int read_file(const char* filename, char** buffer) {
 FILE* fp = fopen(filename, "r");
 if (!fp) {
 goto f_read_error;
 }
 ssize_t file_len;
 if (fseek(fp, 0L, SEEK_END) == 0) {
 file_len = ftell(fp);
 if (file_len == -1) {
 goto f_read_error;
 }
 if (fseek(fp, 0L, SEEK_SET) != 0) {
 goto f_read_error;
 }
 *buffer = (char*) malloc(sizeof(char) * (file_len + 1));
 int output = fread(*buffer, sizeof(char), file_len, fp);
 if (output != file_len) {
 goto f_read_error;
 };
 (*buffer)[file_len] = '0円';
 } else {
 f_read_error: ;
 char error_msg[1024];
 // let's trunc filenames larger than 256, to not overflow error_msg buffer
 char safe_filename[256];
 if (strlen(filename) > 256) {
 memcpy(safe_filename, filename, 256);
 }
 sprintf(error_msg, "Unable to read file : %s", safe_filename);
 perror(error_msg);
 if (buffer) {
 free(buffer);
 }
 if (fp) {
 fclose(fp);
 }
 return -1;
 }
 if (fp) {
 fclose(fp);
 }
 return 1;
}
int main(int argc, char** argv) {
 char* source = NULL;
 int success = read_file("test.txt", &source);
 if (success) {
 //read_file("test.txt", &source); - a copy paste error, edited as a comment
 printf("%s\n", source);
 free(source);
 }
}

Question 2

There are a number of things I see that may help you improve your code.

Avoid `goto`

Every programmer should understand when and when not to use goto. For useful discussion, history and guidance, see this SO question and answers. Generally the problem with goto is that it obfuscates program flow and can often be more clearly expressed using other control structures. In this particular case, one might consider inverting the logic -- that is, only do the buffer read if previous operations were successful, avoiding the gotos altogether and making the control flow easier to follow.

Check for `malloc` failure

Calls to malloc can fail, and if they do, a NULL pointer is returned. Dereferencing a NULL pointer usually means a program crash, so a well behaved program should always check to make sure that the call to malloc did not fail before doing anything with the returned pointer.

Use `NULL` pointer to indicate failure

Instead of taking a pointer to a pointer as a parameter to read_file it would be safer, shorter and clearer to instead use the pointer value as a return value. In this case, a NULL pointer could represent an error. This would also eliminate the problem that occurs if a non-NULL value is passed in and then a seek operation fails; currently the program will try to free memory that may never have been allocated.

Separate error reporting from error detection

The read_file routine might be called from a non-console application, but the error reporting built into it would attempt to write to the (possibly non-existant) console anyway using sprintf. Better would be to simply report the error to the calling function and allow it to decide whether and how to report the error.

In particular consider the following:

char *read_file(const char *filename) { ... }
int main() {
 char *buffer = read_file("myfile.txt");
 if (buffer == NULL) { // oops, an error!
 perror("read_file()");
 return 1;
 } else {
 // do useful things
 }
}

The advantage is that if I want the file name as part of the error message or not, if I want a dialog box or a command line message, if I want a log file entry or an error indicating sound played, or if I want to display the error message in Chinese or German, I can control all of these things from the calling function. If it's buried within a function I call, I haven't got that kind of control. Those are just a few reasons it's useful to separate error reporting from error detection.

Use all required `#include`s

This program requires the following lines which should be part of the source code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

Use `snprintf` instead of `sprintf`

By using snprintf instead of sprintf, these lines

char error_msg[1024];
// let's trunc filenames larger than 256, to not overflow error_msg buffer
char safe_filename[256];
if (strlen(filename) > 256) {
memcpy(safe_filename, filename, 256);
}
sprintf(error_msg, "Unable to read file : %s", safe_filename);
perror(error_msg);

can be simplified to this:

char error_msg[1024];
snprintf(error_msg, 1024, "Unable to read file [%s]", filename);
perror(error_msg);

Don't make assumptions about the values of constants

If ftell fails, it returns EOF which may or may not be mathematically identical to -1 which is currently used in the code. Instead you should compare it to EOF so that it is both clear to the reader of your code and perfectly portable.

Use the appropriate variable types

The ftell routine returns a long and not a ssize_t. They may be convertible (or even identical) types, but there is no real reason not to simply declare file_len as a long.

Avoid the use of "magic numbers"

The numbers 1024, 256, 1 and -1 appear in the program, and we might be able to guess their signficance, but better practice would be to use a const or #define and name those constants with some meaningful name.

Question 3

Thanks a lot. This is really helpful. In "Separate error reporting from error detection", would taking an extra char** parameter, malloc that and setting that with an error message be the right way to go?

Question 4

@dev_nut: It wouldn't need to take another parameter. The signature could be char *read_file(const char *filename). If the function returns NULL, you can generate the error from the caller since errno is still set and the string is only composed of things already known to the caller.

Question 5

Hmmm, wouldn't it be easier for the user of the function to just know what the error is rather than checking errno himself? I would like to directly get any error and any context about the error.

Question 6

You don't lose context because errno is set. I've updated my answer to address that question.

Question 7

I understood more about how to use perror now. Thanks!

Question 8

This code is not safe, and exhibits undefined behavior. I don't think you've actually tested it.

 char safe_filename[256];
 if (strlen(filename) > 256) {
 memcpy(safe_filename, filename, 256);
 }
 sprintf(error_msg, "Unable to read file : %s", safe_filename);

Let's see what happens if filename is short. The test is false, so execution proceeds to the sprintf. It is constructing a string and is supposed to interpolate safe_filename... but that hasn't been initialized yet. Whatever garbage is in safe_filename will be copied into error_msg until it happens to hit a '0円', unless it gets a segfault first. It might even overflow the bounds of error_msg.

And what happens if filename is long? The test is true, so 256 characters of filename get copied into safe_filename. No terminating '0円' because memcpy() doesn't do that (and there wouldn't be room if it did). Execution proceeds to the sprintf. Again, it starts interpolating from safe_filename into the string, but the copying continues without stopping until it happens to hit a '0円' or segfaults. Same risk to error_msg.

How did this problem arise? You wrote code without testing it. Let me take a minute to pontificate about testing. You need to test your code. You need to test all your code. You need to be able to do the tests in an automated manner. That means your automatic testing harness needs to be able to cause each branch to be either taken or not taken.

Question 9

Thanks. You are right, this was a last minute adjustment since I noticed a possible overflow in the error string. It was not tested.

Question 10

Not portable

First of all, I feel that it is nonstandard to use fread() after you've opened the file in text mode. Usually, you use fread() when you open a file in binary mode.

That being said, what can now happen is that you can come across a version of fread() that does something unexpected. For example, Microsoft says this about using their fread():

The fread function reads up to count items of size bytes from the input stream and stores them in buffer. The file pointer associated with stream (if there is one) is increased by the number of bytes actually read. If the given stream is opened in text mode, carriage return–linefeed pairs are replaced with single linefeed characters. The replacement has no effect on the file pointer or the return value. The file-pointer position is indeterminate if an error occurs. The value of a partially read item cannot be determined.

(Source: MSDN Library)

Note the part where it says it replaces CRLF pairs with LF only. So after you read in the file contents, when your code does this:

(*buffer)[file_len] = '0円';

You are terminating the string at file_len, but the actual string could be shorter because fread() converted all the CRLF pairs to a single linefeed character. So you could end up with a bunch of junk characters past the end of the file.

Use calloc

I discovered a long time ago that using calloc instead of malloc saved me from many headaches. In your program, if you used calloc instead of malloc, you wouldn't even need to terminate the string. And then it would just work with the Microsoft version of fread().

Question 11

If I use calloc, and still stick to fread() it would solve the issue you pointed out, right? Also, could you give a code snippet for an alternative to fread()?

Question 12

@dev_nut You can use fread(). Just open the file in binary mode. Like this: fp = fopen(filename, "rb");. Yes calloc would zero out the entire buffer so you don't even have to worry about terminating your string.

Question 13

Cool. Writing portable C code does seem to be possible if you try hard. Thanks!

Edward Edward 67.2k4 gold badges120 silver badges284 bronze badges · Accepted Answer · 2015-05-07 15:38:04Z

There are a number of things I see that may help you improve your code.

Avoid `goto`

Every programmer should understand when and when not to use goto. For useful discussion, history and guidance, see this SO question and answers. Generally the problem with goto is that it obfuscates program flow and can often be more clearly expressed using other control structures. In this particular case, one might consider inverting the logic -- that is, only do the buffer read if previous operations were successful, avoiding the gotos altogether and making the control flow easier to follow.

Check for `malloc` failure

Calls to malloc can fail, and if they do, a NULL pointer is returned. Dereferencing a NULL pointer usually means a program crash, so a well behaved program should always check to make sure that the call to malloc did not fail before doing anything with the returned pointer.

Use `NULL` pointer to indicate failure

Instead of taking a pointer to a pointer as a parameter to read_file it would be safer, shorter and clearer to instead use the pointer value as a return value. In this case, a NULL pointer could represent an error. This would also eliminate the problem that occurs if a non-NULL value is passed in and then a seek operation fails; currently the program will try to free memory that may never have been allocated.

Separate error reporting from error detection

The read_file routine might be called from a non-console application, but the error reporting built into it would attempt to write to the (possibly non-existant) console anyway using sprintf. Better would be to simply report the error to the calling function and allow it to decide whether and how to report the error.

In particular consider the following:

char *read_file(const char *filename) { ... }
int main() {
 char *buffer = read_file("myfile.txt");
 if (buffer == NULL) { // oops, an error!
 perror("read_file()");
 return 1;
 } else {
 // do useful things
 }
}

The advantage is that if I want the file name as part of the error message or not, if I want a dialog box or a command line message, if I want a log file entry or an error indicating sound played, or if I want to display the error message in Chinese or German, I can control all of these things from the calling function. If it's buried within a function I call, I haven't got that kind of control. Those are just a few reasons it's useful to separate error reporting from error detection.

Use all required `#include`s

This program requires the following lines which should be part of the source code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

Use `snprintf` instead of `sprintf`

By using snprintf instead of sprintf, these lines

char error_msg[1024];
// let's trunc filenames larger than 256, to not overflow error_msg buffer
char safe_filename[256];
if (strlen(filename) > 256) {
memcpy(safe_filename, filename, 256);
}
sprintf(error_msg, "Unable to read file : %s", safe_filename);
perror(error_msg);

can be simplified to this:

char error_msg[1024];
snprintf(error_msg, 1024, "Unable to read file [%s]", filename);
perror(error_msg);

Don't make assumptions about the values of constants

If ftell fails, it returns EOF which may or may not be mathematically identical to -1 which is currently used in the code. Instead you should compare it to EOF so that it is both clear to the reader of your code and perfectly portable.

Use the appropriate variable types

The ftell routine returns a long and not a ssize_t. They may be convertible (or even identical) types, but there is no real reason not to simply declare file_len as a long.

Avoid the use of "magic numbers"

The numbers 1024, 256, 1 and -1 appear in the program, and we might be able to guess their signficance, but better practice would be to use a const or #define and name those constants with some meaningful name.

Thanks a lot. This is really helpful. In "Separate error reporting from error detection", would taking an extra char** parameter, malloc that and setting that with an error message be the right way to go?
@dev_nut: It wouldn't need to take another parameter. The signature could be char *read_file(const char *filename). If the function returns NULL, you can generate the error from the caller since errno is still set and the string is only composed of things already known to the caller.
Hmmm, wouldn't it be easier for the user of the function to just know what the error is rather than checking errno himself? I would like to directly get any error and any context about the error.
You don't lose context because errno is set. I've updated my answer to address that question.

Stack Exchange Network

Simple and safe file reader for C

3 Answers 3

Avoid `goto`

Check for `malloc` failure

Use `NULL` pointer to indicate failure

Separate error reporting from error detection

Use all required `#include`s

Use `snprintf` instead of `sprintf`

Don't make assumptions about the values of constants

Use the appropriate variable types

Avoid the use of "magic numbers"

Not portable

Use calloc

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Simple and safe file reader for C

3 Answers 3

Avoid goto

Check for malloc failure

Use NULL pointer to indicate failure

Separate error reporting from error detection

Use all required #includes

Use snprintf instead of sprintf

Don't make assumptions about the values of constants

Use the appropriate variable types

Avoid the use of "magic numbers"

Not portable

Use calloc

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Avoid `goto`

Check for `malloc` failure

Use `NULL` pointer to indicate failure

Use all required `#include`s

Use `snprintf` instead of `sprintf`