Because I use the task so often, I wrote a C header for reading a file. I never formally learned C, so I'd really appreciate a code review on the code.
As a matter of personal taste, I do not put a space between the final parenthesis and an opening brace (){
).
readfile.h
#ifndef _READ_FILE_INCL
#define _READ_FILE_INCL
#include <stdio.h> /* for FILE, fopen, getc, fclose, EOF */
#include <stdlib.h> /* for malloc, realloc */
// the amount of bytes the file is initially guessed to be
#ifndef _READFILE_GUESS
#define _READFILE_GUESS 256
#endif
/*
* function `read_file`
* arguments: a file name (char*)
* returns: the contents of the file (char*) or NULL if:
* 1. the file does not exist
* 2. memory allocation failed
* return value must be freed
*/
char* read_file(char* name){
FILE* file;
file = fopen(name, "r");
// 1. the file does not exist: return null
if(!file)
return NULL;
// allocate memory according to the initial guess
// (+ 1) for trailing null byte
char* result = malloc(sizeof(char) * _READFILE_GUESS + 1);
// 2. memory allocation failed
if(result == NULL)
return NULL;
// the position in the string to write the character to
size_t pos = 0;
// the amount of memory allocated for the string
size_t capacity = _READFILE_GUESS;
// character to hold the currently-read character from the file
char ch;
// until the character is the EOF character
while((ch = getc(file)) != EOF){
// update the character at position `pos` to `ch`
result[pos++] = ch;
// if the next position would exceed bounds
if(pos >= capacity){
// add the guess to the capacity
capacity += _READFILE_GUESS;
// allocate memory accordingly
// (+ 1) for trailing null byte
result = realloc(result, sizeof(char) * capacity + 1);
// 2. memory allocation failed
if(result == NULL)
return NULL;
}
}
// close the file
fclose(file);
// remove extra memory
result = realloc(result, sizeof(char) * pos);
// 2. memory allocation failed
if(result == NULL)
return NULL;
// `pos` now points to the index after the last character read
result[pos] = '0円';
return result;
}
#undef _READFILE_GUESS
/*
* MIT License
*
* Copyright (c) 2017 Conor O'Brien
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#endif
-
\$\begingroup\$ Just curious, the code is valid, but it more common to put an extern declaration of the function in a header file and then have a C file that contains the actual function? \$\endgroup\$pacmaninbw– pacmaninbw ♦2017年05月16日 13:53:26 +00:00Commented May 16, 2017 at 13:53
-
\$\begingroup\$ @pacmaninbw As I said, I never have formally learned C. FWIW I don't know what an "extern deceleration" is \$\endgroup\$Conor O'Brien– Conor O'Brien2017年05月16日 14:04:07 +00:00Commented May 16, 2017 at 14:04
2 Answers 2
Getting Size of File
You can get the size of file (without asking the filesystem) by using fseek()
and ftell()
.
if (fseeko(file, 0, SEEK_END) != 0) {
// Error State do something
}
off_t sizeOfFile = ftello(file);
if (fseeko(file, 0, SEEK_SET) != 0) {
// Error State do something
}
// Your file is now ready to be read and the size of the file is
// in `sizeOfFile`
Identifier Names
The identifier _READFILE_GUESS
is reserved for the implementation. Don't use _
(underscore) as the first character in an identifier name.
Realloc
This is the wrong way to use realloc
:
result = realloc(result, sizeof(char) * capacity + 1);
As you have noticed, when realloc
fails it returns NULL
. But it does not free the result
value in this case. This means you have overwritten the value of result
with null and thus leaked the memory it was pointing at.
char* newArea = realloc(result, sizeof(char) * capacity + 1);
if (newArea == NULL) {
// Error condition handle it
// You should probably release `result` by calling `free()`
// before returning result
}
result = newArea;
Reading
Why are you only reading one byte at a time?
while((ch = getc(file)) != EOF)
Why not (attempt to) read capacity
bytes at a time?
size_t blocksRead= 0;
size_t totalRead = 0;
while(1) {
size_t read = fread(result + (blocksRead * capacity) + totalRead, 1, capacity - totalRead, file);
if (ferror(file) != 0) {
// Error State Deal with it.
}
totalRead += read;
if (feof(file)) {
break;
}
if (totalRead == capacity) {
++blocksRead;
totalRead = 0;
// Handle resize of block if you need too.
}
}
License
Also note that by pasting the code into any Stack Exchange site you are already dual licensing your code under the CC license (see the bottom of the page for details).
So though your MIT License is valid it is not the only license this code is now available under.
-
\$\begingroup\$ I remember reading somewhere that only double underscores (
__
) were reserved for implementation usage. \$\endgroup\$Conor O'Brien– Conor O'Brien2017年05月16日 00:55:26 +00:00Commented May 16, 2017 at 0:55 -
\$\begingroup\$ @CᴏɴᴏʀO'Bʀɪᴇɴ Yes those are reserved in any scope. Also 1) A leading underscore followed by any letter in a file scope. 2) A leading underscore followed by an uppercase letter in any scope. From the C standard
7.1.3 Reserved identifiers
. \$\endgroup\$Loki Astari– Loki Astari2017年05月16日 01:04:46 +00:00Commented May 16, 2017 at 1:04 -
1\$\begingroup\$ @CᴏɴᴏʀO'Bʀɪᴇɴ Sure. But then you are breaking the tradition of making macros all upper case (which is also important). Better to follow the standard convention of not using the underscore at the beginning of an identifier. \$\endgroup\$Loki Astari– Loki Astari2017年05月16日 01:11:38 +00:00Commented May 16, 2017 at 1:11
-
2\$\begingroup\$ "Without asking the filesystem"? That's exactly what you're doing with
fseek()
. I think you meant "without guessing". \$\endgroup\$Toby Speight– Toby Speight2017年05月16日 08:15:55 +00:00Commented May 16, 2017 at 8:15 -
1\$\begingroup\$ @CᴏɴᴏʀO'Bʀɪᴇɴ Yes. man7.org/linux/man-pages/man3/fseeko.3.html \$\endgroup\$Loki Astari– Loki Astari2017年06月07日 08:37:06 +00:00Commented Jun 7, 2017 at 8:37
Do not put functions in header files.
Headers are the public interface to your module, so they should contain the declarations of the public functions and (possible) variables. The C source file is the private part of the module that actually contains the implementation.
So, readfile.h
#ifndef _READ_FILE_INCL
#define _READ_FILE_INCL
/*
* function `read_file`
* arguments: a file name (char*) ...
*/
extern char* read_file(char* name);
#endif
Then the implementation, readfile.c
#include <stdio.h> /* for FILE, fopen, getc, fclose, EOF */
#include <stdlib.h> /* for malloc, realloc */
#include "readfile.h"
/* this isn't required outside this module, so we can define it here */
#define READFILE_GUESS 256
char* read_file(char* name){
/* implementation */
}
Then, with your main program in myprog.c
, compile them all together with something like this:
gcc -Wall $FLAGS -o myprog myprog.c readfile.c
As for the code itself, I won't repeat what @Loki stated.
But, you're not handling errors fully, a memory allocation error will leave the file opened:
file = fopen(name, "r");
if(!file)
return NULL;
char* result = malloc(sizeof(char) * _READFILE_GUESS + 1);
if(result == NULL)
return NULL; /* Here, the function returns with 'file' open */
(Granted, that is not likely to matter much, since if malloc
fails your program might not be able to continue anyway. But still.)
Make sure to fclose(file)
on every error exit after opening the file, either in every single place where you do an early return
, or by collecting the cleanup code and return
to a single place.
-
\$\begingroup\$ What about headers like
stdlib.h
? Those define functions, don't they? \$\endgroup\$Conor O'Brien– Conor O'Brien2017年05月16日 20:46:02 +00:00Commented May 16, 2017 at 20:46 -
1\$\begingroup\$ @CᴏɴᴏʀO'Bʀɪᴇɴ, declare, not define The
stdlib.h
on my system has a bunch of typedefs, #defines, and function declarations likeextern double atof (const char *__nptr);
(with some macros sprinkled in, most likely related to compiler-specific notes) \$\endgroup\$ilkkachu– ilkkachu2017年05月17日 09:31:06 +00:00Commented May 17, 2017 at 9:31 -
\$\begingroup\$ @CᴏɴᴏʀO'Bʀɪᴇɴ Its a rule of thumb. By placing function definitions in the header file you are making the file messier and harder to read (the standard library is well documented so we don't care if it is messy). Personally I function longer than a single line I place in the source file (unless I have a very good reason not to). \$\endgroup\$Loki Astari– Loki Astari2017年05月27日 00:32:58 +00:00Commented May 27, 2017 at 0:32