I recently learned that sometimes, when running code, the compiler (or whatever it is) ignores the fact that there isn't enough memory for, say, a string, and interacts with the string normally.
Yes, I know this is a "lucky" thing that shouldn't happen, but sometimes does.
I often run into trouble when I'm trying to work with the contents of a file, as I often run out of allocated memory and have trouble dynamically reallocating.
To fix this, I created two functions:
- fsizeof_full <- returns the full size of the file (as if it were a string)
- fsizeof_buffer <- returns the size of the largest line in the file
So far, as I have been using these, I have not come across any problems involving segmentation faults, but I am not sure if I am just getting lucky.
Here is my code (both functions have stdio.h included):
fsizeof_full
unsigned int fsizeof_full(FILE *fp) {
    unsigned int size;
    while(getc(fp) != EOF)
        ++size;
    return size;
}
fsizeof_buffer
int fsizeof_buffer(FILE *fp) {
    int buff = 0; // holds the largest size of a line
    int temp = 0; // holds the size of the current line
    int c;
    while((c = getc(fp)) != EOF) {
        ++temp;
        if(c == '\n') {
            buff = (temp > buff)?(temp):(buff);
            temp = 0;
        }
    }
    return buff;
}
Along with the above question ("So far, as I have..."), could you also share any thoughts on the code regarding efficiency, necessity, etc.?
3 Answers
You can use fseek and ftell to calculate the file size for you without having to read every character:
long fsize(FILE *fp) {
    fseek(fp, 0, SEEK_END);
    long bytes = ftell(fp);
    rewind(fp);
    return bytes;
}
Note that you should open the file in binary mode in order to compute this with fseek. Further, note that this is portable, as all of these functions are part of standard C (stdio.h specifically), while fstat requires a POSIX-based system.
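As a rough sketch of a more defensive variant, the version below checks the return values and puts the stream back where the caller left it instead of rewinding to the start. The name fsize_keep_pos is made up for this example. It also assumes the platform reports a meaningful offset after seeking to SEEK_END on a binary stream, which the C standard does not strictly require, and the result is limited to what a long can hold.

```c
#include <stdio.h>

/* Returns the size in bytes of the file behind fp, or -1L on failure.
   Hypothetical helper: the caller's position is saved and restored. */
long fsize_keep_pos(FILE *fp) {
    long start = ftell(fp);               /* remember where the caller was */
    if (start < 0)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)     /* jump to the end of the file */
        return -1L;
    long bytes = ftell(fp);               /* offset of the end, i.e. the size */
    if (fseek(fp, start, SEEK_SET) != 0)  /* go back to the saved position */
        return -1L;
    return bytes;                         /* may itself be -1L if ftell failed */
}
```

A caller would typically open the file with fopen(path, "rb") and treat a negative result as "size unknown".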
All this having been said, it's not really necessary. When reading a file, you should define a buffer of a specific size, and use a function like fgets, which takes a maximum number of characters to read. For example:
FILE* fp;
// Open file, checking for errors, etc.
char buffer[4096];
while(fgets(buffer, sizeof(buffer), fp) != NULL) {
...
}
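If a line can be longer than the fixed buffer, one possible approach is to keep calling fgets and grow a heap buffer with realloc until the newline shows up. The sketch below is only an illustration: read_line is a hypothetical helper, the starting capacity of 128 is arbitrary, and it assumes the file contains no embedded '\0' bytes (strlen would stop at them).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length from fp into a heap buffer.
   Returns NULL at end of file or on allocation failure; the caller
   frees the result. The trailing '\n', if present, is kept. */
char *read_line(FILE *fp) {
    size_t cap = 128;                /* arbitrary starting capacity */
    size_t len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    /* fgets never writes more than the space it is told about. */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;             /* got a complete line */

        /* No newline yet: double the buffer and keep reading. */
        char *bigger = realloc(line, cap * 2);
        if (bigger == NULL) {
            free(line);
            return NULL;
        }
        line = bigger;
        cap *= 2;
    }

    if (len == 0) {                  /* nothing was read at all */
        free(line);
        return NULL;
    }
    return line;                     /* last line, without a trailing '\n' */
}
```

Calling read_line(fp) in a loop until it returns NULL, and freeing each result, walks the whole file without measuring it first.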
- fstat has a Windows equivalent called _fstat. – ratchet freak, Dec 13, 2014 at 17:17
- Suggest 1) delete rewind(fp); as going back to the beginning is not necessarily where fp was when the function started, or 2) perform an ftell() up front and restore the position. – chux, Dec 20, 2014 at 4:10
- I'm really surprised (read: disappointed) to find advice suggesting fseek() and SEEK_END on Code Review. Do not use fseek() and ftell() to compute the size of a regular file. – this, Jun 23, 2015 at 15:30
- This is plain undefined behavior. – Madagascar, Feb 23, 2024 at 15:50
For the first function, you can ask the OS to find out for you:
unsigned long long fsizeof_full(FILE *fp) {
    struct stat size;
    fstat(fileno(fp), &size);
    return size.st_size;
}
You'll need to include the following headers for it to work.
#include <sys/types.h>
#include <sys/stat.h>
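A slightly more cautious sketch of the same idea checks whether fstat succeeded and returns the size in its native type, off_t. The name fsizeof_stat is invented for this example, and the code assumes a POSIX system. Data that was written through fp but is still sitting in the stdio buffer is not counted until it is flushed.

```c
#define _POSIX_C_SOURCE 200809L   /* expose fileno() under strict -std modes */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Returns the file size reported by the OS, or -1 if fstat fails.
   Hypothetical helper, POSIX only. */
off_t fsizeof_stat(FILE *fp) {
    struct stat info;
    if (fstat(fileno(fp), &info) != 0)
        return (off_t)-1;
    return info.st_size;          /* st_size has type off_t, a signed integer */
}
```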
Or you can simply seek to the end of the file (with fseek(fp, 0, SEEK_END)) and use ftell.
Besides that, files nowadays can easily exceed 4 gigabytes, which will overflow the int; both POSIX and Windows have (differently named) variants of ftell that return a 64-bit number.
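As an illustration of those variants, the sketch below wraps fseeko/ftello (POSIX) and _fseeki64/_ftelli64 (Microsoft C runtime) behind two macros. The names SEEK64, TELL64 and fsize64 are invented for this example, and the _FILE_OFFSET_BITS define is an assumption that matters mainly on 32-bit glibc builds, where off_t would otherwise be 32 bits wide.

```c
#define _FILE_OFFSET_BITS 64      /* ask glibc for a 64-bit off_t */
#define _POSIX_C_SOURCE 200809L   /* expose fseeko()/ftello() */
#include <stdio.h>
#include <stdint.h>

#ifdef _WIN32
#define SEEK64(fp, off, whence) _fseeki64((fp), (off), (whence))
#define TELL64(fp)              _ftelli64(fp)
#else
#include <sys/types.h>
#define SEEK64(fp, off, whence) fseeko((fp), (off_t)(off), (whence))
#define TELL64(fp)              ftello(fp)
#endif

/* Returns the file size as a 64-bit value, or -1 on failure.
   The caller's position in the stream is restored afterwards. */
int64_t fsize64(FILE *fp) {
    int64_t start = TELL64(fp);
    if (start < 0)
        return -1;
    if (SEEK64(fp, 0, SEEK_END) != 0)
        return -1;
    int64_t bytes = TELL64(fp);
    if (SEEK64(fp, start, SEEK_SET) != 0)
        return -1;
    return bytes;
}
```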
- Does st_size not use type off_t, some signed integer, not necessarily int? – chux, Dec 20, 2014 at 4:04
- The fseek()/ftell() method invokes undefined behavior for a binary stream. See: wiki.sei.cmu.edu/confluence/display/c/… – Madagascar, Feb 23, 2024 at 15:51
fsizeof_full() code fails as it does not initialize size.

The size of a file and the range of unsigned are unrelated. Assume file sizes could be much larger.

// unsigned int fsizeof_full(FILE *fp) {
unsigned long long fsizeof_full(FILE *fp) {
    // unsigned int size;
    unsigned long long size = 0;
    while(getc(fp) != EOF)
        ++size;
    return size;
}
fsizeof_buffer(): Use meaningful variable names.

Since the value returned is likely for an array, use size_t.

fsizeof_buffer() has incorrect functionality. If the file consisted of only "1234567890", the function would return 0 rather than 10. Code needs to consider that the last line might not end in '\n'.

size_t fsizeof_buffer(FILE *fp) {
    size_t largest_line = 0;
    size_t current_line = 0;
    int c; // int, not char, so that EOF can be told apart from data
    while((c = getc(fp)) != EOF) {
        current_line++;
        if (c == '\n') {
            largest_line = (current_line > largest_line) ? current_line : largest_line;
            current_line = 0;
        }
    }
    // account for a final line that does not end in '\n'
    largest_line = (current_line > largest_line) ? current_line : largest_line;
    return largest_line;
}
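Both measuring functions consume the stream, so it has to be rewound before the data can actually be read. The following is only a sketch of how the result might be used in a two-pass read. longest_line repeats the corrected fsizeof_buffer() logic so that the example compiles on its own, and read_lines_two_pass is a hypothetical name. It assumes a seekable file that does not change between the two passes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Length of the longest line, counting its '\n' if it has one. */
static size_t longest_line(FILE *fp) {
    size_t largest = 0, current = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        current++;
        if (c == '\n') {
            if (current > largest)
                largest = current;
            current = 0;
        }
    }
    return (current > largest) ? current : largest;
}

/* Pass 1 measures, pass 2 reads each line into an exactly sized buffer.
   Returns the number of lines read, or -1 on allocation failure. */
long read_lines_two_pass(FILE *fp) {
    size_t longest = longest_line(fp);   /* consumes the whole stream */
    if (longest == 0)
        return 0;                        /* empty file: nothing to read */
    rewind(fp);                          /* back to the start for pass 2 */

    char *buf = malloc(longest + 1);     /* +1 for the terminating '\0' */
    if (buf == NULL)
        return -1;

    long lines = 0;
    while (fgets(buf, (int)(longest + 1), fp) != NULL)
        ++lines;                         /* buf holds one whole line here */
    free(buf);
    return lines;
}
```

The trade-off is that the file is read twice; growing a buffer on demand avoids the second pass at the cost of a few reallocations.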