3
\$\begingroup\$

I recently learned that sometimes, when running code, the compiler (or whatever it is) doesn't complain when there isn't enough memory for, say, a string, and the program goes on interacting with the string normally.

Yes, I know this is a "lucky" thing that shouldn't happen, but sometimes does.

I often run into trouble when I'm trying to work with the contents of a file, as I frequently run out of allocated memory and have trouble dynamically re-allocating.

To fix this, I created two functions:

  • fsizeof_full <- returns the full size of the file (as if it were a string)
  • fsizeof_buffer <- returns the size of the longest line in the file

So far, as I have been using these, I have not come across any problems involving segmentation faults, but I am not sure if I am just getting lucky.

Here is my code (both functions have stdio.h included):

fsizeof_full

unsigned int fsizeof_full(FILE *fp) {
    unsigned int size;
    while(getc(fp) != EOF)
        ++size;
    return size;
}

fsizeof_buffer

int fsizeof_buffer(FILE *fp) {
    int buff = 0; // holds the largest size of a line
    int temp = 0; // holds the size of the current line
    int c;
    while((c = getc(fp)) != EOF) {
        ++temp;
        if(c == '\n') {
            buff = (temp > buff)?(temp):(buff);
            temp = 0;
        }
    }
    return buff;
}

Along with the above question ("So far, as I have..."), could you possibly include more thoughts about the code regarding efficiency, necessity, etc.?

asked Dec 13, 2014 at 15:38
\$\endgroup\$

3 Answers

4
\$\begingroup\$

You can use fseek and ftell to calculate the file size for you without having to read every character:

#include <stdio.h>

/* Returns the size in bytes, or -1L if seeking fails. */
long fsize(FILE *fp) {
    if (fseek(fp, 0, SEEK_END) != 0)
        return -1L;
    long bytes = ftell(fp);
    rewind(fp);
    return bytes;
}

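Note that you should open the file in binary mode in order to compute this with fseek. These functions are all part of standard C (stdio.h specifically), whereas fstat requires a POSIX-based system. Be aware, though, that the C standard does not require a binary stream to meaningfully support SEEK_END, and that long may be too narrow for very large files, so the result is not guaranteed on every platform.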
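One of the comments on this answer suggests remembering the current position and restoring it instead of rewinding. A minimal sketch of that variant follows; the name fsize_keep_pos is hypothetical, and it assumes a binary stream on a platform where seeking to SEEK_END is supported:

```c
#include <stdio.h>

/* Hypothetical variant of fsize(): returns the size in bytes, or -1L on
   failure, and leaves the stream at the position the caller had it. */
long fsize_keep_pos(FILE *fp) {
    long start = ftell(fp);               /* remember the current position */
    if (start == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)     /* may be unsupported on some streams */
        return -1L;
    long bytes = ftell(fp);               /* offset of the end of the stream */
    if (fseek(fp, start, SEEK_SET) != 0)  /* restore the caller's position */
        return -1L;
    return bytes;
}
```

This way a caller that has already read part of the file can ask for the size without losing its place.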

All this having been said, it's not really necessary. When reading a file, you should define a buffer of a specific size, and use a function like fgets, which takes a maximum number of characters to read. For example:

FILE* fp;
// Open file, checking for errors, etc.
char buffer[4096];
while(fgets(buffer, sizeof(buffer), fp) != NULL) {
 ...
}
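To make the fixed-buffer idea concrete, here is a rough sketch of a complete loop. The helper name count_lines and the deliberately tiny buffer are assumptions for illustration only. A line longer than the buffer simply arrives in several fgets chunks, and only the chunk ending in '\n' completes it:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts lines using a fixed-size buffer.
   No line-length limit is needed, because long lines are read in pieces. */
size_t count_lines(FILE *fp) {
    char buffer[16];    /* deliberately small to show chunked reads */
    size_t lines = 0;
    int in_line = 0;    /* nonzero while inside a line with no '\n' seen yet */

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len - 1] == '\n') {
            ++lines;        /* this chunk completed a line */
            in_line = 0;
        } else {
            in_line = 1;    /* partial chunk: line continues or file ends */
        }
    }
    if (in_line)
        ++lines;            /* final line had no trailing newline */
    return lines;
}
```

In real code a buffer of a few kilobytes, as in the snippet above, is a more sensible choice; the logic stays the same.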
answered Dec 13, 2014 at 16:34
\$\endgroup\$
4
  • \$\begingroup\$ fstat has a windows equivalent called _fstat \$\endgroup\$ Commented Dec 13, 2014 at 17:17
  • \$\begingroup\$ Suggest either 1) deleting rewind(fp);, since the beginning of the file is not necessarily where fp was when the function started, or 2) calling ftell() up front and restoring that position afterwards. \$\endgroup\$ Commented Dec 20, 2014 at 4:10
  • \$\begingroup\$ I'm really surprised (read: disappointed) to find advice suggesting fseek() and SEEK_END on Code Review. Do not use fseek() and ftell() to compute the size of a regular file. \$\endgroup\$ Commented Jun 23, 2015 at 15:30
  • \$\begingroup\$ This is plain undefined behavior. \$\endgroup\$ Commented Feb 23, 2024 at 15:50
3
\$\begingroup\$

For the first function, you can ask the OS for the size directly:

unsigned long long fsizeof_full(FILE *fp) {
    struct stat size;
    fstat(fileno(fp), &size);
    return size.st_size;
}

You'll need to include the following headers for it to work.

#include <sys/types.h>
#include <sys/stat.h>

Or you can simply seek to the end of the file (with fseek(fp, 0, SEEK_END)) and use ftell.

Besides, files nowadays can easily exceed 4 gigabytes, which will overflow an int. Both POSIX and Windows have (differently named) variants of ftell that return a 64-bit number.

answered Dec 13, 2014 at 16:17
\$\endgroup\$
2
  • \$\begingroup\$ Does not st_size use type off_t, some signed integer, not necessarily int? \$\endgroup\$ Commented Dec 20, 2014 at 4:04
  • \$\begingroup\$ The fseek()/ftell() method invokes undefined behavior for a binary stream. See: wiki.sei.cmu.edu/confluence/display/c/… \$\endgroup\$ Commented Feb 23, 2024 at 15:51
2
\$\begingroup\$
  1. fsizeof_full() code fails as it does not initialize size.

  2. The size of a file and the range of unsigned are unrelated. Assume file sizes could be much larger.

    // unsigned int fsizeof_full(FILE *fp) {
    unsigned long long fsizeof_full(FILE *fp) {
        // unsigned int size;
        unsigned long long size = 0;
        while (getc(fp) != EOF)
            ++size;
        return size;
    }
    
  3. fsizeof_buffer(): Use meaningful variable names.

  4. Since the value returned is likely for an array, use size_t.

  5. fsizeof_buffer() has incorrect functionality. If the file consisted of only "1234567890", the function would return 0 rather than 10. Code needs to consider that the last line might not end in '\n'.

    size_t fsizeof_buffer(FILE *fp) {
        size_t largest_line = 0;
        size_t current_line = 0;
        int c;  // int rather than char, so EOF can be told apart from data
        while ((c = getc(fp)) != EOF) {
            current_line++;
            if (c == '\n') {
                largest_line = (current_line > largest_line) ? current_line : largest_line;
                current_line = 0;
            }
        }
        // The last line might not end in '\n', so check it once more here.
        largest_line = (current_line > largest_line) ? current_line : largest_line;
        return largest_line;
    }
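As a rough illustration of point 4, the returned size_t can be used directly to size an allocation. In the sketch below, longest_line repeats the corrected logic so the example compiles on its own, and copy_longest_line is a hypothetical helper, not something from the question. It assumes the stream is seekable and that the longest line fits in an int, since fgets takes an int count:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest line, counting its '\n' if present. */
static size_t longest_line(FILE *fp) {
    size_t largest = 0, current = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        current++;
        if (c == '\n') {
            if (current > largest) largest = current;
            current = 0;
        }
    }
    return (current > largest) ? current : largest;
}

/* Hypothetical helper: returns a malloc'd copy of the longest line
   (caller frees), or NULL on allocation failure. */
char *copy_longest_line(FILE *fp) {
    size_t cap = longest_line(fp) + 1;   /* +1 for the terminating '\0' */
    char *buf = malloc(cap);
    char *best = calloc(cap, 1);         /* starts out as an empty string */
    if (buf == NULL || best == NULL) {
        free(buf);
        free(best);
        return NULL;
    }
    rewind(fp);                          /* second pass over the file */
    while (fgets(buf, (int)cap, fp) != NULL) {
        if (strlen(buf) > strlen(best))
            strcpy(best, buf);           /* safe: both buffers hold cap bytes */
    }
    free(buf);
    return best;
}
```

The cost is two passes over the file, which is the trade-off of measuring first and allocating once.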
    
answered Dec 20, 2014 at 3:30
\$\endgroup\$
