Return to Answer

Commonmark migration

edited Jun 10, 2020 at 13:24

I don't think the program meets the requirements (i.e. "behave rationally no matter how unresonable the input") because it fails if the line size is greater than LINESIZE.

To behave rationally it would perhaps, ideally:

At least detect if a line is "too large" and fail properly
Or, properly handle a huge line (which fits on disk but is too large to fit in memory)

I'd suggest writing 2 versions.

The first version should guess how long each line is (e.g. 200 bytes). If the user asks for 10 lines, it should therefore fseek to the last 2000 bytes of the file, try to parse the last 2000 bytes, and display the result if successful.
The first version should guess how long each line is (e.g. 200 bytes). If the user asks for 10 lines, it should therefore fseek to the last 2000 bytes of the file, try to parse the last 2000 bytes, and display the result if successful.
If it doesn't find the right number of lines in the last 2000 bytes, then either try again with the previous 2000 bytes, or fail over to scanning the entire file.
When it scans the entire file, it should remember the start (byte offset) of the most recent 10 lines, so that when it reaches the end of the file it can print the end of the file from the 10th-most-recent byte offset. I see no need to remember the contents of each line.

If it doesn't find the right number of lines in the last 2000 bytes, then either try again with the previous 2000 bytes, or fail over to scanning the entire file.

When it scans the entire file, it should remember the start (byte offset) of the most recent 10 lines, so that when it reaches the end of the file it can print the end of the file from the 10th-most-recent byte offset. I see no need to remember the contents of each line.

The first is just a speed optimization for normal large files (assuming that fseek is faster than parsing). If it just needs to be correct instead of fast then only implement the 2nd version.

tail usually accepts a filename as an input parameter, and files are usually seekable.

If it's reading from stdin then stdin may not be seekable, in which you case you must buffer the lines which you read (and intend to print) in memory.

I don't fully agree with your decision not to use malloc.

If you want to use your palloc function I think it has too many pointers (surely you want an array of bytes, not an array of byte-pointers), and should be like this instead:

#define BLOCKSIZE 100000
static char sharedBlock[BLOCKSIZE];
static size_t alreadyAllocated;
char *palloc(int size) {
 if(size + alreadyAllocated <= BLOCKSIZE) {
 char* rc = &sharedBlock[alreadyAllocated];
 alreadyAllocated += size;
 return rc;
 }
 else {
 return NULL;
 }
}

I haven't analyzed/inspected your readline.c and queue.c in detail.

I don't think the program meets the requirements (i.e. "behave rationally no matter how unresonable the input") because it fails if the line size is greater than LINESIZE.

To behave rationally it would perhaps, ideally:

At least detect if a line is "too large" and fail properly
Or, properly handle a huge line (which fits on disk but is too large to fit in memory)

I'd suggest writing 2 versions.

The first version should guess how long each line is (e.g. 200 bytes). If the user asks for 10 lines, it should therefore fseek to the last 2000 bytes of the file, try to parse the last 2000 bytes, and display the result if successful.

If it doesn't find the right number of lines in the last 2000 bytes, then either try again with the previous 2000 bytes, or fail over to scanning the entire file.

When it scans the entire file, it should remember the start (byte offset) of the most recent 10 lines, so that when it reaches the end of the file it can print the end of the file from the 10th-most-recent byte offset. I see no need to remember the contents of each line.

The first is just a speed optimization for normal large files (assuming that fseek is faster than parsing). If it just needs to be correct instead of fast then only implement the 2nd version.

tail usually accepts a filename as an input parameter, and files are usually seekable.

If it's reading from stdin then stdin may not be seekable, in which you case you must buffer the lines which you read (and intend to print) in memory.

I don't fully agree with your decision not to use malloc.

If you want to use your palloc function I think it has too many pointers (surely you want an array of bytes, not an array of byte-pointers), and should be like this instead:

#define BLOCKSIZE 100000
static char sharedBlock[BLOCKSIZE];
static size_t alreadyAllocated;
char *palloc(int size) {
 if(size + alreadyAllocated <= BLOCKSIZE) {
 char* rc = &sharedBlock[alreadyAllocated];
 alreadyAllocated += size;
 return rc;
 }
 else {
 return NULL;
 }
}

I haven't analyzed/inspected your readline.c and queue.c in detail.

I don't think the program meets the requirements (i.e. "behave rationally no matter how unresonable the input") because it fails if the line size is greater than LINESIZE.

To behave rationally it would perhaps, ideally:

At least detect if a line is "too large" and fail properly
Or, properly handle a huge line (which fits on disk but is too large to fit in memory)

I'd suggest writing 2 versions.

The first version should guess how long each line is (e.g. 200 bytes). If the user asks for 10 lines, it should therefore fseek to the last 2000 bytes of the file, try to parse the last 2000 bytes, and display the result if successful.
If it doesn't find the right number of lines in the last 2000 bytes, then either try again with the previous 2000 bytes, or fail over to scanning the entire file.
When it scans the entire file, it should remember the start (byte offset) of the most recent 10 lines, so that when it reaches the end of the file it can print the end of the file from the 10th-most-recent byte offset. I see no need to remember the contents of each line.

The first is just a speed optimization for normal large files (assuming that fseek is faster than parsing). If it just needs to be correct instead of fast then only implement the 2nd version.

tail usually accepts a filename as an input parameter, and files are usually seekable.

If it's reading from stdin then stdin may not be seekable, in which you case you must buffer the lines which you read (and intend to print) in memory.

I don't fully agree with your decision not to use malloc.

If you want to use your palloc function I think it has too many pointers (surely you want an array of bytes, not an array of byte-pointers), and should be like this instead:

#define BLOCKSIZE 100000
static char sharedBlock[BLOCKSIZE];
static size_t alreadyAllocated;
char *palloc(int size) {
 if(size + alreadyAllocated <= BLOCKSIZE) {
 char* rc = &sharedBlock[alreadyAllocated];
 alreadyAllocated += size;
 return rc;
 }
 else {
 return NULL;
 }
}

I haven't analyzed/inspected your readline.c and queue.c in detail.

Source Link

answered Mar 1, 2014 at 19:52

ChrisW

answered Mar 1, 2014 at 19:52

ChrisW

13.1k
1
35
76

I don't think the program meets the requirements (i.e. "behave rationally no matter how unresonable the input") because it fails if the line size is greater than LINESIZE.

To behave rationally it would perhaps, ideally:

At least detect if a line is "too large" and fail properly
Or, properly handle a huge line (which fits on disk but is too large to fit in memory)

I'd suggest writing 2 versions.

The first version should guess how long each line is (e.g. 200 bytes). If the user asks for 10 lines, it should therefore fseek to the last 2000 bytes of the file, try to parse the last 2000 bytes, and display the result if successful.

If it doesn't find the right number of lines in the last 2000 bytes, then either try again with the previous 2000 bytes, or fail over to scanning the entire file.

When it scans the entire file, it should remember the start (byte offset) of the most recent 10 lines, so that when it reaches the end of the file it can print the end of the file from the 10th-most-recent byte offset. I see no need to remember the contents of each line.

The first is just a speed optimization for normal large files (assuming that fseek is faster than parsing). If it just needs to be correct instead of fast then only implement the 2nd version.

tail usually accepts a filename as an input parameter, and files are usually seekable.

If it's reading from stdin then stdin may not be seekable, in which you case you must buffer the lines which you read (and intend to print) in memory.

I don't fully agree with your decision not to use malloc.

If you want to use your palloc function I think it has too many pointers (surely you want an array of bytes, not an array of byte-pointers), and should be like this instead:

#define BLOCKSIZE 100000
static char sharedBlock[BLOCKSIZE];
static size_t alreadyAllocated;
char *palloc(int size) {
 if(size + alreadyAllocated <= BLOCKSIZE) {
 char* rc = &sharedBlock[alreadyAllocated];
 alreadyAllocated += size;
 return rc;
 }
 else {
 return NULL;
 }
}

I haven't analyzed/inspected your readline.c and queue.c in detail.

lang-c