I have written a reimplementation of cat that should be compatible with most scripts while staying as small and fast as possible. Can anybody suggest some size improvements for it?
#include <stdio.h>      /* we need printf, fprintf, FILE,
                         * fopen, fclose and that sort of
                         * stuff */
#include <stdlib.h>     /* for exit, EXIT_SUCCESS, and
                         * EXIT_FAILURE */

int main(int argc, char *argv[])
{
    FILE *fp;                      /* file to cat */
    short int ch;                  /* character to print */
    unsigned short int cp;         /* currently printing argv[?] */
    unsigned short int failed = 0; /* how many failed to print */

    if (argc < 2) {                /* if not enough args act as pipe */
        for (;;) {                 /* infinite loop */
            if ((ch = fgetc(stdin)) == EOF) {
                exit(EXIT_SUCCESS);
            } else {
                printf("%c", ch);
            }
        }
    }
    for (cp = 1; cp <= argc - 1; cp++) {  /* for each file in cli args */
        fp = fopen(argv[cp], "r");        /* open the file */
        if (fp == NULL) {
            fprintf(stderr, "%s: cannot open %s\n", argv[0], argv[cp]);
            failed++;              /* increment the number of failed */
            continue;              /* go back to beginning */
        }
        while ((ch = fgetc(fp)) != EOF) {
            printf("%c", ch);
        }
        fclose(fp);                /* close the file */
    }
    exit(failed);                  /* exit with the number of files */
}                                  /* that failed */
In clang version 3.5.0 on Ubuntu the generated binary size is 5200 bytes with these options:
-Oz -fomit-frame-pointer -s
Improvements
I added some more error checking, added support for -u and -, and wrote a function that writes out a file to avoid repeating myself. I also changed short int to int, printf to putchar, and a few other things.
#include <stdio.h>      /* we need printf, fprintf, FILE,
                         * fopen, fclose and that sort of
                         * stuff */
#include <stdlib.h>     /* for exit, EXIT_SUCCESS, and
                         * EXIT_FAILURE */
#include <string.h>     /* need strcmp */

void write_file(char *file, char *progname)  /* print an individual file */
{
    FILE *fp;           /* file to read from */
    int ch;             /* temporary variable */

    if (strcmp(file, "-") == 0) {      /* check if using stdin */
        fp = stdin;                    /* use standard input */
    } else {
        fp = fopen(file, "r");         /* open file for reading */
    }
    if (fp == NULL) {                  /* error in opening file */
        fprintf(stderr, "%s: cannot open %s\n", progname, file);
        exit(EXIT_FAILURE);
    }
    while ((ch = fgetc(fp)) != EOF) {
        if (putchar(ch) == EOF) {
            fprintf(stderr, "%s: can't write %s\n", progname, file);
            exit(EXIT_FAILURE);
        }
    }
    if (ferror(fp) != 0) {             /* IO error, perhaps? */
        fprintf(stderr, "%s: error reading %s\n", progname, file);
        exit(EXIT_FAILURE);
    }
    if (strcmp(file, "-") != 0) {      /* close the file unless stdin */
        fclose(fp);
    }
    return;
}

int main(int argc, char *argv[])
{
    unsigned short int cp = 1;         /* currently printing argv[?] */

    if (argc < 2) {                    /* act as pipe */
        write_file("-", argv[0]);
        exit(EXIT_SUCCESS);
    }
    if (strcmp(argv[1], "-u") == 0) {  /* support -u option for SUS compat */
        cp++;                          /* increment the argv pointer var */
    }
    for (; cp <= argc - 1; cp++) {     /* for each file in cli args */
        write_file(argv[cp], argv[0]); /* print the file */
    }
    exit(EXIT_SUCCESS);
}
2 Answers
Do not repeat yourself. Copying stdin doesn't differ from copying any other file. There is no need to special-case stdin as you do at lines 15-21.

Avoid naked loops. Every loop does an important job that deserves a name. That alone is a strong enough reason to factor it out as a function.
Expect errors. fgetc may fail: once fgetc returns EOF, call feof or ferror to see what has actually happened. printf may fail too: detect it (printf returns a negative value on error) and report it (examine errno).
Handle options. At least recognize a standalone - as a fake file name referring to stdin.
May be small, but isn't fast
Your program suffers from operating on a one character basis. There is just too much overhead when reading and writing single characters, even if your file I/O is fully buffered.
I ran your program on a 100 MB file. It took 30 seconds. Meanwhile, the normal cat program finished in 0.07 seconds.
I modified your program to use fread into a 32 KB buffer and then fwrite the buffer to stdout. This makes your program run in 0.07 seconds.
Text mode

Actually I would avoid the FILE functions and just use open, read, and write. The reason is that stdout defaults to being opened in text mode. On a Linux system that wouldn't matter, but on a Windows system the FILE functions could do strange things with the newline characters. There isn't a portable way of changing the mode of stdout to binary, either.
- It's funny that you think using a buffer increases memory usage when the FILE functions will already create a hidden buffer for you. Anyway, you don't have to make the buffer 32K; it can be as small as you want it to be. As far as the portability of open vs fopen, you're right that fopen is more portable. Personally I've never seen a system where there wasn't open, but maybe you know of an embedded system like that. (JS1, May 11, 2015 at 14:42)
- Regarding the - and -u arguments: you could start by reading the POSIX spec. "Fast and small": does that mean with or without linked libraries?
- Supporting - would be pretty necessary, though. Are you stuck with an ancient compiler?
- printf("%c", ch); -> fputc(ch, fp); or putchar(ch);
- short int ch; should be int ch; EOF is not guaranteed to fit in a short.