I come from Python, with a bit a FORTRAN and Matlab some years ago. I'm trying to learn C. My first attempt at a program is one to merge all the files in a directory with a specified extension
/*
MERGE: Merges text files. Gives the ability to all files in a
directory with specified extension.
*/
#include <stdio.h>
#include <dirent.h>
#include <stdbool.h>
#include <string.h>
#include <stdlib.h>
/* Function prototypes */
int append_to_file(const char *filename, const char *outfilename); // Appends the contents of filename to outfilename
int scandir(char dirname[], char const *ext, char outfile[]); // Scans a directory for files of a specific extension and merges them
bool has_extension(char const *name, char const *ext);
void usage(); // Prints out usage information (help) to the console
void path_combine(char *dest, const char *path1, const char *path2); // Combines a directory name and filename to a single filepath
int main(int argc, char *argv[])
{
int nfiles; // Number of files merged
if (argc == 4)
{
nfiles = scandir(argv[1], argv[2], argv[3]);
printf("Merged %d files\n", nfiles);
return 0;
}
else
{
printf("Wrong input, quitting");
return 1;
}
}
int append_to_file(const char *filename, const char *outfilename)
{
FILE *infile, *outfile;
char ch;
infile = fopen(filename, "r");
outfile = fopen(outfilename, "a");
if (infile == NULL)
{
printf("Input file is empty, skipping...\n");
return 1;
}
while ((ch = fgetc(infile)) != EOF)
fputc(ch, outfile);
printf("Appended file\n");
fclose(infile);
fclose(outfile);
return 0;
}
int scandir(char dirname[], char const *ext, char outfile[])
/* Scans a directory and merges all files of given extension */
{
DIR *d = NULL;
struct dirent *dir = NULL;
char filepath[strlen(dirname) + 255];
int i = 0;
d = opendir(dirname);
if (d)
{
while ((dir = readdir(d)) != NULL)
{
if (has_extension(dir->d_name, ext))
{
path_combine(filepath, dirname, dir->d_name);
printf("%s\n", filepath);
append_to_file(filepath, outfile);
i++;
}
}
closedir(d);
}
return i;
}
bool has_extension(char const *name, char const *ext)
{
size_t len = strlen(name);
return len > 4 && strcmp(name+len-4, ext) == 0;
}
void path_combine(char *dest, const char *path1, const char *path2)
{
const char *last_char = path1;
int append_sep = 0;
char sep[] = "/";
#ifdef WIN32
sep[0] = '\\';
#endif
/* Find the last character in the first path*/
while(*last_char != '0円')
last_char++;
/* If the last character is not a seperator, we must add it*/
if (strcmp(last_char, sep) !=0)
{
append_sep = 1;
}
strcpy(dest, path1);
if (append_sep)
strcat(dest, sep);
strcat(dest, path2);
}
void usage()
{
printf("\t=================\n");
printf("\t MERGE\n");
printf("\t=================\n");
printf("Merge two or more text files\n");
printf("Usage:\n");
printf("\tCall merge with a directory name and desired extension:\n");
printf("\tmerge DIRNAME .csv OUTPUTFILE\n\n");
};
I realise there are many things I haven't done here (such as any real error handling), but bear in mind this is a first attempt as part of the learning process. Rather than being told to 'go look up xx' where xx is a whole area of learning, I'm more interested in specifics if possible.
Are there any obvious code smells? Is there anything that could obviously be implemented better? Particularly usage of pointers, which I am still getting to grips with.
2 Answers 2
scandir
It is not a good name for 2 reasons. First, it exists in a global namespace, and second, it doesn't tell what the function does. Your
scandir
not only scans the directory, but does an actual merge as well. I would recommend to scan the directory with a standard library'sscandir
and then merge files (from the list it returns) in amerge
function.A perk benefit of using a stock
scandir
is an ability to control the merge order. Right now it is unpredictable.Yet another benefit of separating responsibilities is that if outfile is in the same directory with source files and it has an target extension, the results could be very strange.
has_extension presumes that the extension is always 3 characters. It is not necessarily so. It should calculate an actual length of the
ext
argument.append_to_file produces a misleading diagnostics. The
fopen
may fail for many reasons, but file being empty is not one of them. Display the error message withperror
orstrerror
.path_combine
Contrary to a popular belief, windows understands a forward slash as a path separator (backslash is an artifact of
cmd.exe
). There's no need to special case windows.It is also not necessary to test for last character for being a separator. Just add the separator anyway, it is harmless.
That said, I don't think that you need it at all: just
chdir()
to the source directory first.
-
\$\begingroup\$ Thanks that is very helpful. One comment on using the std library 'scandir' (I actually didn't come across that function when reading about the std lib?): using scandir then passing the resulting list to a merge function would introduce an extra loop. One to scan the directory then another to go through the list of names. If this would make no real difference (compared to writing the file) then I'm for it... \$\endgroup\$jramm– jramm2014年10月16日 20:09:00 +00:00Commented Oct 16, 2014 at 20:09
-
\$\begingroup\$ @jramm No difference in execution whatsoever. A bit more memory (for a list of filenames) would be required. Don't forget to
free()
each filename and thenfree()
the list itself. Good luck. \$\endgroup\$vnp– vnp2014年10月16日 20:13:05 +00:00Commented Oct 16, 2014 at 20:13
Your code is nice. Easily followed, well indented and well laid out. There is some extra whitespace that could be stripped. I have some trivial comments:
There are a few places where you could use braces for single statement loops/conditions. Doing this avoids some common problems but is not universally liked (as it could be thought ugly/noisy).
Declare all local functions
static
. This is not so important in small programs but is good practice. In large programs it avoids namespace pollution and may allow better optimization. Also many people put each function before it is first called and putmain
last. This avoids the need for local prototypes.Define variables at the point of first use where possible. So instead of
FILE *infile, *outfile; infile = fopen(filename, "r"); outfile = fopen(outfilename, "a");
do
FILE *infile = fopen(filename, "r"); FILE *outfile = fopen(outfilename, "a");
Failure of
fopen
does not mean that the file was empty. It means there was an error of some sort (see the value set in globalerrno
). (Clearly opening of output files can also fail, as canfputc
, but I expect you know that - you made clear that error handling was lacking).fgetc
returns anint
. The value ofEOF
does not necessarily fit into achar
(eg if char is unsigned on your platform), so your copy loop is not guaranteed to terminate. Your compiler should have warned you about that if you have sufficient warnings enabled.has_extension
assumes an extension length of 4. This sort of assumption and embedded constant is best avoided.path_combine
has a confused idea of the 'separator'. The first loop finds the last character of the string (actually it finds the 0円). You then do astrcmp
with the separatorsep
(which incidentally could beconst
) which always fails (aslast_char
points to the0円
). The confusion is that the loop is looking for a character whilestrcmp
is comparing strings (intended apparently to be 1 char long). You should either base the test on the string insep
(that is, not assuming it is just 1 char) or just definesep
as achar
and compare*last_char
withsep
:char sep = '/'; ... if (*last_char != sep) {...}
Clearly you then can't do a
strcat(dest, sep)
but you should know how longpath1
is anyway (from the while loop - which would be better asstrlen
etc).Note that sequences of
strcpy
andstrcat
are not robust WRT string overflow. String handling in C sucks.filepath
, into which you create your paths, might be better to usePATH_MAX
from limits.h, but that does nothing to fix the problem of overflow.scandir
(which masks the stdlib function of the same name) could take threeconst
parameters. Also I see no good reason to mix thechar dirname[]
andchar *ext
parameter styles - use one or the other as they mean the same (i.e.*
or[]
)In
scandir
I would probably return early ifopendir
fails:DIR *d = opendir(dirname); if (!d) { return 0; }
But this is not universally approved of. It has the advantage of reducing indenting in the rest of the function. On the other hand, some coding standards apply a strict "one return" rule in which a function may only have one return. For well written code this rule is not sensible, but much code is not well written.