I decided to spend the last few days writing a file-finding utility in C (written for GNU/Linux) and would love to be told if this code makes my ass look fat. I want to learn the right way to write C code so that I can contribute to the open source community (not that all open source code is written in C)!
This is the first time I write something of worth in a language other than Java so I would like it if you would comment on the visual aspects of the code as well as the implementation (better ways to accomplish the same thing).
Usage:
$ programName startingDir fileToMatch
Example:
$ ./accioFile / myFileICantFind.txt
Note: the program only handles file names for the second argument, not paths.
Outputs either the absolute path of the first match or "File not found".
Example:
/path/to/myFileICantFind.txt
The code itself doesn't do any input checking because that wasn't the goal of the exercise so please ignore that part.
Essentially, the program uses a breadth-first search starting in the base directory to find the requested file. The base directory is added to a queue and then the directory walk begins. It gets the first element of the queue and checks all the files in the directory, enqueue-ing any directories it finds and checking for the requested file. It then repeats the process until it runs out of items in the queue or finds the file.
I also want to note that I've run the code through Memcheck and am happy to report that there are no memory leaks (something I'm quite proud of, actually).
List of methods:
void enqueue(...): Adds item to queuechar *dequeue(): Removes item from queuevoid cleanup_queue(): Deletes remaining items in queueint isValidDirectory(...): Checks whether parameter is.or..char *readDir(...): Walks directory treeint main(...): Main method
#include <stdio.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
typedef struct node_t {
char *path;
struct node_t *next;
} Node;
Node *head = NULL;
Node *tail = NULL;
void enqueue(char *path) {
Node *node = malloc(sizeof(Node));
node->path = malloc(PATH_MAX * sizeof(char));
strcpy(node->path, path);
node->next = NULL;
if (head == NULL && tail == NULL) {
head = tail = node;
} else {
tail->next = node;
tail = node;
}
}
char *dequeue() {
char *path;
if (head == NULL && tail == NULL) {
return NULL;
}
path = head->path;
if (head == tail) {
free(head);
head = tail = NULL;
} else {
Node *temp = head;
head = head->next;
free(temp);
}
return path;
}
void cleanup_queue() {
Node *temp;
while (head != NULL) {
temp = head;
head = head->next;
free(temp->path);
free(temp);
}
}
int isValidDirectory(char *dir) { /* We want to avoid ".." and "." */
if (dir[0] == '.' && dir[1] == '0円') {
return 0;
} else if (dir[0] == '.' && dir[1] == '.' && dir[2] == '0円') {
return 0;
} else {
return 1;
}
}
char *readDir(char *file) {
DIR *d; /* Directory stream */
struct dirent *dir; /* Directory object */
struct stat file_stat; /* Stats object */
char *base; /* Absolute value of current directory */
char *path = malloc(PATH_MAX * sizeof(char)); /* Allocate space for path */
/* While items are still left in the queue... */
while ((base = dequeue()) != NULL) {
/* If user doesn't have read access, skip it */
if (access(base, R_OK) != 0) {
free(base);
continue;
}
/* Open directory stream */
d = opendir(base);
if (d) { /* If stream isn't null... */
for (int i = 0; (dir = readdir(d)) != NULL; i++) { /* For each file in the current directory... */
if (strcmp(base, "/") != 0) { /* If base path isn't root... */
sprintf(path, "%s/%s", base, dir->d_name); /* Concatenate base and d_name with a separator */
} else {
sprintf(path, "%s%s", base, dir->d_name); /* Don't use separator */
}
if (strcmp(dir->d_name, file) == 0) { /* If d_name and file being searched for are equal... */
free(base); /* Don't forget to close the door on your way out! */
closedir(d);
return path; /* return */
}
lstat(path, &file_stat); /* Get stats for path */
if (S_ISDIR(file_stat.st_mode) && isValidDirectory(dir->d_name)) { /* If path points to directory and it isn't '.' or '..'... */
enqueue(path); /* Add the directory to the queue */
}
}
free(base);
}
closedir(d); /* Be nice and close the directory stream :) */
}
free(path);
return NULL;
}
int main(int argc, char **argv) {
char *search = *(argv + 2);
char baseDir[PATH_MAX];
int failedFlag = 0;
realpath(*(argv + 1), baseDir);
enqueue(baseDir);
char *file = readDir(search);
if (file == NULL) {
printf("File not found\n");
failedFlag = 1;
} else {
printf("%s\n", file);
}
free(file);
cleanup_queue();
return failedFlag;
}
-
2\$\begingroup\$ Why the down vote? I'd just like some constructive criticism. Also, thank you @Jamal for editing the question's format, this was my first question. \$\endgroup\$Gab– Gab2018年01月15日 03:19:34 +00:00Commented Jan 15, 2018 at 3:19
-
\$\begingroup\$ @vnp, I've rollbacked your edit that "fixed indentation", as it made the indentation inconsistent and probably not what was actually written and reviewable. \$\endgroup\$Toby Speight– Toby Speight2018年01月15日 18:52:15 +00:00Commented Jan 15, 2018 at 18:52
4 Answers 4
Some observations, in no particular order:
Node *node = malloc(sizeof(Node)); node->path = malloc(PATH_MAX * sizeof(char));That first one is (arguably) better written as
malloc(sizeof *node). This is a useful style to adopt, as it saves embarrassment when you changed the type ofnodeand don't spot the second use of the type.In the second one,
sizeof (char)is 1 by definition, so the multiplication is a no-op and can be omitted.Note that in both cases, it's important to check that the allocation succeeded (did not return a null pointer) before using the memory. For a utility like this, the best response to allocation failure is to exit early with a non-zero status.
node->path = malloc(PATH_MAX); strcpy(node->path, path);There's no need to allocate
PATH_MAXcharacters for storage, as we only writestrlen(path)plus a terminating NUL:node->path = malloc(strlen(path)+1); strcpy(node->path, path);(In comments, you expressed uncertainty about using
strlen()here - see footnote)In a boolean context
(dir = readdir(d)) != NULLcan be written simply as(dir = readdir(d))(I keep the redundant parens as an indicator that assignment is intended, rather than a typo; this is a convention that GCC understands, and possibly other compilers too).The local variable
iis modified in theforloop, but never used. We can simplify towhile ((dir=readdir(d)))instead.Naming: avoid names like
readDirthat are pronounced the same as standard functions (readdir).Reduce scope. For example, by moving
tempinto thewhileloop incleanup_queue, we can make it constant:void cleanup_queue() { while (head) { Node *const temp = head; head = head->next; free(temp->path); free(temp); } }Another simplification: instead of testing the characters individually in
isValidDirectory(), it's probably clearer to usestrcmp()consistently:int isValidDirectory(const char *dir) { /* true unless dir is ".." or "." */ return strcmp(".", dir) && strcmp("..", dir); }As part of this, observe that the pattern
if (bool_expr) return 1; else return 0;is exactly equivalent to just
return bool_expr;Instead of testing for
baseequal to/, we should probably test for any path ending in/(since a user-supplied argument may well end in the separator):/* compose name, adding separator if needed */ sprintf(path, "%s%s%s", base, (base[strlen(base)-1] == '/' ? "" : "/"), dir->d_name);When printing the failure message:
printf("File not found\n");Prefer to use the standard error stream:
fprintf(stderr, "File not found\n");This makes the program play nicely with others.
Check arguments
char *search = *(argv + 2);That's normally written
char *search = argv[2];But in either case, this gives you undefined behaviour if the user doesn't supply two or more arguments. You should check
argc, and exit early with a usage message ifargc < 3.Check return values of library functions - for example, we call
realpath()with a user-supplied filename. There are several ways this can fail (as seen in the man page), andbaseDirwill then have undefined contents.
Footnote: It's perfectly fine to use strlen() on any null-terminated string, regardless of location. Perhaps you're thinking of warnings about using sizeof on string variables - that's when you need to really understand what's a pointer and what's an array. Here's what an array type looks like:
const char a[] = "Array"; // sizeof a == 6
// { 'A', 'r', 'r', 'a', 'y', '0円' }
// strlen(a) == 5
But if we access the array using a pointer to its elements, sizeof measures the pointer type, though strlen() still counts string characters:
const char *p = a; // sizeof p == sizeof (const char*)
// (often 4 or 8, depending on your system)
// strlen(p) == 5
Note that the assignment might be camouflaged somewhat:
void foo(const char p[]) {
// p is a pointer here, *not* an array!
}
-
1\$\begingroup\$ You and 2 others missed one particular check I feel is absolutely necessary, @user3629249 did get it. In C++ new() throws an exception if it fails, however, malloc() in C does not, it returns NULL.. Proper error handling is to check if
malloc()returned NULL or not. \$\endgroup\$2018年01月17日 21:12:48 +00:00Commented Jan 17, 2018 at 21:12 -
\$\begingroup\$ Good point, @pacman - I'm normally quite good at picking up on that one; I'll edit accordingly. \$\endgroup\$Toby Speight– Toby Speight2018年01月17日 22:32:48 +00:00Commented Jan 17, 2018 at 22:32
Both branches of adding node in
enqueuemaketail = node. Make it explicit:if (head == NULL) { head = node; } else { tail->next = node; } tail = node;Here I intentionally do not test for
tail == NULL. It is an invariant of the list; if you wish to be sure, assert it in both branches:if (head == NULL) { assert(tail == NULL); head = node; } else { assert(tail != NULL); tail->next = node; } tail = node;Similarly, node removal shall be more explicit:
Node * tmp = head; head = head->next; if (head == 0) { assert(tmp == head); tail = NULL; } free(tmp);accessfollowed by an actual access is a classic TOCTOU race. Testing for whatopendirreturns is just enough.Meanwhile, the
lstatreturn value shall also be tested. It it failed, you cannot trust whatfile_stattells you.I honestly do not understand why do you special case
basebeing"/".sizeof(char)is guaranteed to be 1. It is generally considered a bad form to use it as a factor.
-
\$\begingroup\$ Thanks for the input! I just wanted to address "I honestly do not understand why do you special case base being '/'." The special case is because if someone provides the value "/" as the base dir, the following paths will look like "//proc" and "//lost+found". \$\endgroup\$Gab– Gab2018年01月15日 18:37:53 +00:00Commented Jan 15, 2018 at 18:37
first time I write something of worth in a language other than Java
Good start
OP already has some good reviews, only a few crumbs are left.
As the data pointed to by
fileare un-changed, usesconstto allow wider use ofreadDir()and some potential optimizations. Same for others// char *readDir(char *file) { char *readDir(const char *file) { // int isValidDirectory(char *dir) { int isValidDirectory(const char *dir) {Concerning multiplying by the size of the data, which in this case is 1, either drop the
sizeof(char)or multiply with the size of the de-referenced pointer. I recommend to lead with thesizeofpart as that better ensures math is done with at leastsize_tmath - even though the order makes no difference here - it does so in more complex calculations.// path = malloc(PATH_MAX * sizeof(char)); path = malloc(sizeof *path * PATH_MAX); // node = malloc(sizeof(Node)); node = malloc(sizeof *Node);When code has an early
return, consider simpler code. (Additional simplification possible too)// OP's if (dir[0] == '.' && dir[1] == '0円') { return 0; } else if (dir[0] == '.' && dir[1] == '.' && dir[2] == '0円') { return 0; } else { return 1; }// Suggested if (dir[0] == '.' && dir[1] == '0円') { return 0; } if (dir[0] == '.' && dir[1] == '.' && dir[2] == '0円') { return 0; } return 1;// or if (dir[0] == '.') { if (dir[1] == '0円' || (dir[1] == '.' && dir[2] == '0円')) { return 0; } } return 1;"List of methods:" --> In C these are best called "List of functions:"
-
\$\begingroup\$ I meant to mention something like point 1, but forgot - thanks for addressing that one! As to point 3, I've got something similar, but also folded
if (bool_expr) return 1; else return 0;intoreturn bool_expr;without further explanation - do you think I should expand on that point? \$\endgroup\$Toby Speight– Toby Speight2018年01月16日 08:24:05 +00:00Commented Jan 16, 2018 at 8:24 -
\$\begingroup\$ @toby Expand as you wish if it makes the answer better. If inspired by others, add a credit \$\endgroup\$chux– chux2018年01月16日 13:10:06 +00:00Commented Jan 16, 2018 at 13:10
regarding:
char *readDir(char *file) {
the function: readdir(), all lower case, is a well known function that was exposed with the header file: dirent.h
It is a poor programming practice to write functions with the same name (even with some capitalization change) as a system function. Suggest changing that function name.
regarding:
char *path = malloc(PATH_MAX * sizeof(char)); /* Allocate space for path */
- the expression:
sizeof(char)is defined in the standard as 1. Multiplying anything by 1 has absolutely no effect. Suggest removing that expression. - always check (!=NULL) the returned value to assure the operation was successful.
regarding:
if (strcmp(base, "/") != 0) { /* If base path isn't root... */
sprintf(path, "%s/%s", base, dir->d_name); /* Concatenate base and d_name with a separator */
} else {
sprintf(path, "%s%s", base, dir->d_name); /* Don't use separator */
}
BEFORE using the field d_name, always check the 'type' of the entry to be sure it is a directory and not a regular file or a link, etc.
for ease of readability and understanding:
- consistently indent the code. Indent after every opening brace '{'. Unindent before every closing brace '}'. Suggest each indent level 4 spaces as that is visible even with variable width fonts.
- follow the axiom: only one statement per line and (at most) one variable declaration per statement. Treat the closing brace '}'. as a separate statement.
- separate code blocks (
forifelsewhiledo...whileswitchcasedefault) via a single blank line.
the posted code is calling dequeue() before calling enqueue() so the code will (probably) fail on the very first loop.
There may be more than one file in the file system those name matches the 'search' name, so should complete the search, not exiting.immediately.