I have just implemented this while getting acquainted with the std::filesystem
library.
On input from console, the program accepts two arguments: argv[1]
is a string of option flags preceded by a hyphen -
, and argv[2]
is the directory path which is about to be inspected.
For clarity, in the title for this question, "simplified simulation" means I have implemented only a few flags, mainly sorting the file contents according to file sizes, or number of files inside if directories, and display permission levels.
Here below, the output for the -h
flag:
-R List subdirectories recursively
-s Sort by file size
-d Sort by number of directories
-f Sort by number of files
-r Reverse sorting order
-p Display permissions
-h Display this information
If h
is present, program ignores other flags, prints the above and exits.
r
affects results only when one of the three sorting options sdf
is present.
There should only be one sorting flag s
, d
, or f
on input.
If more are present, the sorting will follow s > f > d
.
Each of the flag options has its own function.
Starting out, I intended to have main()
calling each of the flag option functions one by one in a linear sequence, but later I realised this was not practical because these flags interact with each other.
Unfortunately, perhaps this obfuscates the code a little.
The key function in the program is traverse()
, which traverses the directory path and stores information about each file inside the directory as a single Attributes
object.
traverse()
also collects all these Attributes
file objects in a std::vector
, which is then sent to the various option()
non-type template functions (the flag char
value is their template parameter) for rearrangement, and then printing to console.
Stemming from this question I asked on Stack Overflow, a std::variant
on custom types of file iterators provided by std::filesystem
is employed for dealing with flag -R
.
Another question I asked there, related to this project is this.
Find below the code.
#include <iostream>
#include <iomanip>
#include <vector>
#include <string>
#include <filesystem>
#include <variant>
#include <algorithm>
#include <functional>
#include <type_traits>
#include <concepts>
#include <typeinfo>
using namespace std;
using namespace std::filesystem;
/***********************
*** PART OF THE CODE *
*** DEALING WITH ***
*** DATA STRUCTURE **
*** FOR FILES *****
***********************/
struct Attributes {
size_t size_bytes;
size_t n_directories;
size_t n_files;
string path;
Attributes operator+=(const Attributes& other) {
size_bytes += other.size_bytes;
n_directories += other.n_directories;
n_files += other.n_files;
return *this;
}
};
enum class Flag {
size,
dirs,
files
};
bool lessThan(const Attributes& one, const Attributes& two, const Flag& flag) {
bool ret;
switch (flag) {
case Flag::size:
ret = one.size_bytes < two.size_bytes;
if (one.size_bytes == two.size_bytes)
ret = one.n_directories < two.n_directories;
break;
case Flag::dirs:
ret = one.n_directories < two.n_directories;
if (one.n_directories == two.n_directories)
ret = one.size_bytes < two.size_bytes;
break;
case Flag::files:
ret = one.n_files < two.n_files;
if (one.n_files == two.n_files) {
ret = one.n_directories < two.n_directories;
if (one.n_directories == two.n_directories)
ret = one.size_bytes < two.size_bytes;
}
break;
default:
throw("Allowed flags: size, dirs, files\n");
break;
}
return ret;
}
/***********************
*** PART OF THE CODE *
*** DEALING WITH ***
*** TRAVERSING THE **
*** GIVEN DIR PATH *
***********************/
Attributes explore(const directory_entry& directory) {
Attributes attributes{};
attributes.path = directory.path().string();
for(const auto& entry : recursive_directory_iterator{ directory.path() })
if (entry.is_directory())
attributes.n_directories++;
else {
attributes.n_files++;
attributes.size_bytes += entry.file_size();
}
return attributes;
}
template <typename T>
concept isDirIter = is_same_v<T, directory_iterator> || is_same_v<T, recursive_directory_iterator>;
template <isDirIter dirIterator>
void traverse(dirIterator& my_iter, vector<Attributes>& my_vec) {
Attributes root_attributes{};
root_attributes.path = my_iter->path().parent_path().string();
root_attributes.n_directories = 1;
for (const auto& entry : my_iter) {
Attributes entry_attributes{};
try {
if (entry.is_directory()) {
const auto attributes = explore(entry);
entry_attributes = attributes;
entry_attributes.n_directories++;
if (typeid(my_iter) == typeid(recursive_directory_iterator))
root_attributes.n_directories++;
else if (typeid(my_iter) == typeid(directory_iterator))
root_attributes += entry_attributes;
}
else {
entry_attributes.path = entry.path().string();
entry_attributes.n_files++;
error_code ec;
entry_attributes.size_bytes += entry.file_size(ec);
root_attributes += entry_attributes;
if (ec)
cerr << "Error reading file size: " << entry.path().string() << endl;
}
my_vec.push_back(entry_attributes);
}
catch (const exception& e) {
cerr << "Exception: " << e.what();
}
}
my_vec.push_back(root_attributes);
}
/***********************
*** PART OF THE CODE *
*** DEALING WITH ***
*** MANAGING THE ***
*** INPUT OPTIONS ***
*** GIVEN BY USER **
***********************/
template <Flag my_flag>
void option(string_view options, vector<Attributes>& my_vec, const char pick) {
if (options.find(pick) != string::npos) {
make_heap(
my_vec.begin(),
my_vec.end(),
bind(lessThan, placeholders::_1, placeholders::_2, my_flag)
);
sort_heap(
my_vec.begin(),
my_vec.end(),
bind(lessThan, placeholders::_1, placeholders::_2, my_flag)
);
if (options.find('r') != string::npos)
reverse(my_vec.begin(), my_vec.end());
}
}
using DirIterVariant = variant<directory_iterator, recursive_directory_iterator>;
DirIterVariant option_R(string_view options, const path& my_path) {
if (options.find('R') != string::npos)
return recursive_directory_iterator(my_path, directory_options::skip_permission_denied);
else
return directory_iterator(my_path);
}
void option_h() {
cout << setw(5) << "-R" << " " << setw(35) << "List subdirectories recursively\n";
cout << setw(5) << "-s" << " " << setw(35) << "Sort by file size\n";
cout << setw(5) << "-d" << " " << setw(35) << "Sort by number of directories\n";
cout << setw(5) << "-f" << " " << setw(35) << "Sort by number of files\n";
cout << setw(5) << "-r" << " " << setw(35) << "Reverse sorting order\n";
cout << setw(5) << "-p" << " " << setw(35) << "Display permissions\n";
cout << setw(5) << "-h" << " " << setw(35) << "Display this information\n";
}
/***********************
*** PART OF THE CODE *
**** DEALING WITH **
**** PRINTING THE **
*** RESULTS TO THE *
***** CONSOLE ****
***********************/
void printOption_p(Attributes file) {
perms my_permits = status(file.path).permissions();
auto show =
[=](char op, perms perm) {
cout << ( (perm & my_permits) == perms::none ? '-' : op );
};
show('r', perms::owner_read);
show('w', perms::owner_write);
show('x', perms::owner_exec);
show('r', perms::group_read);
show('w', perms::group_write);
show('x', perms::group_exec);
show('r', perms::others_read);
show('w', perms::others_write);
show('x', perms::others_exec);
cout << endl;
}
void printFiles(string_view options, const vector<Attributes> my_vec) {
size_t longest_pathname{};
for (const auto& file : my_vec)
longest_pathname = max(longest_pathname, file.path.size());
if (options.find('p') != string::npos) {
cout << "Size(bytes) Files Dirs " << string(6, ' ') << "File path "
<< string(longest_pathname-10, ' ') << "Permissions\n";
cout << string(12, '-') << " " << string(6, '-') << " " << string(6, '-') << " "
<< string(longest_pathname+3, '-') << " " << string(13, '-') << '\n';
for (const auto& file : my_vec) {
cout << right << setw(12) << file.size_bytes << " " << setw(6) << file.n_files << " " << setw(6)
<< file.n_directories << " " << left << setw(longest_pathname+3) << file.path << " ";
printOption_p(file);
}
} else {
cout << "Size(bytes) Files Dirs Name\n";
cout << "------------ ----- ----- ------------\n";
for (const auto& file : my_vec)
cout << right << setw(12) << file.size_bytes << setw(6) << file.n_files << setw(6) << file.n_directories
<< " " << left << setw(longest_pathname+3) << file.path << "\n";
}
}
/*********************
***** MAIN ******
********************/
int main(int argc, const char **argv) {
if(argc != 3) {
cerr << "Exactly two arguments needed: "
<< "(1) list of option flags preceded by hyphen '-', and "
<< "(2) file path.\n"
<< "Type -h for more help\n";
return -1;
}
const path sys_path{argv[2]};
const string options{argv[1]};
vector<Attributes> files;
DirIterVariant file_iterator_v;
if (options[0] == '-') {
if (options.find('h') != string::npos) {
option_h();
return 0;
}
if (!is_directory(sys_path)) {
cout << "Not a directory.\n";
return -1;
}
else {
file_iterator_v = option_R(options, sys_path);
visit(
[&](auto&& file_iterator) {
traverse(file_iterator, files);
},
file_iterator_v
);
option<Flag::dirs >(options, files, 'd');
option<Flag::files>(options, files, 'f');
option<Flag::size >(options, files, 's');
}
}
else {
cout << "No flag specified, simply print shallow file search" << endl;
directory_iterator my_iter(sys_path);
traverse(my_iter, files);
}
printFiles(options, files);
}
For convenience, a MWE is here.
I created some files and directories in the MWE main()
through std::filesystem
utilities, then fed to the program the input flags -Rfdsrp
(code line 265 in MWE link).
Output results:
Size(bytes) Files Dirs File path Permissions
------------ ------ ------ ------------------------------- -------------
907458 10 4 /app rwx------
897656 1 0 /app/output.s rwxrwxr-x
8737 1 0 /app/example.cpp rw-rw-r--
1065 1 0 /app/compilation-result.json rw-rw-r--
0 7 3 /app/foo rwxrwxr-x
0 5 2 /app/foo/bar rwxrwxr-x
0 3 1 /app/foo/bar/dir1 rwxrwxr-x
0 1 0 /app/foo/bar/dir1/file3.cpp rw-rw-r--
0 1 0 /app/foo/bar/dir1/file2.cpp rw-rw-r--
0 1 0 /app/foo/bar/another.cpp rw-rw-r--
0 1 0 /app/foo/bar/test.txt rw-rw-r--
0 1 0 /app/foo/file5.cpp rw-rw-r--
0 1 0 /app/foo/file1.cpp rw-rw-r--
0 1 0 /app/foo/bar/dir1/file4.cpp rw-rw-r--
3 Answers 3
Don't using namespace std
It pulls in too many names, that continuously increase in number.
Split out lessThan
into separate function objects
You are passing a template argument into lessThan
as a runtime parameter, rather than continuing to use it as a template.
Use std::tuple to compare multiple attributes
rather than having a chain of ==
<
etc for each additional member, you can use std::tuple::operator<
, which does that for you.
Sort in the direction you want
Rather than std::reverse
ing a sorted vector, you can just sort it with an opposite comparison.
template<Flag>
struct attributes_less;
template<>
struct attributes_less<Flag::size> {
bool operator()(const Attributes& one, const Attributes& two) const {
return std::tie(one.size_bytes, one.n_directories) < std::tie(two.size_bytes, two.n_directories);
}
}
template<>
struct attributes_less<Flag::dirs> {
bool operator()(const Attributes& one, const Attributes& two) const {
return std::tie(one.n_directories, one.size_bytes) < std::tie(two.n_directories, two.size_bytes);
}
}
template<>
struct attributes_less<Flag::files> {
bool operator()(const Attributes& one, const Attributes& two) const {
return std::tie(one.n_files, one.n_directories, one.size_bytes) < std::tie(two.n_files, two.n_directories, two.size_bytes);
}
}
template <Flag my_flag>
struct attributes_greater {
attributes_less<my_flag> less;
bool operator()(const Attributes& one, const Attributes& two) const {
return less(two, one);
}
}
template <Flag my_flag>
void option(std::string_view options, std::vector<Attributes>& my_vec, const char pick) {
if (options.find(pick) != std::string::npos) {
if (options.find('r') != std::string::npos)
std::sort(my_vec.begin(), my_vec.end(), attributes_greater<my_flag>{});
else
std::sort(my_vec.begin(), my_vec.end(), attributes_less<my_flag>{});
}
}
-
\$\begingroup\$ Functions
lessThan()
,reverse()
indeed become very concise and expressive with what you show here usingstd::tuple
and generic programming. May I presume familiarity with these structures and techniques comes from knowledge of practices in other computer languages? \$\endgroup\$Giogre– Giogre2025年02月21日 11:58:35 +00:00Commented Feb 21 at 11:58 -
\$\begingroup\$ @Giogre using
std::tuple
for ordering a class by it's members is a C++ idiom that is about as old asstd::tuple
. I assume it also turns up in other languages which have a standard tuple type \$\endgroup\$Caleth– Caleth2025年02月21日 12:02:29 +00:00Commented Feb 21 at 12:02 -
\$\begingroup\$ There is heavy usage of tuples in Python \$\endgroup\$Giogre– Giogre2025年02月21日 12:21:51 +00:00Commented Feb 21 at 12:21
-
1\$\begingroup\$ @Giogre To the extent there's an "original language" that this technique derives from, it's lexicographic orderings in mathematics. \$\endgroup\$wizzwizz4– wizzwizz42025年02月22日 13:37:51 +00:00Commented Feb 22 at 13:37
-
2\$\begingroup\$ @Giogre: Do be careful when using tuples for sorting not to accidentally make a copy of complex types; this is why Caleth uses
std::tie
, rather thanstd::make_tuple
. \$\endgroup\$Matthieu M.– Matthieu M.2025年02月24日 09:51:09 +00:00Commented Feb 24 at 9:51
General Observations
In some ways the code is very good, I compiled it on Ubuntu 22.04 using gcc-12 with the flags -Wall
, -Wextra
, and -pedantic
and there were no warnings. The variable and function names are generally understandable.
I would not want to have to maintain this code for a number of reasons, some of which have to do with what one might consider style issues such as indentation. It isn't clear that the indentation is the consistent:
void printFiles(string_view options, const vector<Attributes> my_vec) {
size_t longest_pathname{};
for (const auto& file : my_vec)
longest_pathname = max(longest_pathname, file.path.size());
if (options.find('p') != string::npos) {
cout << "Size(bytes) Files Dirs " << string(6, ' ') << "File path "
<< string(longest_pathname-10, ' ') << "Permissions\n";
cout << string(12, '-') << " " << string(6, '-') << " " << string(6, '-') << " "
<< string(longest_pathname+3, '-') << " " << string(13, '-') << '\n';
for (const auto& file : my_vec) {
cout << right << setw(12) << file.size_bytes << " " << setw(6) << file.n_files << " " << setw(6)
<< file.n_directories << " " << left << setw(longest_pathname+3) << file.path << " ";
printOption_p(file);
}
} else {
cout << "Size(bytes) Files Dirs Name\n";
cout << "------------ ----- ----- ------------\n";
for (const auto& file : my_vec)
cout << right << setw(12) << file.size_bytes << setw(6) << file.n_files << setw(6) << file.n_directories
<< " " << left << setw(longest_pathname+3) << file.path << "\n";
}
}
A second note on indentation is that 2 space indentation isn't as easy to read and modify as larger indentation, it also leads to including too much functionality into a single function, more on function complexity later.
The code indicates a tendency to put too much code into functions and modules. This observations was also made in the other review in the section Split out lessThan into separate function objects
. This program could easily be broken up into multiple header and source files, such as options.h
, possibly options.cpp
and the same for the directory search and printing.
Prefer Braces {
and }
Around Single Statements in if
or loops
Some programmers consider this a style issue, but it makes it much easier to read and maintain the code if each in an if
, else
or loop
block is embedded within braces. Extending the functionality of these statements can be problematic when the braces are not used. For a more in depth discussion of this see the first 2 answers on this Stack Overflow question. As one of the answers points out this is true in all C like languages (C, C++, C#, JavaScript, Java, etc.). I have worked at multiple companies where this was required in the coding standard and flagged during code reviews.
Magic Numbers
There are Magic Numbers in many places in the code, it might be better to create symbolic constants for them to make the code more readable and easier to maintain. These numbers may be used in many places and being able to change them by editing only one line makes maintenance easier.
Some examples:
return -1;
cout << setw(5) << "-R" << " " << setw(35) << "List subdirectories recursively\n";
cout << right << setw(12) << file.size_bytes << setw(6) << file.n_files << setw(6) << file.n_directories
<< " " << left << setw(longest_pathname+3) << file.path << "\n";
Return -1 is non-standard, the operating system probably expects 1 or 0, perfer EXIT_FAILURE and EXIT_SUCCESS which are portable.
Numeric constants in code are sometimes referred to as Magic Numbers, because there is no obvious meaning for them. There is a discussion of this on stackoverflow.
Avoid using namespace std;
People who maintain the code need to see where everything is coming from, this is especially true in larger programs that include multiple name spaces.
If you are coding professionally you probably should get out of the habit of using the using namespace std;
statement. The code will more clearly define where cout
and other identifiers are coming from (std::cin
, std::cout
). As you start using namespaces in your code it is better to identify where each function comes from because there may be function name collisions from different namespaces. The identifiercout
you may override within your own classes, and you may override the operator <<
in your own classes as well. This stack overflow question discusses this in more detail.
In the case of this code, the problem has been compounded with using namespace std::filesystem;
. A common way to shorten the file system is namespace fs = std::filesystem;
.
DRY Code
There is a programming principle called the Don't Repeat Yourself Principle sometimes referred to as DRY code. If you find yourself repeating the same code mutiple times it is better to encapsulate it in a function. If it is possible to loop through the code that can reduce repetition as well.
The following code could easily have been implemented using a table (vector) of structs and a loop:
void option_h() {
cout << setw(5) << "-R" << " " << setw(35) << "List subdirectories recursively\n";
cout << setw(5) << "-s" << " " << setw(35) << "Sort by file size\n";
cout << setw(5) << "-d" << " " << setw(35) << "Sort by number of directories\n";
cout << setw(5) << "-f" << " " << setw(35) << "Sort by number of files\n";
cout << setw(5) << "-r" << " " << setw(35) << "Reverse sorting order\n";
cout << setw(5) << "-p" << " " << setw(35) << "Display permissions\n";
cout << setw(5) << "-h" << " " << setw(35) << "Display this information\n";
}
The following code is easier to extend because all that needs to be updated is the table:
struct OptionListItem
{
char optionChar;
std::string optionDescription;
};
std::vector<OptionListItem> optionList =
{
{'R', "List subdirectories recursively"},
{'s', "Sort by file size"},
{'d', "Sort by number of directories"},
{'f', "Sort by number of files"},
{'r', "Reverse sorting order"},
{'p', "Display permissions"},
{'h', "Display this information"}
};
void help()
{
for (auto thisOption: optionList)
{
std::cout << std::setw(5) << "-" << thisOption.optionChar << " "
<< std::setw(35) << thisOption.optionDescription << "\n";
}
}
Complexity
The function main()
is too complex (does too much). As programs grow in size the use of main()
should be limited to calling functions that parse the command line, calling functions that set up for processing, calling functions that execute the desired function of the program, and calling functions to clean up after the main portion of the program.
There is also a programming principle called the Single Responsibility Principle that applies here. The Single Responsibility Principle states:
that every module, class, or function should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by that module, class or function.
-
\$\begingroup\$ Thank you, most of this is common sense. I'll admit I am aware that
using namespace std
is discouraged, as are the magic numbers, so I was caught red/handed. Refactoringmain()
seems by far the most difficult task \$\endgroup\$Giogre– Giogre2025年02月22日 00:03:01 +00:00Commented Feb 22 at 0:03 -
1\$\begingroup\$ I'm always eager to reduce redundancy; but with increasing experience and -- hopefully ;-) -- correspondingly increasing wisdom I note that (1) The non-redundant code is, perhaps surprisingly, longer; (2) is harder to understand; and (3) is harder to debug. The original code is as clear as a sunny day. It could not be any simpler and more straight-forward (although your magic number remark applies, but that would not make it more complicated). If anything does not print correctly, I can set a breakpoint and trivially inspect the program state at that point. \$\endgroup\$Peter - Reinstate Monica– Peter - Reinstate Monica2025年02月24日 09:46:10 +00:00Commented Feb 24 at 9:46
-
\$\begingroup\$ Bottom line is that this is a typical case: Factor out variable data; replace procedural code with data storage; thus minimize procedural code in favor of coded data stores that is looped over/iterated through. The result is more abstract and less redundant. It is sometimes easier to review because the data is not scattered across redundant code which makes your eyes water, blurring your reviewer vision, and thus is easier comparable, typos are easier spotted etc. But it is not as simple. It is really a trade-off. \$\endgroup\$Peter - Reinstate Monica– Peter - Reinstate Monica2025年02月24日 09:50:24 +00:00Commented Feb 24 at 9:50
-
\$\begingroup\$ instead of struct, you could use
std::map<std::char, std::string>
too right? Not sure which one is performative. \$\endgroup\$Abel Tom– Abel Tom2025年02月27日 17:18:03 +00:00Commented Feb 27 at 17:18 -
1\$\begingroup\$ @AbelTom Yes, that would be one possible alternative, it would be the better one if the information was needed in a lookup, but that isn't the case here. I don't see the need to include hashing for something that doesn't need a lookup. \$\endgroup\$2025年02月28日 13:17:24 +00:00Commented Feb 28 at 13:17
Adding to the other answers:
Use getopt_long()
or some option parsing library
Avoid rolling your own option parsing code. Inevitably it will not behave the same way as the standard getopt_long()
used by all core UNIX utilities, which will just surprise and annoy users.
Either use getopt_long()
if you only intend to target UNIX-like operating systems, or use a library that reimplements it as close as possible, like Boost.Program_options.
Use std::format()
for formatting output
Making formatted output with iostreams functions is not very nice. Since C++20 you can use std::format()
instead, or since C++23 you can even use std::print()
and friends. Consider writing:
std::print("{:12} {:6} {:6} {:{}}",
file.size_bytes, file.n_files, file.n_directories,
file.path, longest_pathname + 3);
Naming things
The names of your functions don't really explain what they are doing. In particular, it's hard to tell from the function name what it is returning. Also the names of some of the types and variables are unclear. Some better suggestions are:
explore()
: this is actually calculating statistics for a given directory. So I would name itcalculate_statistics()
. Maybe renameAttributes
toStatistics
as well, as normally one wouldn't think of the number of entries and total file size as an "attribute" of a directory.option()
: first of all this is a bad name because it is a noun, whereas functions typically do things and should thus preferrably have a verb as a name. If it processes an option, thenprocess_option()
would be a better name.my_vec
: the prefixmy_
is not giving any information, andvec
doesn't tell us anything we don't already know from the type of this parameter. Instead, name it after what this contains:attributes
, or even betterstatistics
.my_flag
: again,my_
is not useful, so justflag
.pick
: variables should typically be a noun, not a verb. Here it's an option, sooption
?option_R()
: again, this should be a verb. This function returns something, so it would be nice if the name gave us a hint about what kind of information is returned:make_directory_iterator()
perhaps?option_h()
: this prints options, soprint_options()
.printOption_p()
: what is_p
? Don't abbreviate unnecessarily. Could this bepermissions
? Also, this function does not have anything to do with options. Soprint_permissions()
would be a better name.printFiles()
: it doesn't print the contents of files, it prints statistics, and not just of files, but also of directories. Maybeprint_statistics()
would be a better name.file
: it's not a file, but rather an object of typeAttributes
.
You are also not using a consistent naming style. Sometimes you are using an underscore, like in option_h()
, sometimes camelCase, like in printFiles()
, and sometimes you do a mix of both, like in printOption_p()
. Which style you pick doesn't matter that much, but make sure you are consistent.
-
\$\begingroup\$ I prefer camelCase for function names, but in this case I was in a quandary because options read from the prompt are case sensitive, e.g. there is an option
-r
and an option-R
, so what to do? I decided to insert an underscore inbetween. But that is also what I do forvariable_names
so I'll admit this one is confusing. \$\endgroup\$Giogre– Giogre2025年02月22日 00:22:45 +00:00Commented Feb 22 at 0:22 -
\$\begingroup\$
std::print()
looks like a game changer. I had heard aboutboost::options
but didn't came to my mind to use it for this program \$\endgroup\$Giogre– Giogre2025年02月22日 00:31:33 +00:00Commented Feb 22 at 0:31
ls
is not part of bash, it's an unrelated program, standard with Unix and probably part of your local GNU system. \$\endgroup\$