I've progressed with the directory statistics function and want to proceed by making it a class.
Link to the first version of this code: Link
My current state of things:
About my namings:
I first used the name fcount
as I wanted to have it stand for files and folders as otherwise, the name would be too long. That's why I made the exception to shorten it. I'm still continuing with your more experienced solution for this.
I wrote pathlib
into the function name because I have the same function above it with os.walk
as this was my first way to try. But os.walk
seem to have problems scanning my network drive as it always returned 0 bytes. Therefore I've chosen pathlib
. Hope that makes sense.
About my classes: I'm starting to feel comfortable programming python but as soon as I start to use classes my whole code starts to fall apart and seems to have to be more complex. I know that's just a beginner problem, but as I usually can't solve the issues appearing, I'm careful with that route. I've now rewritten it into a class but facing a few problems now.
I started to try to structure it with the tips from the first CodeReview by writing the file search for-loop into the __init__
function but python was then saying it can't return a value from __init__
so I created a new method named def get_directory_statistics(self, scan_path):
.
I'm not sure where to input my scan_path
, into the __init__
or the first method def get_directory_statistics(self, scan_path):
.
Your advice to summarize two lines into one, sadly didn't work for me either return size_and_file_count(size_gb, all_types_count, file_count, folder_count)
. I couldn't get it to work. It's always saying size_and_file_count
is not defined or other Errors.
Optimizing the code: I outlined above why I sadly can't use os.walk for this. So this won't work for me. And C seems at the moment, not like an option as the only programming language I am familiar with is python and I guess it would be a more complex task to program a wrapper and the code itself in C
. I think most of it will be I/O bound, yes.
Again I learned a lot from the first answer here on CodeReview!
Below you'll find my solution after going over all the last notes
class get_size_and_file_count:
"""Gets the total size of a given dir and counts how many folders and files are in the given
path directory and return a file count, folder count and all non hidden files as a sum"""
def __init__(self, total_size = 0, non_hidden_files_count = 0, file_count = 0, folder_count = 0):
self.total_size = total_size
self.non_hidden_files_count = non_hidden_files_count
self.file_count = file_count
self.folder_count = folder_count
def get_directory_statistics(self, scan_path):
self.root_directory = Path(scan_path)
for f in self.root_directory.glob('**/*'):
if f.is_file():
self.file_count += 1
self.total_size += f.stat().st_size
if not f.name.startswith("."):
self.non_hidden_files_count += 1
if f.is_dir():
self.folder_count += 1
directory_statistics = [self.total_size, self.non_hidden_files_count, self.file_count, self.folder_count]
return directory_statistics
def print_directory_statistics(self):
print('Directory path to search: {}'.format(self.root_directory))
print('Directory size in GB: {:.2f}GB'.format(self.total_size / 1.0e9))
print('Amount of non hidden files: {}'.format(self.non_hidden_files_count))
print('Amount of files searched: {}'.format(self.file_count))
print('Amount of folders searched: {}'.format(self.folder_count))
result = get_size_and_file_count()
directory_statistics = result.get_directory_statistics("...") # Store directory statistics in var
result.print_directory_statistics() # Print directory statistics
2 Answers 2
Style conventions
Following the PEP style guide, class names should be named with CamelCase and documentation strings should be formatted like the following.
class DirectoryStatistics:
"""Gets the total size of a given dir and counts how many folders and
files are in the given path directory. Also offers a printing utility to output
the diagnosis results.
"""
def __init__(self, ...):
# etc.
Classes
You can't return values from __init__
(aka. the class constructor), because it is called when you instantiate an object, thus the return value is the object itself. But you can call methods in your __init__
method, that's why you should move the content of your get_directory_statistics
method into the __init__
method:
class DirectoryStatistics:
"""Gets the total size of a given dir and counts how many folders and
files are in the given path directory. Also offers a printing utility to output
the diagnosis results.
"""
def __init__(self, file_path):
self.root_directory = path(file_path)
self.file_count = 0
self.total_size = 0
self.non_hidden_files_count = 0
self.folder_count = 0
for f in self.root_directory.glob('**/*'):
if f.is_file():
self.file_count += 1
self.total_size += f.stat().st_size
if not f.name.startswith("."):
self.non_hidden_files_count += 1
if f.is_dir():
self.folder_count += 1
That way, by calling:
statistics = DirectoryStatistics(file_path)
you run the directory diagnosis and save the results in your object.
Then you can pretty-print the results using your print_statistics()
method.
-
\$\begingroup\$ Thank you for the hint with the style guide. I've rewritten that part. Moving the content of
get_directory_statistics()
into__init__
isn't working for me as I lose the opportunity to return the values.statistics
in your case wouldn't return anything as it's an object. I would be able to print the values withprint_directory_statistics
, yes but returning the values for further valuation wouldn't be possible. \$\endgroup\$BenjaminK– BenjaminK2020年04月15日 21:53:44 +00:00Commented Apr 15, 2020 at 21:53 -
1\$\begingroup\$ The diagnosis results are stored in the
statistics
object, you can retrieve them withstatistics.file_count
for example, that was the reason we restructured the code as a class. Instead of accessing your data viaresults[1]
you can now refer to them by name, which is way clearer. \$\endgroup\$Erich– Erich2020年04月15日 21:56:00 +00:00Commented Apr 15, 2020 at 21:56 -
\$\begingroup\$ I see, that worked. What do you think is more pythonic? Writing a method or putting the code in
__init__
? \$\endgroup\$BenjaminK– BenjaminK2020年04月15日 22:00:59 +00:00Commented Apr 15, 2020 at 22:00 -
1\$\begingroup\$ Putting the code in
__init__
is definitely the way to go. Otherwise you could instantiate your object and run yourprint_directory_results
method without actually running theget_directory_statistics
method first. This would print only the default values, which is not desirable. An object should never be in an unexpected state, that's why you use the__init__
method to avoid this. \$\endgroup\$Erich– Erich2020年04月15日 22:06:01 +00:00Commented Apr 15, 2020 at 22:06
Unnecessary Flexibility
def __init__(self, total_size = 0, non_hidden_files_count = 0, file_count = 0, folder_count = 0):
Why do you have the possibility of the caller starting the counters at values other than zero? Is it really needed? Or was it just a fancy way of declaring counter variables all in one line?
Note: PEP-8 would require that these keyword parameters not have spaces around the equal signs.
Stop writing classes
See Stop writing classes on YouTube
Your class should be a function if it is created, one method is called, and results are retrieved. So let's get rid of your class:
First of all, you want to return results in a nice package, like a named tuple. We can even add a nice method to this named tuple to print out the results as desired, but it is just decoration. The results are just plain-old-data:
from pathlib import Path
from typing import NamedTuple
class DirectoryStats(NamedTuple):
root_directory: Path
total_size: int
files: int
hidden_files: int
folders: int
def print(self):
print(f'Directory path to search: {self.root_directory}')
print(f'Directory size in GB: {self.total_size / 1.0e9:.2f}GB')
print(f'Amount of non hidden files: {self.files-self.hidden_files}')
print(f'Amount of files searched: {self.files}')
print(f'Amount of folders searched: {self.folders}')
Here, I'm using the typing module to automatically create the named tuple from type hints in the declaration.
I'm also using f'...'
strings to created formatted output without that ugly "... {} ... ".format(arg)
construct which separates the place where the result is created from the thing that the result is generated from.
Now, the scanning is just a simple function:
def get_size_and_file_count(scan_path) -> DirectoryStats:
"""
Your docstring here.
"""
files = folders = hidden = total_size = 0
root = Path(scan_path)
for f in root.glob('**/*'):
if f.is_file():
files += 1
total_size += f.stat().st_size
if f.name.startswith("."):
hidden += 1
elif f.is_dir():
folders += 1
return DirectoryStats(root, total_size, files, hidden, folders)
Pretty straight forward. The function initializes some counters, walks the scan_path
, counts stuff, and then in the return statement, constructs the named tuple we defined above.
I've removed a double negative. Instead of the file name does not start with a period incrementing a not hidden count, I count the hidden files.
Example usage:
if __name__ == '__main__':
result = directory_statistics('.')
result.print()
Produces on my machine, in my directory:
Directory path to search: .
Directory size in GB: 0.00GB
Amount of non hidden files: 22
Amount of files searched: 23
Amount of folders searched: 4
print_result = ...
, just writeresult.print_directory_statistics()
. Indeed, please make sure the code works, then we can review it properly. \$\endgroup\$