Run directory scan as Class

Question 1

I've progressed with the directory statistics function and want to proceed by making it a class.

Link to the first version of this code: Link

My current state of things:

About my namings: I first used the name fcount as I wanted to have it stand for files and folders as otherwise, the name would be too long. That's why I made the exception to shorten it. I'm still continuing with your more experienced solution for this. I wrote pathlib into the function name because I have the same function above it with os.walk as this was my first way to try. But os.walk seem to have problems scanning my network drive as it always returned 0 bytes. Therefore I've chosen pathlib. Hope that makes sense.

About my classes: I'm starting to feel comfortable programming python but as soon as I start to use classes my whole code starts to fall apart and seems to have to be more complex. I know that's just a beginner problem, but as I usually can't solve the issues appearing, I'm careful with that route. I've now rewritten it into a class but facing a few problems now. I started to try to structure it with the tips from the first CodeReview by writing the file search for-loop into the __init__ function but python was then saying it can't return a value from __init__ so I created a new method named def get_directory_statistics(self, scan_path):. I'm not sure where to input my scan_path, into the __init__or the first method def get_directory_statistics(self, scan_path):. Your advice to summarize two lines into one, sadly didn't work for me either return size_and_file_count(size_gb, all_types_count, file_count, folder_count). I couldn't get it to work. It's always saying size_and_file_count is not defined or other Errors.

Optimizing the code: I outlined above why I sadly can't use os.walk for this. So this won't work for me. And C seems at the moment, not like an option as the only programming language I am familiar with is python and I guess it would be a more complex task to program a wrapper and the code itself in C. I think most of it will be I/O bound, yes.

Again I learned a lot from the first answer here on CodeReview!

Below you'll find my solution after going over all the last notes

class get_size_and_file_count:
 """Gets the total size of a given dir and counts how many folders and files are in the given
 path directory and return a file count, folder count and all non hidden files as a sum"""
 def __init__(self, total_size = 0, non_hidden_files_count = 0, file_count = 0, folder_count = 0):
 self.total_size = total_size
 self.non_hidden_files_count = non_hidden_files_count
 self.file_count = file_count
 self.folder_count = folder_count
 def get_directory_statistics(self, scan_path):
 self.root_directory = Path(scan_path)
 for f in self.root_directory.glob('**/*'):
 if f.is_file():
 self.file_count += 1
 self.total_size += f.stat().st_size
 if not f.name.startswith("."):
 self.non_hidden_files_count += 1
 if f.is_dir():
 self.folder_count += 1
 directory_statistics = [self.total_size, self.non_hidden_files_count, self.file_count, self.folder_count]
 return directory_statistics
 def print_directory_statistics(self):
 print('Directory path to search: {}'.format(self.root_directory))
 print('Directory size in GB: {:.2f}GB'.format(self.total_size / 1.0e9))
 print('Amount of non hidden files: {}'.format(self.non_hidden_files_count))
 print('Amount of files searched: {}'.format(self.file_count))
 print('Amount of folders searched: {}'.format(self.folder_count))
result = get_size_and_file_count()
directory_statistics = result.get_directory_statistics("...") # Store directory statistics in var
result.print_directory_statistics() # Print directory statistics

Question 2

It was working and I added the changes that a member suggested, he also suggested to make a new CodeReview out of that. Link to tread 01 at the top of the page.

Question 3

Instead of print_result = ..., just write result.print_directory_statistics(). Indeed, please make sure the code works, then we can review it properly.

Question 4

Thanks for all your feedback, I've adjusted accordingly.

Question 5

Style conventions

Following the PEP style guide, class names should be named with CamelCase and documentation strings should be formatted like the following.

class DirectoryStatistics:
 """Gets the total size of a given dir and counts how many folders and
 files are in the given path directory. Also offers a printing utility to output
 the diagnosis results.
 """
 def __init__(self, ...):
 # etc.

Classes

You can't return values from __init__ (aka. the class constructor), because it is called when you instantiate an object, thus the return value is the object itself. But you can call methods in your __init__ method, that's why you should move the content of your get_directory_statistics method into the __init__ method:

class DirectoryStatistics:
 """Gets the total size of a given dir and counts how many folders and
 files are in the given path directory. Also offers a printing utility to output
 the diagnosis results.
 """
 def __init__(self, file_path):
 self.root_directory = path(file_path)
 self.file_count = 0
 self.total_size = 0
 self.non_hidden_files_count = 0
 self.folder_count = 0
 for f in self.root_directory.glob('**/*'):
 if f.is_file():
 self.file_count += 1
 self.total_size += f.stat().st_size
 if not f.name.startswith("."):
 self.non_hidden_files_count += 1
 if f.is_dir():
 self.folder_count += 1

That way, by calling:

statistics = DirectoryStatistics(file_path)

you run the directory diagnosis and save the results in your object. Then you can pretty-print the results using your print_statistics() method.

Question 6

Thank you for the hint with the style guide. I've rewritten that part. Moving the content of get_directory_statistics() into __init__ isn't working for me as I lose the opportunity to return the values. statistics in your case wouldn't return anything as it's an object. I would be able to print the values with print_directory_statistics, yes but returning the values for further valuation wouldn't be possible.

Question 7

The diagnosis results are stored in the statistics object, you can retrieve them with statistics.file_count for example, that was the reason we restructured the code as a class. Instead of accessing your data via results[1] you can now refer to them by name, which is way clearer.

Question 8

I see, that worked. What do you think is more pythonic? Writing a method or putting the code in __init__?

Question 9

Putting the code in __init__ is definitely the way to go. Otherwise you could instantiate your object and run your print_directory_results method without actually running the get_directory_statistics method first. This would print only the default values, which is not desirable. An object should never be in an unexpected state, that's why you use the __init__ method to avoid this.

Question 10

Unnecessary Flexibility

def __init__(self, total_size = 0, non_hidden_files_count = 0, file_count = 0, folder_count = 0):

Why do you have the possibility of the caller starting the counters at values other than zero? Is it really needed? Or was it just a fancy way of declaring counter variables all in one line?

Note: PEP-8 would require that these keyword parameters not have spaces around the equal signs.

Stop writing classes

See Stop writing classes on YouTube

Your class should be a function if it is created, one method is called, and results are retrieved. So let's get rid of your class:

First of all, you want to return results in a nice package, like a named tuple. We can even add a nice method to this named tuple to print out the results as desired, but it is just decoration. The results are just plain-old-data:

from pathlib import Path
from typing import NamedTuple
class DirectoryStats(NamedTuple):
 root_directory: Path
 total_size: int
 files: int
 hidden_files: int
 folders: int
 def print(self):
 print(f'Directory path to search: {self.root_directory}')
 print(f'Directory size in GB: {self.total_size / 1.0e9:.2f}GB')
 print(f'Amount of non hidden files: {self.files-self.hidden_files}')
 print(f'Amount of files searched: {self.files}')
 print(f'Amount of folders searched: {self.folders}')

Here, I'm using the typing module to automatically create the named tuple from type hints in the declaration.

I'm also using f'...' strings to created formatted output without that ugly "... {} ... ".format(arg) construct which separates the place where the result is created from the thing that the result is generated from.

Now, the scanning is just a simple function:

def get_size_and_file_count(scan_path) -> DirectoryStats:
 """
 Your docstring here.
 """
 files = folders = hidden = total_size = 0
 root = Path(scan_path)
 for f in root.glob('**/*'):
 if f.is_file():
 files += 1
 total_size += f.stat().st_size
 if f.name.startswith("."):
 hidden += 1
 elif f.is_dir():
 folders += 1
 return DirectoryStats(root, total_size, files, hidden, folders)

Pretty straight forward. The function initializes some counters, walks the scan_path, counts stuff, and then in the return statement, constructs the named tuple we defined above.

I've removed a double negative. Instead of the file name does not start with a period incrementing a not hidden count, I count the hidden files.

Example usage:

if __name__ == '__main__':
 result = directory_statistics('.')
 result.print()

Produces on my machine, in my directory:

Directory path to search: .
Directory size in GB: 0.00GB
Amount of non hidden files: 22
Amount of files searched: 23
Amount of folders searched: 4

Erich Erich 4743 silver badges12 bronze badges · Answer 1 · 2020-04-15 21:04:36Z

Style conventions

Following the PEP style guide, class names should be named with CamelCase and documentation strings should be formatted like the following.

class DirectoryStatistics:
 """Gets the total size of a given dir and counts how many folders and
 files are in the given path directory. Also offers a printing utility to output
 the diagnosis results.
 """
 def __init__(self, ...):
 # etc.

Classes

You can't return values from __init__ (aka. the class constructor), because it is called when you instantiate an object, thus the return value is the object itself. But you can call methods in your __init__ method, that's why you should move the content of your get_directory_statistics method into the __init__ method:

class DirectoryStatistics:
 """Gets the total size of a given dir and counts how many folders and
 files are in the given path directory. Also offers a printing utility to output
 the diagnosis results.
 """
 def __init__(self, file_path):
 self.root_directory = path(file_path)
 self.file_count = 0
 self.total_size = 0
 self.non_hidden_files_count = 0
 self.folder_count = 0
 for f in self.root_directory.glob('**/*'):
 if f.is_file():
 self.file_count += 1
 self.total_size += f.stat().st_size
 if not f.name.startswith("."):
 self.non_hidden_files_count += 1
 if f.is_dir():
 self.folder_count += 1

That way, by calling:

statistics = DirectoryStatistics(file_path)

you run the directory diagnosis and save the results in your object. Then you can pretty-print the results using your print_statistics() method.

Thank you for the hint with the style guide. I've rewritten that part. Moving the content of get_directory_statistics() into __init__ isn't working for me as I lose the opportunity to return the values. statistics in your case wouldn't return anything as it's an object. I would be able to print the values with print_directory_statistics, yes but returning the values for further valuation wouldn't be possible.
The diagnosis results are stored in the statistics object, you can retrieve them with statistics.file_count for example, that was the reason we restructured the code as a class. Instead of accessing your data via results[1] you can now refer to them by name, which is way clearer.
I see, that worked. What do you think is more pythonic? Writing a method or putting the code in __init__?
Putting the code in __init__ is definitely the way to go. Otherwise you could instantiate your object and run your print_directory_results method without actually running the get_directory_statistics method first. This would print only the default values, which is not desirable. An object should never be in an unexpected state, that's why you use the __init__ method to avoid this.

AJNeufeld AJNeufeld 35.3k5 gold badges41 silver badges103 bronze badges · Answer 2 · 2020-04-15 22:42:56Z

Unnecessary Flexibility

def __init__(self, total_size = 0, non_hidden_files_count = 0, file_count = 0, folder_count = 0):

Why do you have the possibility of the caller starting the counters at values other than zero? Is it really needed? Or was it just a fancy way of declaring counter variables all in one line?

Note: PEP-8 would require that these keyword parameters not have spaces around the equal signs.

Stop writing classes

See Stop writing classes on YouTube

Your class should be a function if it is created, one method is called, and results are retrieved. So let's get rid of your class:

First of all, you want to return results in a nice package, like a named tuple. We can even add a nice method to this named tuple to print out the results as desired, but it is just decoration. The results are just plain-old-data:

from pathlib import Path
from typing import NamedTuple
class DirectoryStats(NamedTuple):
 root_directory: Path
 total_size: int
 files: int
 hidden_files: int
 folders: int
 def print(self):
 print(f'Directory path to search: {self.root_directory}')
 print(f'Directory size in GB: {self.total_size / 1.0e9:.2f}GB')
 print(f'Amount of non hidden files: {self.files-self.hidden_files}')
 print(f'Amount of files searched: {self.files}')
 print(f'Amount of folders searched: {self.folders}')

Here, I'm using the typing module to automatically create the named tuple from type hints in the declaration.

I'm also using f'...' strings to created formatted output without that ugly "... {} ... ".format(arg) construct which separates the place where the result is created from the thing that the result is generated from.

Now, the scanning is just a simple function:

def get_size_and_file_count(scan_path) -> DirectoryStats:
 """
 Your docstring here.
 """
 files = folders = hidden = total_size = 0
 root = Path(scan_path)
 for f in root.glob('**/*'):
 if f.is_file():
 files += 1
 total_size += f.stat().st_size
 if f.name.startswith("."):
 hidden += 1
 elif f.is_dir():
 folders += 1
 return DirectoryStats(root, total_size, files, hidden, folders)

Pretty straight forward. The function initializes some counters, walks the scan_path, counts stuff, and then in the return statement, constructs the named tuple we defined above.

I've removed a double negative. Instead of the file name does not start with a period incrementing a not hidden count, I count the hidden files.

Example usage:

if __name__ == '__main__':
 result = directory_statistics('.')
 result.print()

Produces on my machine, in my directory:

Directory path to search: .
Directory size in GB: 0.00GB
Amount of non hidden files: 22
Amount of files searched: 23
Amount of folders searched: 4

Stack Exchange Network

Run directory scan as Class

2 Answers 2

Unnecessary Flexibility

Stop writing classes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Run directory scan as Class

2 Answers 2

Unnecessary Flexibility

Stop writing classes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions