I have a program which goes through your drive and does two things:
- For every file extension it creates a file "file of extension_name" and whenever it encounters a file with that extension, it writes the path of it to its appropriate file.
- Once it finishes going through every file, it goes through every file created in step 1 and counts all of the files in it, and writes to a new file the number of files to that extension.
NOTE: I do not follow any variable naming conventions, I exclusively use camelCase. If that is unsatisfactory, let me know...
NOTE: The runtime of the program depends upon how many files you have in your drive...
NOTE: Sorry I didn't really document the second half of my code that well...
NOTE: Most of the try-except blocks came from a bunch of errors I kept getting while trying to access certain files...
import os
import time
def documentFile(path):
global numberOfFilesChecked
for r in range(len(path) - 1, -1, -1): # search from the back to the front of
# the file name for the period which marks the extension.
if path[r] == ".":
try:
fObject = open(f"file of {path[r + 1:]}.txt", "a+")
# we create a file named "file of extension", and we set it to append
# mode
numberOfFilesChecked += 1
fObject.write(path + "\n")
# we then write the name of the file being indexed to the file we just opened, then add a line break.
fObject.close()
if numberOfFilesChecked % 5000 == 0:
print(numberOfFilesChecked)
except FileNotFoundError:
pass
break
def loopThroughDir(path):
try:
for iterator in os.listdir(path):
if os.path.isdir(path + "\\" + iterator): # if the path is a folder
loopThroughDir(path + "\\" + iterator)
else: # if it's a regular file
if iterator[0:4] != "fileO":
documentFile(path + "\\" + iterator)
except PermissionError:
pass
if __name__ == '__main__':
documentationStrings = []
extension = None
cTime = (int(time.time()) / 3600)
os.makedirs(f"C:\\fileDocumentation\\{cTime}")
os.chdir(f"C:\\fileDocumentation\\{cTime}")
numberOfFilesChecked = 0
loopThroughDir("C:\\")
print(int(time.time())/3600 - int(cTime))
print(numberOfFilesChecked)
numberOfFileTypes = 0
listOfFilesInCTime = os.listdir(f"C:\\fileDocumentation\\{cTime}")
aFile = open("controlFile.txt", "a+")
for fileContainingExtensions in listOfFilesInCTime:
numberOfFileTypes += 1
extension = None
fileExtensionFile = open(f"C:\\fileDocumentation\\{cTime}\\" + fileContainingExtensions)
numberOfFilesCheckedInEachExtension = 0
for eachLine in fileExtensionFile:
for x in range(len(eachLine) - 1, -1, -1): # search from the back to the front of
# the file name for the period which marks the extension.
if eachLine[x] == ".":
extension = eachLine[x:]
# because there is a \n at the end of the line we remove it.
extension.strip("\n")
break
numberOfFilesCheckedInEachExtension += 1
extension.strip("\n")
documentationStrings.append([numberOfFilesCheckedInEachExtension, f"{numberOfFilesCheckedInEachExtension}"
f" is the number of {extension}'s in the "
f"file "
f"which is {(numberOfFilesCheckedInEachExtension / numberOfFilesChecked) * 100}% "
f"percent of the total {numberOfFilesChecked}.\n"])
documentationStrings.sort()
documentationStrings.reverse()
for i in documentationStrings:
aFile.write(i[1])
1 Answer 1
PEP-8
Python introduced a coding convention almost 20 years ago, namely PEP-8. Quoting certain section from the same:
[...] The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. [...]
[...]
Some other good reasons to ignore a particular guideline:
- When applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.
- To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style).
- Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.
- When the code needs to remain compatible with older versions of Python that don't support the feature recommended by the style guide.
I do not think any of these 4 apply in your case, so it is better to follow convention for one-off scripts (and put it into habit). The most common ones being:
- variables and functions follow
lower_snake_case
naming - constants follow
UPPER_SNAKE_CASE
- classes are
UpperCamelCase
Functions
You start off with declaring 2 functions, then everything else is a mess of code all put inside the if __name__
clause. Split this further into functions.
os.path vs pathlib
Since you're using python 3.x, may I suggest looking into a more friendlier inbuilt module for path traversals, called pathlib
. The docs provide few examples on how to easily convert your current code utilizing os.path
and friends with their equivalent pathlib
calls.
File contexts
Python also has a context based file handling. You currently are opening the files explicitly, without ever calling a .close()
. Prefer using the following syntax when operating with files:
with open("path/to/file", "mode") as fp:
fp.write(something)
# something_else = fp.read()
Comments vs docstrings
Use docstrings instead of comments when describing what a function does. The spec can be found on PEP-257.
PS: A lot of your code appears to be doing unnecessary work due to usage of string operations, instead of proper inbuilt methods. Few examples:
- iterating over path string character by character to get extension instead of using
os.path.splitext
(pathlib has similar functions) - generating path yourself, instead of having a glob based iteration over directory tree
- stripping
\n
from extension, but not really overwriting the variable itself (extension.strip("\n")
)