program that fully documents every file on drive

Question 1

I have a program which goes through your drive and does two things:

For every file extension it creates a file "file of extension_name" and whenever it encounters a file with that extension, it writes the path of it to its appropriate file.
Once it finishes going through every file, it goes through every file created in step 1 and counts all of the files in it, and writes to a new file the number of files to that extension.

NOTE: I do not follow any variable naming conventions, I exclusively use camelCase. If that is unsatisfactory, let me know...

NOTE: The runtime of the program depends upon how many files you have in your drive...

NOTE: Sorry I didn't really document the second half of my code that well...

NOTE: Most of the try-except blocks came from a bunch of errors I kept getting while trying to access certain files...

import os
import time
def documentFile(path):
 global numberOfFilesChecked
 for r in range(len(path) - 1, -1, -1): # search from the back to the front of
 # the file name for the period which marks the extension.
 if path[r] == ".":
 try:
 fObject = open(f"file of {path[r + 1:]}.txt", "a+")
 # we create a file named "file of extension", and we set it to append
 # mode
 numberOfFilesChecked += 1
 fObject.write(path + "\n")
 # we then write the name of the file being indexed to the file we just opened, then add a line break.
 fObject.close()
 if numberOfFilesChecked % 5000 == 0:
 print(numberOfFilesChecked)
 except FileNotFoundError:
 pass
 break
def loopThroughDir(path):
 try:
 for iterator in os.listdir(path):
 if os.path.isdir(path + "\\" + iterator): # if the path is a folder
 loopThroughDir(path + "\\" + iterator)
 else: # if it's a regular file
 if iterator[0:4] != "fileO":
 documentFile(path + "\\" + iterator)
 except PermissionError:
 pass
if __name__ == '__main__':
 documentationStrings = []
 extension = None
 cTime = (int(time.time()) / 3600)
 os.makedirs(f"C:\\fileDocumentation\\{cTime}")
 os.chdir(f"C:\\fileDocumentation\\{cTime}")
 numberOfFilesChecked = 0
 loopThroughDir("C:\\")
 print(int(time.time())/3600 - int(cTime))
 print(numberOfFilesChecked)
 numberOfFileTypes = 0
 listOfFilesInCTime = os.listdir(f"C:\\fileDocumentation\\{cTime}")
 aFile = open("controlFile.txt", "a+")
 for fileContainingExtensions in listOfFilesInCTime:
 numberOfFileTypes += 1
 extension = None
 fileExtensionFile = open(f"C:\\fileDocumentation\\{cTime}\\" + fileContainingExtensions)
 numberOfFilesCheckedInEachExtension = 0
 for eachLine in fileExtensionFile:
 for x in range(len(eachLine) - 1, -1, -1): # search from the back to the front of
 # the file name for the period which marks the extension.
 if eachLine[x] == ".":
 extension = eachLine[x:]
 # because there is a \n at the end of the line we remove it.
 extension.strip("\n")
 break
 numberOfFilesCheckedInEachExtension += 1
 extension.strip("\n")
 documentationStrings.append([numberOfFilesCheckedInEachExtension, f"{numberOfFilesCheckedInEachExtension}"
 f" is the number of {extension}'s in the "
 f"file "
 f"which is {(numberOfFilesCheckedInEachExtension / numberOfFilesChecked) * 100}% "
 f"percent of the total {numberOfFilesChecked}.\n"])
 documentationStrings.sort()
 documentationStrings.reverse()
 for i in documentationStrings:
 aFile.write(i[1])

Question 2

PEP-8

Python introduced a coding convention almost 20 years ago, namely PEP-8. Quoting certain section from the same:

[...] The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. [...]

[...]

Some other good reasons to ignore a particular guideline:

When applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.

To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style).

Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.

When the code needs to remain compatible with older versions of Python that don't support the feature recommended by the style guide.

I do not think any of these 4 apply in your case, so it is better to follow convention for one-off scripts (and put it into habit). The most common ones being:

variables and functions follow lower_snake_case naming
constants follow UPPER_SNAKE_CASE
classes are UpperCamelCase

Functions

You start off with declaring 2 functions, then everything else is a mess of code all put inside the if __name__ clause. Split this further into functions.

os.path vs pathlib

Since you're using python 3.x, may I suggest looking into a more friendlier inbuilt module for path traversals, called pathlib. The docs provide few examples on how to easily convert your current code utilizing os.path and friends with their equivalent pathlib calls.

File contexts

Python also has a context based file handling. You currently are opening the files explicitly, without ever calling a .close(). Prefer using the following syntax when operating with files:

with open("path/to/file", "mode") as fp:
 fp.write(something)
 # something_else = fp.read()

Comments vs docstrings

Use docstrings instead of comments when describing what a function does. The spec can be found on PEP-257.

PS: A lot of your code appears to be doing unnecessary work due to usage of string operations, instead of proper inbuilt methods. Few examples:

iterating over path string character by character to get extension instead of using os.path.splitext (pathlib has similar functions)
generating path yourself, instead of having a glob based iteration over directory tree
stripping \n from extension, but not really overwriting the variable itself (extension.strip("\n"))

hjpotter92 hjpotter92 8,9211 gold badge26 silver badges50 bronze badges · Accepted Answer · 2020-10-18 04:19:29Z

PEP-8

Python introduced a coding convention almost 20 years ago, namely PEP-8. Quoting certain section from the same:

[...] The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. [...]

[...]

Some other good reasons to ignore a particular guideline:

When applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.

To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style).

Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.

When the code needs to remain compatible with older versions of Python that don't support the feature recommended by the style guide.

I do not think any of these 4 apply in your case, so it is better to follow convention for one-off scripts (and put it into habit). The most common ones being:

variables and functions follow lower_snake_case naming
constants follow UPPER_SNAKE_CASE
classes are UpperCamelCase

Functions

You start off with declaring 2 functions, then everything else is a mess of code all put inside the if __name__ clause. Split this further into functions.

os.path vs pathlib

Since you're using python 3.x, may I suggest looking into a more friendlier inbuilt module for path traversals, called pathlib. The docs provide few examples on how to easily convert your current code utilizing os.path and friends with their equivalent pathlib calls.

File contexts

Python also has a context based file handling. You currently are opening the files explicitly, without ever calling a .close(). Prefer using the following syntax when operating with files:

with open("path/to/file", "mode") as fp:
 fp.write(something)
 # something_else = fp.read()

Comments vs docstrings

Use docstrings instead of comments when describing what a function does. The spec can be found on PEP-257.

PS: A lot of your code appears to be doing unnecessary work due to usage of string operations, instead of proper inbuilt methods. Few examples:

iterating over path string character by character to get extension instead of using os.path.splitext (pathlib has similar functions)
generating path yourself, instead of having a glob based iteration over directory tree
stripping \n from extension, but not really overwriting the variable itself (extension.strip("\n"))

Stack Exchange Network

program that fully documents every file on drive

1 Answer 1

PEP-8

Functions

os.path vs pathlib

File contexts

Comments vs docstrings

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

program that fully documents every file on drive

1 Answer 1

PEP-8

Functions

os.path vs pathlib

File contexts

Comments vs docstrings

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions