3
\$\begingroup\$

I have a program which goes through your drive and does two things:

  1. For every file extension, it creates a file named "file of <extension>.txt", and whenever it encounters a file with that extension, it appends that file's path to the matching file.
  2. Once it has visited every file, it goes through each file created in step 1, counts the paths listed in it, and writes the count for each extension to a new file.

NOTE: I do not follow any variable naming conventions; I exclusively use camelCase. If that is unsatisfactory, let me know...

NOTE: The runtime of the program depends on how many files you have on your drive...

NOTE: Sorry I didn't really document the second half of my code that well...

NOTE: Most of the try-except blocks came from a bunch of errors I kept getting while trying to access certain files...

import os
import time
def documentFile(path):
    global numberOfFilesChecked
    for r in range(len(path) - 1, -1, -1):  # search from the back to the front of
        # the file name for the period which marks the extension.
        if path[r] == ".":
            try:
                fObject = open(f"file of {path[r + 1:]}.txt", "a+")
                # we create a file named "file of extension", and we set it to append
                # mode
                numberOfFilesChecked += 1
                fObject.write(path + "\n")
                # we then write the name of the file being indexed to the file we just opened, then add a line break.
                fObject.close()
                if numberOfFilesChecked % 5000 == 0:
                    print(numberOfFilesChecked)
            except FileNotFoundError:
                pass
            break
def loopThroughDir(path):
    try:
        for iterator in os.listdir(path):
            if os.path.isdir(path + "\\" + iterator):  # if the path is a folder
                loopThroughDir(path + "\\" + iterator)
            else:  # if it's a regular file
                if iterator[0:4] != "fileO":
                    documentFile(path + "\\" + iterator)
    except PermissionError:
        pass
if __name__ == '__main__':
    documentationStrings = []
    extension = None
    cTime = (int(time.time()) / 3600)
    os.makedirs(f"C:\\fileDocumentation\\{cTime}")
    os.chdir(f"C:\\fileDocumentation\\{cTime}")
    numberOfFilesChecked = 0
    loopThroughDir("C:\\")
    print(int(time.time())/3600 - int(cTime))
    print(numberOfFilesChecked)
    numberOfFileTypes = 0
    listOfFilesInCTime = os.listdir(f"C:\\fileDocumentation\\{cTime}")
    aFile = open("controlFile.txt", "a+")
    for fileContainingExtensions in listOfFilesInCTime:
        numberOfFileTypes += 1
        extension = None
        fileExtensionFile = open(f"C:\\fileDocumentation\\{cTime}\\" + fileContainingExtensions)
        numberOfFilesCheckedInEachExtension = 0
        for eachLine in fileExtensionFile:
            for x in range(len(eachLine) - 1, -1, -1):  # search from the back to the front of
                # the file name for the period which marks the extension.
                if eachLine[x] == ".":
                    extension = eachLine[x:]
                    # because there is a \n at the end of the line we remove it.
                    extension.strip("\n")
                    break
            numberOfFilesCheckedInEachExtension += 1
        extension.strip("\n")
        documentationStrings.append([numberOfFilesCheckedInEachExtension, f"{numberOfFilesCheckedInEachExtension}"
                                     f" is the number of {extension}'s in the "
                                     f"file "
                                     f"which is {(numberOfFilesCheckedInEachExtension / numberOfFilesChecked) * 100}% "
                                     f"percent of the total {numberOfFilesChecked}.\n"])
    documentationStrings.sort()
    documentationStrings.reverse()
    for i in documentationStrings:
        aFile.write(i[1])
asked Oct 18, 2020 at 2:05
\$\endgroup\$

1 Answer

4
\$\begingroup\$

PEP-8

Python has had an official style guide, PEP 8, for almost 20 years. Quoting a few passages from it:

[...] The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. [...]

[...]

Some other good reasons to ignore a particular guideline:

  1. When applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.
  2. To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style).
  3. Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.
  4. When the code needs to remain compatible with older versions of Python that don't support the feature recommended by the style guide.

I do not think any of these four reasons applies in your case, so it is better to follow the convention even for one-off scripts (and to make it a habit). The most common rules are:

  1. variables and functions follow lower_snake_case naming
  2. constants follow UPPER_SNAKE_CASE
  3. classes are UpperCamelCase
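
As a rough illustration of the three rules, here is how a few names from the question might look after renaming. `PROGRESS_INTERVAL`, `ExtensionStats` and `should_report_progress` are hypothetical names introduced only for this sketch; they are not part of the original script.

```python
# Constant: UPPER_SNAKE_CASE (hypothetical name for the 5000 used in the question).
PROGRESS_INTERVAL = 5000


class ExtensionStats:
    """Class: UpperCamelCase (hypothetical, not part of the original script)."""

    def __init__(self):
        # Variable/attribute: lower_snake_case (was numberOfFilesChecked).
        self.number_of_files_checked = 0


def should_report_progress(number_of_files_checked):
    """Function: lower_snake_case. True when the count is a multiple of PROGRESS_INTERVAL."""
    return number_of_files_checked % PROGRESS_INTERVAL == 0
```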

Functions

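You start off by declaring two functions, but then everything else is a mess of code placed inside the if __name__ == '__main__' clause. Split this further into functions, for example one that sets up the output directory, one that counts the entries per extension, and one that writes the report.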
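
One possible way to carve up the reporting half of the script is sketched below. The function names and the hard-coded `counts` dictionary are assumptions made for illustration; the real script would fill `counts` from the drive scan.

```python
def format_report_line(extension, count, total):
    """Return one summary line, loosely mirroring the wording used in the question."""
    percent = count / total * 100
    return f"{count} is the number of {extension}'s, which is {percent}% of the total {total}.\n"


def build_report(counts):
    """Return report lines sorted from the most to the least common extension.

    `counts` maps an extension such as ".txt" to the number of files seen.
    """
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [format_report_line(ext, count, total) for ext, count in ordered]


def main():
    # Hypothetical sample data standing in for the result of the drive scan.
    counts = {".txt": 3, ".py": 1}
    for line in build_report(counts):
        print(line, end="")


if __name__ == "__main__":
    main()
```

With this split, each piece can be tested on its own, and the `if __name__` clause shrinks to a single call.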

os.path vs pathlib

Since you're using Python 3.x, may I suggest looking into a friendlier built-in module for path handling, called pathlib. The docs provide a few examples of how to convert code that uses os.path and friends into the equivalent pathlib calls.
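
A minimal sketch of what the traversal could look like with pathlib follows. `iter_files` and `extension_of` are hypothetical helper names, and error handling for unreadable folders is deliberately left out.

```python
from pathlib import Path


def iter_files(root):
    """Yield every regular file below `root`, at any depth."""
    # rglob("*") walks the tree recursively, so no manual recursion
    # or string concatenation with "\\" is needed.
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path


def extension_of(path):
    """Return the final extension including the dot, or "" if there is none."""
    return Path(path).suffix
```

Joining paths also becomes `Path(folder) / name` instead of `folder + "\\" + name`, which works with the separator of whatever platform the script runs on.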

File contexts

Python also supports context-managed file handling. You currently open files explicitly, and some of them (aFile and fileExtensionFile) are never closed with .close(). Prefer the following syntax when working with files; it closes the file automatically when the block exits:

with open("path/to/file", "mode") as fp:
    fp.write(something)
    # something_else = fp.read()
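
Applied to the question, the append and count steps might look roughly like this. `append_line` and `count_lines` are hypothetical helpers, and the explicit `encoding="utf-8"` is an assumption rather than something the original script specifies.

```python
def append_line(file_name, line):
    """Append `line` plus a newline to `file_name`.

    The file is closed when the with-block exits, even if write() raises.
    """
    with open(file_name, "a", encoding="utf-8") as fp:
        fp.write(line + "\n")


def count_lines(file_name):
    """Return the number of lines in `file_name`."""
    with open(file_name, encoding="utf-8") as fp:
        return sum(1 for _ in fp)
```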

Comments vs docstrings

Use docstrings instead of comments when describing what a function does. The conventions are described in PEP 257.
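
For instance, a small helper could carry its description as a docstring. `percentage` is a hypothetical function written for this sketch, based on the percentage calculation in the question.

```python
def percentage(part, total):
    """Return `part` as a percentage of `total`.

    Both arguments are file counts; `total` must be non-zero.
    """
    return part / total * 100


# Unlike a comment, the docstring is available at runtime,
# for example through help(percentage) or the __doc__ attribute.
first_line = percentage.__doc__.splitlines()[0]
```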


PS: A lot of your code appears to be doing unnecessary work because it uses manual string operations instead of the appropriate built-in functions. A few examples:

  1. iterating over the path string character by character to get the extension, instead of using os.path.splitext (pathlib has the equivalent Path.suffix)
  2. building each path by hand and recursing yourself, instead of using a glob-based iteration over the directory tree
  3. stripping \n from extension without assigning the result back: extension.strip("\n") returns a new string and leaves extension unchanged
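
Points 1 and 3 can be sketched together as follows. `extension_from_line` is a hypothetical helper name; note that os.path.splitext keeps the leading dot and treats a name such as ".bashrc" as having no extension, which differs from the manual loop in the question.

```python
import os


def extension_from_line(line):
    """Return the extension (including the dot) of the path stored on `line`."""
    # Strings are immutable: strip() returns a new string, so keep the result.
    path = line.strip("\n")
    # splitext splits at the last dot of the final path component.
    _root, extension = os.path.splitext(path)
    return extension
```
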
answered Oct 18, 2020 at 4:19
\$\endgroup\$
