I have a function called USB()
, that will observe a USB stick's insertion and delete files inside of it, if there are some:
import os
import shutil
# 1. Check and clean USB
def USB():
usb_inserted = os.path.isdir("F:") # <-- returns boolean...checks whether a directory exists
if usb_inserted == False:
print("\n\nUSB stick is not plugged in. Waiting for connection...")
while usb_inserted == False: # wait
""" updating variable, because it takes only the return value of function 'isdir()'
and stays the same regardless of the fact, that 'isdir()' changed """
usb_inserted = os.path.isdir("F:")
continue
SECURITY_FILE = "System Volume Information"
if os.listdir("F:") == [SECURITY_FILE] or os.listdir("F:") == []: # if list of files contains only the security file (is empty)
print("\nUSB flash is already empty.") # send a message and continue
else:
files = os.listdir("F:") # list of names of files in the usb flash, that will be deleted
if SECURITY_FILE in files:
""" taking out the security file from the list, because attempting to delete it causes
'PermissionError [WinError 5]' exception """
files.remove(SECURITY_FILE)
for file in files: # Loop through the file list
if os.path.isfile(f"F:\\{file}"): # if it's a file
os.remove(f"F:\\{file}") # Delete the file
elif os.path.isdir(f"F:\\{file}"): # if it's a directory/folder
shutil.rmtree(f"F:\\{file}") # remove the folder
print("\nAll files/folders are deleted from USB.")
USB()
Is there anything, that can be improved in terms of cleaner code or things that I could have missed while testing?
6 Answers 6
I am a bit late to the party but I would like to reinforce the already stated comments: polling is a bad approach. You should be listening to system events so the program should be idle most of the time. Using watchdog would be an improvement but you can do better.
There are libraries for USB in Python, for example pyusb, pyudev, libusb and there is also the DBUS library.
See for example: How can I listen for 'usb device inserted' events in Linux, in Python?
The catch here is that you using Windows.
There is an assumption that the USB stick shall be mounted as F: but it may not always be true, in particular if another USB storage device is already plugged in and has claimed that letter.
The approach seems to quite dangerous as well, basically your script will wipe the contents of any USB key that satisfies your lax criteria. A more prudent approach would be to look at the serial number of the USB stick, and use a whitelist of serial numbers to decide if the stick being inserted should be wiped or not by your script.
Deleting files one by one is not efficient if you have a lot (by the way, is there a trashbin for those files ?). If the goal is to remove sensitive files then maybe doing a quick format or clearing the partition would do the job as fine.
If you're paranoid and don't want files to be restorable using forensic methods this will likely not suffice, then you're looking at the Windows equivalent of the shred command in Linux.
Boolean Checks
Instead of doing:
if usb_inserted == False:
# do something
use
if not usb_inserted:
# do something
while not usb_inserted
This while loop will hammer away at a file system call to check if a drive exists. I'd check once every few seconds instead:
from time import sleep
def wait_for_usb():
while True:
if os.path.isdir('F:'):
break
else:
# give user notification it's still waiting
print('Detecting...')
sleep(2)
Checking List Contents
To check for an empty list, you can rely on it's truthiness instead of an explicit equality check:
# instead of this:
if os.listdir("F:") == []:
# do something
# do this
if not os.listdir('F:'):
# do something
Furthermore, it would be better to do that operation only once:
# This might be a better check for the security file
def only_contains_security_file(contents):
return len(contents) == 1 and contents[0] == "System Volume Information"
# only grab contents once
contents = os.listdir('F:')
if not contents or only_contains_security_file(contents):
print("\nUSB flash is already empty.")
Iterating over files
Since you just want to do a top-level walk of your drive, you can use os.scandir
instead of iterating over os.listdir
. This has a few advantages. One being that it doesn't aggregate everything into a list
(a disadvantage that os.walk
has for this use as well) and that it automatically allows you to check types:
for item in os.scandir('F:'):
if item.name.endswith("System Volume Information"):
continue
elif item.is_file():
os.remove(item.path)
else:
shutil.rmtree(item.path)
This is also useful if your usb is full of a ton of files with long names.
You also don't need to check if it's empty, your loop will just skip everything if it is.
Adding if __name__ == "__main__"
guard
This is useful for scripts that you may want to run, but also want to import things from. With this guard in place, if you import things from the usb module from another python script, it won't execute your script. This is nice considering you have an infinite loop:
def usb():
# your code
if __name__ == "__main__":
usb()
So altogether:
import os
import shutil
from time import sleep
def wait_for_usb():
print("Waiting for usb at F:")
while True:
if os.path.isdir('F:'):
break
else:
print("Detecting...")
sleep(2)
def usb():
wait_for_usb()
for item in os.scandir('F:'):
if item.name.endswith("System Volume Information"):
print('Skipping system info file')
continue
elif item.is_file():
print(f"Removing file {item.path}")
os.remove(item.path)
else:
print(f"Removing directory {item.path}")
shutil.rmtree(item.path)
print('USB is now empty')
if __name__ == "__main__":
usb()
-
15\$\begingroup\$ Busy waits like that, occupying a full CPU core merely to monitor for a change, are generally considered pretty terrible. I suppose it'd be adequate for a simple code demonstration in school, but if I saw it in a production application, I'd think very poorly of its author. The computer has other things to do; users hate for a program to use an entire core just to wait for something beyond its control to happen. Ideally, you want to register with the OS for an event, but if you can't, at least pause for half a second or more between checks. Minimize that CPU time. \$\endgroup\$Corrodias– Corrodias2021年08月13日 15:40:33 +00:00Commented Aug 13, 2021 at 15:40
-
\$\begingroup\$ @Corrodias, is that realted to me or Niv's
wait_for_usb()
function? lol \$\endgroup\$Tronzen– Tronzen2021年08月13日 16:35:54 +00:00Commented Aug 13, 2021 at 16:35 -
\$\begingroup\$ @C.Nivs we don't need
if __name__ == "__main__"
, because this function will be with other 2 functions (I'll post about them later) and they will be used in another .py file. \$\endgroup\$Tronzen– Tronzen2021年08月13日 17:06:07 +00:00Commented Aug 13, 2021 at 17:06 -
5\$\begingroup\$ @user394299 Really either implementation of waiting for the usb is bad, so they probably mean both. Mine is an improvement to keep the loop from hammering away at the CPU in a way that degrades performance. Reinderien is right, there are better production quality ways to do this \$\endgroup\$C.Nivs– C.Nivs2021年08月13日 17:09:01 +00:00Commented Aug 13, 2021 at 17:09
-
9\$\begingroup\$ @user394299 Both are suboptimal, yes, but waiting some time between checks is considerably lighter on resource use than a tight loop is. I expect the best performance would come from registering a drive insertion event handler with the Win32 API, but that's awfully complex to do within Python. This question seems to go into some detail on that. stackoverflow.com/questions/38689090 \$\endgroup\$Corrodias– Corrodias2021年08月13日 20:01:44 +00:00Commented Aug 13, 2021 at 20:01
It is recommended to follow PEP8 for readability and maintainability. The code violates many guidelines. Consider running the code through pylint
and pycodestyle
.
Blank Lines
USB
only has one blank line above it.
Blank Lines
Surround top-level function and class definitions with two blank lines.
Line length
Maximum Line Length
Limit all lines to a maximum of 79 characters.
For flowing long blocks of text with fewer structural restrictions (docstrings or comments), the line length should be limited to 72 characters.
There are many lines that contain more than 79 characters.
Constants
It appears there is only one constant used by the function: SECURITY_FILE
Constants
Constants are usually defined on a module level and written in all capital letters with underscores separating words. Examples include
MAX_OVERFLOW
andTOTAL
.
Move SECURITY_FILE
to the top of the module. While it may never need to be updated, if a change was required then it would be easier to find instead of hunting through the file.
-
1\$\begingroup\$ Perhaps add some suggestions for some tools to check for PEP8 violations? Like Pycodestyle and Pylint (presuming the latter can actually do it) \$\endgroup\$Peter Mortensen– Peter Mortensen2021年08月15日 11:00:32 +00:00Commented Aug 15, 2021 at 11:00
-
1\$\begingroup\$ Strictly conforming to PEP8 for line length is, honestly, overly proscriptive. Even
psf/black
, the "uncompromising" formatter, uses a max length of 88, and allows user configuration. I would move the focus away from the actual number of characters there, and more toward the idea of breaking up some lines (e.g. moving comments above/below the described line). \$\endgroup\$Schism– Schism2021年08月15日 21:18:37 +00:00Commented Aug 15, 2021 at 21:18
A production-grade solution would not have a polling loop at all, and would rely on a filesystem change notification hook. There are many, many ways to do this with varying degrees of portability and ease of use; if I were you I'd use something portable and off-the-shelf like watchdog.
Making a program to delete everything on a USB stick is dangerous. Consider listing current contents and asking for confirmation, to make sure you don't accidentally wipe the wrong stick or drive.
-
\$\begingroup\$ In the main program I have to delete the previous temporary PDF files, so I made some changes in
os.remove()
part:if item.is_file() and item.name.startswith("tmp") and item.name.endswith(".pdf"):
os.remove(item.path)
\$\endgroup\$Tronzen– Tronzen2021年08月30日 07:31:14 +00:00Commented Aug 30, 2021 at 7:31
I read the reviews (thanks to everybody, who was supporting) and did some testing, which helped me to make an improved clean version...I guess
# 1.5 find the usb drive letter (path)
def find_drive_id():
local_machine_connection = wmi.WMI()
DRIVE_SERIAL_NUMBER = "88809AB5" # serial number is like an identifier for a storage device determined by system
'''.Win32.LogicalDisk returns list of wmi_object objects, which include information
about all storage devices and discs (C:, D:)'''
for storage_device in local_machine_connection.Win32_LogicalDisk():
if storage_device.VolumeSerialNumber == DRIVE_SERIAL_NUMBER: # if a device with matching serial number was found
return storage_device.DeviceID
return False
# 1. Observe and clean usb flash
def USB():
while not find_drive_id():
input("\nUSB stick is not found. Enter it and press [ENTER] to try again...")
drive_id = find_drive_id()
if os.listdir(drive_id) == ["System Volume Information"] or not os.listdir(drive_id):
print(f"\nUSB drive {drive_id} is already empty.")
else:
entry_iterator = os.scandir(drive_id) # iterator/generator of DirEntry objects
for item in entry_iterator:
if item.is_file(): # if it's a file
os.remove(item.path) # Delete the file
elif item.is_dir(): # if it's a directory/folder
shutil.rmtree(item.path) # remove the folder
print(f"\nAll files/folders are deleted from USB drive {drive_id}.")