I have made a script that syncs one folder to another. For example, if you save changes to a file in directory A, that file will then have the same contents in directory B, provided you have set it up.
It runs on Python 3 and uses the watchdog module. If you wish to try it out (and please do), make sure to install watchdog with pip install watchdog.
Here is the code:
import logging
import pathlib
from watchdog.observers import Observer  # pip install watchdog
from watchdog.events import FileSystemEventHandler  # PyPI: https://pypi.org/project/watchdog/
import time

# Thanks @Tarik and @joelhed on Stack Overflow
# https://stackoverflow.com/questions/62501333/

# Set logging level
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s - %(levelname)s:\t%(message)s",
                    datefmt="%Y-%m-%d %H:%M:%S")

# List of files to sync to
# Example: [["move", [path (string), path]],
#           ["create", path],
#           ["delete", path],
#           ["modify", path]]
changes = []


# Define the main application code
class FolderSyncer(object):
    def __init__(self, src, dst):
        # Define paths
        self.source_path = src
        self.destination_path = dst
        logging.debug(f"\nSource path:\t\t{self.source_path}\n"
                      f"Destination path:\t{self.destination_path}")
        # Make a file system observer
        self.observer = Observer()
        # Schedule it with our EventHandler(), the path, and recursive
        self.observer.schedule(EventHandler(), str(self.source_path), recursive=True)
        # If we are completely synced
        self.synced = False

    def __enter__(self):
        logging.debug("Entered the Folder Syncer")
        # Start the observer
        self.observer.start()
        # Must return self for context managers
        return self

    def run(self):
        while True:
            if len(changes) > 0:
                # Remove first change from queue
                change = changes.pop(0)
                # We are still handling changes, so we are not synced
                self.synced = False
                logging.debug(f"Handling {change[0]} from {change[1]}")
                # Handle change here, pretend to do something
                if change[0] == "move":
                    (self.destination_path / change[1][0].replace(str(self.source_path), "")).replace(
                        self.destination_path / change[1][1].replace(str(self.source_path), "")
                    )
                elif change[0] == "create":
                    # If it's a file
                    if pathlib.Path(change[1]).is_file():
                        # Write the file's contents
                        (self.destination_path / change[1].replace(str(self.source_path), "")).write_bytes(
                            pathlib.Path(change[1]).read_bytes()
                        )
                    # Else, it's a directory
                    else:
                        (self.destination_path / change[1].replace(str(self.source_path), "")).mkdir(exist_ok=True)
                elif change[0] == "delete":
                    try:
                        # Try to remove as file
                        (self.destination_path / change[1].replace(str(self.source_path), "")).unlink()
                    except PermissionError:
                        # It's a directory, so remove it as a directory
                        (self.destination_path / change[1].replace(str(self.source_path), "")).rmdir()
                elif change[0] == "modify":
                    try:
                        (self.destination_path / change[1].replace(str(self.source_path), "")).write_bytes(
                            pathlib.Path(change[1]).read_bytes()
                        )
                    except PermissionError:
                        pass
                logging.info(f"Finished handling {change[0]} from {change[1]}, {len(changes)} changes left!")
            else:
                if not self.synced:
                    self.synced = True
                    logging.info("You are all completely synced!")
                time.sleep(1)

    def __exit__(self, exc_type, exc_value, traceback):
        logging.warning("Exited the Folder Syncer")
        # Stop the observer
        self.observer.stop()
        # Join the observer to the current thread
        self.observer.join()


# Define an event handler
class EventHandler(FileSystemEventHandler):
    def on_moved(self, event):
        super(EventHandler, self).on_moved(event)
        what = "directory" if event.is_directory else "file"
        logging.debug(f"Moved {what}: from {event.src_path} to {event.dest_path}")
        changes.append(["move", [event.src_path, event.dest_path]])

    def on_created(self, event):
        super(EventHandler, self).on_created(event)
        what = "directory" if event.is_directory else "file"
        logging.debug(f"Created {what}: {event.src_path}")
        changes.append(["create", event.src_path])

    def on_deleted(self, event):
        super(EventHandler, self).on_deleted(event)
        what = "directory" if event.is_directory else "file"
        logging.debug(f"Deleted {what}: {event.src_path}")
        changes.append(["delete", event.src_path])

    def on_modified(self, event):
        super(EventHandler, self).on_modified(event)
        what = "directory" if event.is_directory else "file"
        logging.debug(f"Modified {what}: {event.src_path}")
        changes.append(["modify", event.src_path])


with FolderSyncer(pathlib.Path(r"U:"), pathlib.Path(r"F:\USB 64GB sync")) as folder_syncer:
    folder_syncer.run()
You will most likely want to change the directory being synced and its destination. To do that, scroll down to the bottom and change the arguments passed to the pathlib.Path() objects. For example, if you want to sync D: to E:, then you would change:
with FolderSyncer(pathlib.Path(r"U:"), pathlib.Path(r"F:\USB 64GB sync")) as folder_syncer:
    folder_syncer.run()
to
with FolderSyncer(pathlib.Path(r"D:"), pathlib.Path(r"E:")) as folder_syncer:
    folder_syncer.run()
I would appreciate any code optimizations, bug fixes, and clean-ups (it's quite messy, even with classes); security fixes are welcome too. Performance is a plus, but I would like to keep the code readable.
Thanks in advance!
(̶O̶h̶,̶ ̶a̶n̶d̶ ̶B̶T̶W̶,̶ ̶w̶h̶y̶ ̶i̶s̶ ̶̶f̶o̶l̶d̶e̶r̶
̶ ̶n̶o̶t̶ ̶a̶ ̶t̶a̶g̶?̶)̶ Totally didn't forget that folder means directory*. Thanks Reinderien
*sarcasm
1 Answer
Python 3 classes
You should omit (object) as the base class for classes in Python 3.
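As a minimal illustration of the point above (the class body here is a placeholder, not the original implementation), a Python 3 class declared without an explicit base still inherits from object:

```python
class FolderSyncer:  # equivalent to "class FolderSyncer(object):" in Python 3
    """Placeholder body, only here to show the class declaration."""


# The implicit base class is still object.
bases = FolderSyncer.__bases__
print(bases)
```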
Re-entrance
changes is a global that's mutated by FolderSyncer, so immediately this is neither re-entrant nor thread-safe. Maybe move the changes list to a member of FolderSyncer.
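A rough sketch of what that could look like, with watchdog left out so the snippet runs on its own. The simplified EventHandler below is a stand-in and its constructor argument is an assumption for illustration; in the real script it would still subclass FileSystemEventHandler and receive watchdog event objects:

```python
class EventHandler:
    """Stand-in for the watchdog handler; it only records changes."""

    def __init__(self, changes):
        # The list is owned by a FolderSyncer and handed in, not a global.
        self.changes = changes

    def on_created(self, src_path):
        self.changes.append(("create", src_path))


class FolderSyncer:
    def __init__(self):
        self.changes = []  # per-instance state instead of a module-level list
        self.handler = EventHandler(self.changes)


first = FolderSyncer()
second = FolderSyncer()
first.handler.on_created("a.txt")

print(first.changes)   # only the first syncer saw the event
print(second.changes)  # the second syncer is unaffected
```

With this layout, two syncers in the same process no longer share state. If the handler and the main loop run on different threads, queue.Queue from the standard library is designed for handing items between threads and could replace the plain list.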
Anonymous tuples
Your changes has a few other issues:
- The inner lists should be tuples, because - though the outer list changes - the inner items do not.
- The inner lists have implied positional significance - item 0 is assumed to be an operation string and item 1 is assumed to be a path. This should be replaced with a class (maybe a @dataclass) with operation and path members.
- The operation, being stringly-typed, has no guarantees about constraint to valid values. It should be replaced with an Enum.
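One possible shape for this, as a sketch: the names Operation, Change and dest_path are made up for illustration, and the optional dest_path field is an assumption about how a move could be represented.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class Operation(Enum):
    MOVE = "move"
    CREATE = "create"
    DELETE = "delete"
    MODIFY = "modify"


@dataclass(frozen=True)  # frozen: a change is not mutated once recorded
class Change:
    operation: Operation
    path: Path
    dest_path: Optional[Path] = None  # assumption: only set for moves


change = Change(Operation.MOVE, Path("old.txt"), Path("new.txt"))

# Named fields replace change[0] and change[1][1].
print(change.operation, change.dest_path)

# Unknown operation strings are rejected instead of silently ignored.
try:
    Operation("rename")
    rejected = False
except ValueError:
    rejected = True
```

The handler would then append Change(Operation.CREATE, Path(event.src_path)) and the main loop would compare against Operation members rather than strings.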
Pathlib
There's a lot to unpack here:
(self.destination_path / change[1][0].replace(str(self.source_path), "")).replace(
    self.destination_path / change[1][1].replace(str(self.source_path), "")
)
Let's first reformat it so that it's legible by humans:
source_path, dest_path = change[1]
(
    self.destination_path
    / (
        source_path
        .replace(str(self.source_path), "")
    )
).replace(
    self.destination_path
    / (
        dest_path
        .replace(str(self.source_path), "")
    )
)
I can only be half-sure that I got that right. That one-liner should be unpacked into probably at least five separate statements, with well-named temporary variables. Otherwise, this is madness.
Further, you're doing a mix of pathlib (good) and string manipulation (not good). Attempt to avoid str() and replace(), and use the path manipulation functions from pathlib to extract what you need.
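For instance, relative_to() can take the place of the str()/replace() pair. The helper name mirror_path and the example paths below are made up for illustration, and PurePosixPath is used only so the snippet behaves the same on every platform; the script itself would use Path:

```python
from pathlib import PurePosixPath


def mirror_path(source_root, destination_root, changed_path):
    """Map a path under source_root to the matching path under destination_root."""
    # relative_to() strips only the leading root and raises ValueError if
    # changed_path is not under source_root; str.replace() would instead
    # remove every occurrence of the substring, wherever it appears.
    relative = changed_path.relative_to(source_root)
    return destination_root / relative


source_root = PurePosixPath("/data/src")
destination_root = PurePosixPath("/backup/dst")

mirrored = mirror_path(
    source_root, destination_root, PurePosixPath("/data/src/docs/a.txt")
)
print(mirrored)
```

The move case then becomes two calls to such a helper followed by a single replace() on the resulting paths, each held in a named variable.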
Imports
Rather than writing pathlib.Path all the time, consider just from pathlib import Path.
Sleep?
If you're doing this:
time.sleep(1)
because the console window will disappear if you don't, then there are better solutions that don't pollute your code and hang the program for your user. The reason I'm guessing this is why you sleep is that you have Windows-style example paths.
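If the sleep is instead there to stop the loop from spinning while there is nothing to do, one alternative is a blocking queue: queue.Queue.get() waits until an item arrives, so no polling interval is needed. This is only a sketch; the producer thread stands in for the watchdog observer, and the None sentinel used to stop the loop is an assumption for the example:

```python
import queue
import threading


def producer(changes):
    """Stand-in for the observer thread that reports file events."""
    changes.put(("create", "a.txt"))
    changes.put(("modify", "a.txt"))
    changes.put(None)  # sentinel: tells the consumer to stop


def consume(changes):
    handled = []
    while True:
        change = changes.get()  # blocks until an item is available
        if change is None:
            break
        handled.append(change)
    return handled


changes = queue.Queue()
worker = threading.Thread(target=producer, args=(changes,))
worker.start()
handled = consume(changes)
worker.join()
print(handled)
```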
Thank you for your answer! I will work on this tomorrow (if I have time). The only reason I have a time.sleep(1) is to reduce CPU usage. If it doesn't have a big impact, I'll remove it. – Unsigned_Arduino, Jun 22, 2020 at 3:34
file-system captures it well enough.