This program's main function is to:
- Watch a directory.
- If there are new files, insert those as entries into the database.
- Delete the files from the directory.
My below code runs well, but I want to optimize it and I think it needs better error handling.
import os
import time
import mysql.connector
import MySQLdb
import pooop
file_path = 'C:\\path\\to\\watch\\'
#send files to database
def insert_csv(file,filename):
try:
cnx = mysql.connector.connect(user='user',
password ='pass',host='192.168.1.1',
database='test')
cursor = cnx.cursor()
thedata = open(file, 'rb').read()
sql = "INSERT INTO testing(file_backup,file_name) VALUES (%s,%s)"
cursor.execute(sql,(thedata,filename))
cnx.commit()
except MySQLdb.OperationalError, e:
print e
except MySQLdb.ProgrammingError, e:
print e
finally:
cursor.close()
def walk():
try:
for dirpath, dirnames, files in os.walk(file_path):
for i in files:
file = dirpath+i
try:
insert_csv(file,i)
except Exception, e:
print e
finally:
os.remove(file)
print 'File successfully removed\n'+"-"*80
except Exception, e:
print e
#main loop that watches the directory
if __name__ == "__main__":
print "Application starts running\n"+"-"*80
while True:
walk()
The application will run on Windows XP. I'm using Python 2.7.
-
\$\begingroup\$ Please do not edit your question that way as you invalidated a whole lot of @vnp answer in doing so. See What you may and may not do after receiving answers. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2015年10月27日 11:45:48 +00:00Commented Oct 27, 2015 at 11:45
-
\$\begingroup\$ I did not edit the question, i edited the code to apply what vnp suggested. If it isnt allowed to apply the suggestion on the answer i will revert it back. @mathias \$\endgroup\$ellaRT– ellaRT2015年10月27日 11:56:40 +00:00Commented Oct 27, 2015 at 11:56
-
\$\begingroup\$ I see. Okay. Thank you. Ill do that. @mathias. \$\endgroup\$ellaRT– ellaRT2015年10月27日 12:11:39 +00:00Commented Oct 27, 2015 at 12:11
2 Answers 2
For the error handling I agree that there are improvements possible, in particular it seems dangerous to me that if a database error occurs, the corresponding file will be deleted regardless. Apart from that other exceptions should be fine, e.g. if a file couldn't be deleted it will just be retried later.
- Take a look at PEP8 for formatting. In particular constants should be written in upper case.
- The usual advice is also to not shadow predefined names like
file
. - The comments on
insert_csv
and the main loop should rather be a docstrings because that way you could look at that information interactively. - The way the entries are read from the file is leaking the file
descriptor - use
with
onopen
. - To prevent too much escaping try the
r""
format (assuming that the final backslash isn't necessary). - Catching exceptions should be done with forward compatibility to
Python 3 in mind, i.e.
except Exception as e
instead of the comma syntax; same goes forprint
, which should use the function syntax. - The connection should be closed too.
- Catching exceptions this way hides the backtrace, that is often less useful than just letting the exception go through.
insert_csv
opens and closes the database connection for every file. That is suboptimal and I can't see a reason why it shouldn't just stay open - did some of these exceptions clobber the whole connection? I don't understand all of the implications yet, but I'd suggest moving theconnect
and possiblycursor
calls intomain
instead ofinsert_csv
.
Some ideas for future improvements:
- Settings and parameters like the database connection and the watched directory should be moved into configuration and/or command line arguments.
- Perhaps use sqlalchemy or something with a bit nicer interface.
- Similarly, try to find some library to watch the directory without
polling, e.g.
watchdog
, because the current busy loop is really expensive comparatively.
This is a bit cleaner I believe:
import os
import time
import mysql.connector
import MySQLdb
import pooop
FILE_PATH = r'C:\path\to\watch'
DB_CONNECTION = {
'user': 'user',
'password': 'pass',
'host': '192.168.1.1',
'database': 'test'
}
LINE = "-" * 80
def insert_csv(file, filename):
"Send files to database."
try:
cnx = mysql.connector.connect(**DB_CONNECTION)
cursor = cnx.cursor()
with open(file, 'rb') as f:
thedata = f.read()
sql = "INSERT INTO testing(file_backup, file_name) VALUES (%s, %s)"
cursor.execute(sql, (thedata, filename))
cnx.commit()
except (MySQLdb.OperationalError, MySQLdb.ProgrammingError) as e:
print(e)
finally:
cursor.close()
cnx.close()
def walk(file_path):
try:
for dirpath, dirnames, files in os.walk(file_path):
for i in files:
file = dirpath + i
try:
insert_csv(file, i)
except Exception as e:
print(e)
finally:
os.remove(file)
print('File successfully removed')
print(LINE)
except Exception as e:
print(e)
def main():
"Main loop that watches the directory."
print("Application starts running")
print(LINE)
while True:
walk(FILE_PATH)
if __name__ == "__main__":
main()
Naming
Nothing in the code suggests that
csv
files are special. Why a backup is calledbackup_csv
?Exception handling
The main code removes the file regardless of the insertion success. For example, if
connect
fails, the file is still removed. Is this an intended behaviour? I recommend to callos.remove()
insidebackup_csv
.misc
What is the purpose of
time.sleep(1)
?I don't see a reason for
if files
clause. Thefor i in files
works as expected iffiles
is empty.
-
\$\begingroup\$ I have revised the code, for naming, i rename the backup_csv for exception handling : if a
connect
fails the behaviour i want to achieve is for the program to try again to insert the file instead of removing it. @vnp \$\endgroup\$ellaRT– ellaRT2015年10月27日 07:35:46 +00:00Commented Oct 27, 2015 at 7:35