This is now my new code implementing the recommendations from my previous question.
import os
import time
import mysql.connector
import MySQLdb
import ConfigParser
import base64
import logging
from logging.handlers import TimedRotatingFileHandler
import sys
import ast
import smtplib
from sendmail import send_mail
"""
TO DO:
1. Retry failure upload when mysql disconnects
"""
currdir = os.getcwd()
curr_datetime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
"""function that reads the config file"""
def read_Config(section, option):
conf_file = currdir + "\\Config\\config.ini"
config = ConfigParser.ConfigParser()
config.read(conf_file)
sections = config.sections()
def ConfigSectionMap(section):
dict1 = {}
options = config.options(section)
for option in options:
try:
dict1[option] = config.get(section, option)
if dict1[option] == -1:
# print("skip: %s" % option)
LOGGER.debug("skip: %s" % option)
except:
deb = "exception on %s!" % option
LOGGER.debug(deb)
write_log(wyko_no,'DEBUG',deb,cur_datetime)
dict1[option] = None
return dict1
res = ConfigSectionMap(section)[option]
return res
"""read configuration file"""
app_name = read_Config('application','app_name')
FILE_PATH = read_Config('application','file_path')
LOG_PATH = currdir + "\\" + read_Config('application','log_path') + "\\"
USER = read_Config('database','user')
PASSWORD = read_Config('database','password')
PASSWORD = base64.decodestring(PASSWORD) ##decode password, remove D and L from the string
HOST = read_Config('database','host')
DB = read_Config('database','db')
DB_PORT = read_Config('database','port')
RECEIVERS = ast.literal_eval(read_Config('email','receivers'))
SENDER = read_Config('email','sender')
SMTP = read_Config('email','smtp')
PORT = read_Config('email','port')
"""set up logger"""
LOGGER = logging.getLogger(app_name)
LOG_FILE = app_name + '.log'
LOG_PATH = LOG_PATH + LOG_FILE
LOGGER.setLevel(logging.DEBUG)
hdlr = TimedRotatingFileHandler(LOG_PATH, when="d", interval=1, backupCount=1)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s: %(message)s')
hdlr.setFormatter(formatter)
LOGGER.addHandler(hdlr)
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
sh.setFormatter(formatter)
LOGGER.addHandler(sh)
"""insert csv to database"""
def insert_csv(file,filename):
try:
cnx = mysql.connector.connect(user = USER,
password = PASSWORD,host = HOST,
database = DB, port = DB_PORT)
cursor = cnx.cursor()
with open(file, 'rb') as f:
thedata = f.read()
curr_datetime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
sql = "INSERT INTO wyko_file(wyko_file,file_name,date_inserted) VALUES (%s,%s,%s)"
cursor.execute(sql,(thedata,filename,curr_datetime))
if cursor.rowcount != 0:
LOGGER.info("Transfer to database succesful!")
cnx.commit()
except (MySQLdb.OperationalError, MySQLdb.ProgrammingError), e:
LOGGER.error(e)
send_mail(SENDER,RECEIVERS,e)
finally:
cursor.close()
cnx.close()
"""main loop that watches the directory"""
def main():
try:
for dirpath, dirnames, files in os.walk(FILE_PATH):
for i in files:
file = dirpath+i
try:
LOGGER.info("Transferring file: " +i)
insert_csv(file,i)
except Exception, e:
LOGGER.error(e)
send_mail(SENDER,RECEIVERS,e)
finally:
os.remove(file)
LOGGER.info('File successfully removed from '+FILE_PATH+"\n"+"-"*80)
except Exception, e:
LOGGER.error(e)
send_mail(SENDER,RECEIVERS,e)
# break
if __name__ == "__main__":
LOGGER.info("Application starts running\n"+"-"*80)
main()
But upon deploying this application on a client, I have encountered an issue that this parser is eating up the resources on the client computer and causes applications to hang/lag. I think the code needs optimizing.
The parser is currently deployed on a Windows XP OS with 2GB of RAM.
2 Answers 2
Parsing configuration file
There are a few problems with your read_Config
function:
ConfigParser
objects are akin to dictionaries of dictionaries: there is no need to build dictionaries out of them, they already behave as such with a few extra functionnalities;read_Config
is called (and thus the file parsed) for each parameter you want to retrieve out of the file: this is a waste of resources, read the file once and return the parser object to extract parameters out of it;read
method ofConfigParser
behave the same asopen
, it accepts relative path, no need to usegetcwd
for that;read
method ofConfigParser
actually accept a list of filepaths (a single string is turned into a list of 1 element) and silently handle missing files (rationale in the docs); your code will fail if the file is not handled so I suggest usingwith open(<path>) as config_file: config.readfp(config_file)
instead to get a more descriptive error if anything goes wrong.
Too much "constants"
A lot of your constants are dependant of the result of your read_Config
function.
At this point you may want to introduce a class in your design to improve readability and reusability. The various "constants" initialized out of read_Config
will become the state (attributes) of your object.
Exception handling
You made a good point of handling specific exceptions in the method reviewed in your previous question but forgot to apply the advice in your read_Config
function.
Also you should prefer the new style of variable naming in except
clause using the as
keyword: e.g. except Exception as e
. This is the only way to write it in Python 3 and yours is considered deprecated.
Avoid repetitions
Almost all your except
performs the same operation. For now it’s only two lines of code but someday you might want to improve them. Maybe that the send_mail
function only uses str
on your exception before sending the mail, leaving out the whole stacktrace with usefull informations and at some point you’ll want to improve that.
Use a function for that: any change will be applied anywhere for free.
Bugs?
- What is this
write_log
function inread_Config
? You neither define it nor import it. - In
main
you usefinally
to remove files whether or not they got into the database. You may want to useelse
instead to remove only if the sql insert succeded. - You removed a
while True
from your last question, is it expected?
Misc
- Not sure which way is better but you can get your
curr_daytime
s usingdatetime.now().isoformat(' ')
instead of usingtime
; - Why are you using
ast
to parse the list ofRECEIVERS
? Can't you write it as a string of recipients separated by semi-colons and split it in your code? - Docstrings are "the string written the line right after a function declaration, if any" (not before).
smtplib
,SMTP
andPORT
are never used.
Proposed alternative
import os
from datetime import datetime
from sys import stdout
import logging
import mysql.connector
import MySQLdb
import base64
from ConfigParser import ConfigParser
from sendmail import send_mail
def parse_config(config_filename):
"""function that reads the config file"""
config = ConfigParser()
with open(config_filename) as config_file:
config.readfp(config_file)
return config
class FolderWatcher(object):
def __init__(self, config_filename):
"""Read configuration file and store important data out of it"""
config = parse_config(config_filename)
app_name = config.get('application', 'app_name')
self.logger = logging.getLogger(app_name)
self.logger.setLevel(logging.DEBUG)
formatter = logging.Formatter(
'%(asctime)s - %(name)s - '
'%(levelname)s: %(message)s')
log_path = config.get('application', 'log_path')
handler = logging.handlers.TimedRotatingFileHandler(
os.path.join(log_path, app_name + '.log'),
when="d", interval=1, backupCount=1)
handler.setFormatter(formatter)
self.logger.addHandler(handler)
handler = logging.StreamHandler(stdout)
handler.setLevel(logging.DEBUG)
self.logger.addHandler(handler)
self.watched_folder = config.get('application','file_path')
self.mysql_credentials = {
'user': config.get('database','user'),
##decode password, remove D and L from the string
'password': base64.decodestring(config.get('database','password')),
'host': config.get('database','host'),
'database': config.get('database','db'),
'port': config.get('database','port'),
}
self.mail_sender = config.get('email', 'sender')
# You’ll need to modify your config file to get this line to work
self.mail_receivers = map(str.strip,
config.get('email', 'receivers').split(';'))
self.logger.info("Application {} starts running".format(app_name))
self.logger.info("-"*80)
def manage_exception(self, exception):
self.logger.error(exception)
send_mail(self.mail_sender, self.mail_receivers, exception)
def insert_csv(self, file_path, filename):
"""insert csv to database"""
try:
cnx = mysql.connector.connect(**self.mysql_credentials)
cursor = cnx.cursor()
with open(file_path, 'rb') as f:
data = f.read()
curr_datetime = datetime.now().isoformat(' ')
sql = ("INSERT INTO wyko_file(wyko_file,file_name,date_inserted)"
" VALUES (%s,%s,%s)")
cursor.execute(sql, (data, filename, curr_datetime))
if cursor.rowcount != 0:
self.logger.info("Transfer to database succesful!")
cnx.commit()
except (MySQLdb.OperationalError, MySQLdb.ProgrammingError) as e:
self.manage_exception(e)
finally:
cursor.close()
cnx.close()
def watch(self):
"""main loop that watches the directory"""
try:
for root, dirs, files in os.walk(self.watched_folder):
for filename in files:
file_path = os.path.join(root, filename)
try:
self.logger.info("Transferring file: "+filename)
self.insert_csv(file_path, filename)
except Exception as e:
self.manage_exception(e)
else:
os.remove(file)
self.logger.info(
'File {} successfully removed'.format(file_path))
except Exception as e:
self.manage_exception(e)
if __name__ == "__main__":
watcher = FolderWatcher(os.path.join('Config', 'config.ini'))
watcher.watch()
-
\$\begingroup\$ Thank you very much for the detailed explanation. I was also thinking of creating a class for it, but i dont know where to start. The code looks cleaner now. @Mathias \$\endgroup\$ellaRT– ellaRT2015年11月02日 00:36:51 +00:00Commented Nov 2, 2015 at 0:36
Use batches
One of the issue I can see is that you are handling CSV file line-by-line. You should consider batching your requests to ensure you are sending queries to MySQL batched for better performance. You may be surprised with performance improvement.
See here - also search how to send queries to MySQL in batches: https://stackoverflow.com/a/5526937/2075157
A few comments
Don't hard-code path-separators even if you are planning to run on windows only. See: https://docs.python.org/2/library/os.html#os.pathsep
conf_file = currdir + "\Config\config.ini"
Use
format
function"exception on {0}!".format(option)
instead of%
when formatting strings for output - this is industry standard.
Minor
- formatting is off and not-consistent: like space after comma-separator between method arguments is not consistent - choose one standard and follow it.
Good parts
- logging is done great