Details:
I have a code, which does the following:
- Get data from CSV and insert it into MySQL table
- Get data from SQL Server and insert it into MySQL table
- Run this code every 240 seconds - to refresh the data, in order to have up to date infromation.
I'd like to know whether the code below, I am about to provide, can be improved? In terms of performance and whether the code is readable or anything can be changed.
import pyodbc
import csv
import mysql.connector
import time
#MySQL connection
MySQLdb = mysql.connector.connect(
host="SVR",
user="root",
passwd="",
database="production"
)
#SQL Sevrer
sqlServerData = pyodbc.connect(
"Driver={SQL Server Native Client 11.0};"
"Server=svr;"
"Database=SQL_Server;"
"Trusted_Connection=yes;")
input("Press Enter to continue...")
starttime=time.time()
while True:
try:
print("---START---")
#MySQL cursor
MySQLtruncateCSV = MySQLdb.cursor()
MySQLtruncateSQL_Server = MySQLdb.cursor()
MySQLcursor = MySQLdb.cursor()
MySQLcursor2 = MySQLdb.cursor()
#SQL Server Cursor
SqlData = sqlServerData.cursor()
#Truncate MySQL Tables
CSVTruncate = 'Truncate CSV;'
SQL_ServerTruncate = 'Truncate SQL_Server;'
print("Truncate Table CSV...")
#CSV table
MySQLtruncateCSV.execute(CSVTruncate)
MySQLtruncateCSV.close()
MySQLdb.commit()
print("Truncate Table SQL_Server...")
#SQL_Server table
MySQLtruncateSQL_Server.execute(SQL_ServerTruncate)
MySQLtruncateSQL_Server.close()
MySQLdb.commit()
print("Receiving data from CSV...")
print("Sending to MySQL..")
#Insert CSV file into MySQL database
with open('G:/Technical/Labels/Production/Data/CSVExport.csv', 'r') as f:
data = csv.reader(f)
next(data, None) #Skip header
for row in data:
MySQLcursor.execute('insert into CSV (customer_code,customer_logo, product_code,product_description,allergen_info, barcode_inner,barcode_outer) VALUES (%s,%s,%s,%s,%s,%s,%s);', row)
MySQLdb.commit()
MySQLcursor.close()
print("Receiving SQL_Server data...")
#Get data from SQL_Server
SqlData.execute("select p.id, p.code,p.description, p.searchRef1, so.number, c.code, c.name \
from salesorderline sol join \
salesorder so \
on sol.salesorderid = so.id join \
product p \
on sol.productid = p.id join \
customer c \
on so.customerid = c.id \
where so.orderdate > dateadd(dd,-10,cast(getdate() as date));")
print("Sending to MySQL..")
#Send SQL_Server data into MySQL
for x in SqlData.fetchall():
a,b,c,d,e,f,g = x
MySQLcursor2.execute("insert into SQL_Server (product_id, product_code, product_description, product_weight, \
salesorder_number, customer_code, customer_name) values (%s,%s,%s,%s,%s,%s,%s);", (a,b,c,d,e,f,g))
SqlData.close()
MySQLdb.commit()
MySQLcursor2.close()
print("---END---")
time.sleep(240 - ((time.time() - starttime) % 240))
except:
print("An error has occured.. Please contact Technical")
break
2 Answers 2
The exception handling is useless as it is currently implemented, because you are not retrieving the full details. Are you going to guess what went wrong ? Add this import at the top of your code:
import traceback
And then you can use:
print(f'Exception occured: {traceback.format_exc()}')
which will yield more details.
What would be good is having a counter, or keeping track of the latest row ID processed, so you can tell if a row in particular is causing problems.
I am not sure I would have done a script like this, because row by row insertion is slower than a bulk insert, for example in MySQL that would be LOAD DATA INFILE
. For this you need a user with the FILE privilege. In a controlled (private) environment, this is okay, otherwise think about the security implications.
Naming conventions: CSV
is not a good name for a table, think of something more meaningful, even for a temp table:
CSVTruncate = 'Truncate CSV;'
SQL_Server
is a terrible name too for a table.
Certain constant values should be defined as variables and kept at the top of the code eg: 'G:/Technical/Labels/Production/Data/CSVExport.csv'
Structure is not great, the program could be more readable. Don't do all the stuff in the main procedure, instead separate functionality by moving code to functions. The CSV import should definitely be a standalone function. That would make the program easier to read and understand, and reduce the risk of confusion and bugs.
You need to add more line spacing too. For example this code is not pleasant to read:
CSVTruncate = 'Truncate CSV;'
SQL_ServerTruncate = 'Truncate SQL_Server;'
print("Truncate Table CSV...")
#CSV table
MySQLtruncateCSV.execute(CSVTruncate)
MySQLtruncateCSV.close()
MySQLdb.commit()
print("Truncate Table SQL_Server...")
#SQL_Server table
MySQLtruncateSQL_Server.execute(SQL_ServerTruncate)
You reuse variable names and there is real potential for confusion as exemplified here:
#SQL_Server table
MySQLtruncateSQL_Server.execute(SQL_ServerTruncate)
With names like these it's not immediately clear against which environment you are really running queries. So you should really break up your code in a few dedicated functions, and not mix functionality.
And if your goal is to insert data from SQL Server to Mysql the approach is not ideal I think. It is possible to interconnect different DBMSes with each other. For example with SQL server you can connect to other databases (linked servers), for this you need to install the right drivers and middleware.
Then it is possible to insert data from one table to another, even to another database and server. It is matter of choice, but the idea is worth considering. The choice is whether to spend time on development (+ maintenance & fixing bugs) or spend time on integration.
SQL performance: this may be the least of your worries now but this query could be improved:
SqlData.execute("select p.id, p.code,p.description, p.searchRef1, so.number, c.code, c.name \
from salesorderline sol join \
salesorder so \
on sol.salesorderid = so.id join \
product p \
on sol.productid = p.id join \
customer c \
on so.customerid = c.id \
where so.orderdate > dateadd(dd,-10,cast(getdate() as date));")
product p \
on sol.productid = p.id join \
customer c \
on so.customerid = c.id \
where so.orderdate > dateadd(dd,-10,cast(getdate() as date));")
The where
clause will not advantage of an index on orderdate
if there is one. Simply calculate D-10 in your Python code and pass a hard value to your query rather than an expression.
from datetime import datetime, timedelta
d = datetime.today() - timedelta(days=10)
# this will return: '2020-04-11'
d.strftime('%Y-%m-%d')
When you are doing joins on multiple tables that have lots of records, you can experience performance issues, especially when not taking advantage of the indexes that exist.
Externalize your connection strings
This information:
host="SVR",
user="root",
passwd="",
database="production"
"Driver={SQL Server Native Client 11.0};"
"Server=svr;"
"Database=SQL_Server;"
"Trusted_Connection=yes;")
'G:/Technical/Labels/Production/Data/CSVExport.csv'
should not be baked into your program, for multiple reasons:
- configurability and maintainability
- security
Put these into a configuration file, a command-line argument, or an environmental variable.
Typo
#SQL Sevrer
-> #SQL Server
Context management
Doing a search through the MySQL Connector Python source code as well as their bug tracker, it seems that the library has a deficiency where cursors cannot be used as context managers. You can do the next-best thing: try
/finally
whenever you make a connection or cursor that needs to be closed, or (probably better) make your own small context manager utilities for such cases.
The situation for pyodbc
seems to be better. Connections and cursors should be used in with
statements.
In all cases you should prefer this to explicit close()
calls.
Multi-line strings
Since this is SQL:
SqlData.execute("select p.id, p.code,p.description, p.searchRef1, so.number, c.code, c.name \
from salesorderline sol join \
salesorder so \
on sol.salesorderid = so.id join \
product p \
on sol.productid = p.id join \
customer c \
on so.customerid = c.id \
where so.orderdate > dateadd(dd,-10,cast(getdate() as date));")
Indentation does not matter. Your current string literal includes indentation, and it might as well stay that way but losing the continuation escapes and using a multi-line string:
SqlData.execute("""
select p.id, p.code,p.description, p.searchRef1, so.number, c.code, c.name
from salesorderline sol
join salesorder so on sol.salesorderid = so.id
join product p on sol.productid = p.id
join customer c on so.customerid = c.id
where so.orderdate > dateadd(dd,-10,cast(getdate() as date));
""")
I also think it is clearer and more logical to line-break before the join
keyword rather than after, and include the on
with its corresponding join
.
Unpacking
This:
a,b,c,d,e,f,g = x
is not helpful. Either give these meaningful names, or don't unpack at all:
MySQLcursor2.execute("""
insert into SQL_Server (
product_id, product_code, product_description, product_weight,
salesorder_number, customer_code, customer_name
) values (%s,%s,%s,%s,%s,%s,%s);
""",
x
)
Magic numbers and home-rolled time math
Do not do this:
time.sleep(240 - ((time.time() - starttime) % 240))
It's difficult to understand. I guess 240 seconds is 4 minutes. You're
- finding the elapsed time since the start,
- modulating that by 4 minutes? Why?
- subtracting that from 4 minutes.
At a wild guess, what you are trying to do is "wait until 4 minutes have passed since the start of the program", which would actually require
from datetime import datetime
from time import sleep
start_time = datetime.now()
# ...
elapsed = datetime.now() - start_time
until_4m = (timedelta(minutes=4) - elapsed).total_seconds()
if until_4m > 0:
sleep(until_4m)