This is the first project I have coded using Python.
Therefore, I am looking for someone to give me some comments to further improve my code.
Purpose of the application
This application help the staff in my organization to import the spreadsheet and store it into the database.
Flow
- User would need to provide the document location before the application execute.
- System will extract the information from spreadsheet and import into the database.
Source Code
Below is my source code.
The link below is the complete source code on Github, https://github.com/WeeHong/0703-Extractor/blob/master/Main.ipynb
# Establish database connection
database = mysql.connector.connect(
host = 'localhost',
user = 'root',
password = '',
database = 'projects_0703'
)
# More about cursor:
# https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor.html
mycursor = database.cursor()
# Fetch all companies records from the companies table from database
mycursor.execute('SELECT id, name FROM companies')
companies = mycursor.fetchall()
# Spreadsheet location
# location = ("./Cluster.xlsx")
location = input()
# Open the spreadsheet from the given path based on the sheet index
workbook = xlrd.open_workbook(location)
sheet = workbook.sheet_by_index(3)
# Fetch all the value from the sheet and store into an array
for excel_row in range(2, sheet.nrows):
# Check respective fields are empty
# Cell 26 = Techsector
# Cell 27 = Sub-techsector
if sheet.cell_value(excel_row, 26):
category = categories[sheet.cell_value(excel_row, 26)]
else:
category = None
if sheet.cell_value(excel_row, 27):
subcategory = subcategories[sheet.cell_value(excel_row, 27)]
else:
subcategory = None
# Assign ID = 10 if the stage is 0 in spreadsheet
if sheet.cell_value(excel_row, 0) == 0:
stage = 10
else:
stage = sheet.cell_value(excel_row, 0)
# Replace NA into 0 for Manday
if sheet.cell_value(excel_row, 19) == 'NA':
manday = 0
else:
manday = sheet.cell_value(excel_row, 19)
# Check if workbook's company exists in database company
# Yes, fetch the company ID
# No, create new company record and fetch the company ID
if sheet.cell_value(excel_row, 1) in companies:
for company in companies:
if sheet.cell_value(excel_row, 1) == company[0][1]:
company_id = company[0][0]
else:
if sheet.cell_value(excel_row, 3):
# Get the industry ID from industries dictionary
industry_id = industries[sheet.cell_value(excel_row, 3)]
mycursor.execute('INSERT INTO companies(industry_id, name, category, country, source, created_at, updated_at) VALUES (%s, %s, %s, %s, %s, NOW(), NOW())', (industry_id, sheet.cell_value(excel_row, 1), 'a:1:{i:0;s:6:"Client";}', 'SINGAPORE', '0703',),)
company_id = mycursor.lastrowid
database.commit()
# Create new project record
mycursor.execute('INSERT INTO projects(company_id, source_id, stage_id, service_id, leader_id, category_id, subcategory_id, name, revenue, forecast, quotation, remarks, created_at, updated_at) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, NOW(), NOW())', (company_id, sources[sheet.cell_value(excel_row, 4)], stage, services[sheet.cell_value(excel_row, 5)], staffs[sheet.cell_value(excel_row, 20)], category, subcategory, sheet.cell_value(excel_row, 2), sheet.cell_value(excel_row, 17), sheet.cell_value(excel_row, 16), sheet.cell_value(excel_row, 18), sheet.cell_value(excel_row, 28)),)
project_id = mycursor.lastrowid
database.commit()
# Check the number of project member in charge of the project
# Create new project member record
if sheet.cell_value(excel_row, 21):
members = sheet.cell_value(excel_row, 21).split(',')
for member in members:
mycursor.execute('INSERT INTO member_project(member_id, project_id, manday, created_at, updated_at) VALUES (%s, %s, %s, NOW(), NOW())', (staffs[member.strip()], project_id, manday),)
database.commit()
# Create new project techpartner record
# Cell 22 = Techpartner 1
# Cell 23 = Techpartner 2
# Cell 24 = Techpartner 3
# Cell 25 = Techpartner 4
for excel_cell in range(22, 26):
techpartner = sheet.cell_value(excel_row, excel_cell)
if techpartner:
if techpartner in companies:
for company in companies:
if techpartner == company[0][1]:
techpartner_id = company[0][0]
else:
mycursor.execute('INSERT INTO companies(name, category, country, source, created_at, updated_at) VALUES (%s, %s, %s, %s, NOW(), NOW())', (techpartner, 'a:1:{i:0;s:7:\"Partner\";)', 'SINGAPORE', '0703',),)
techpartner_id = mycursor.lastrowid
database.commit()
mycursor.execute('INSERT INTO project_techpartners(project_id, techpartner_id, created_at, updated_at) VALUES (%s, %s, NOW(), NOW())', (project_id, techpartner_id),)
database.commit()
```
1 Answer 1
The code isn't unreasonable. There are ways to clean it up, though.
Functions
Organize your code into sub-routines - perhaps one to load categories from your spreadsheet, one to write information to your database, etc.
Magic numbers
Numbers like 26, 27, 19, etc. should be assigned to constants, to make the code easier to understand.
Ineffectual commit
The last line of your code is a commit after no operation. Perhaps this is just an indentation error and you meant to commit after your execute
.
Delete the commit before it in the else
. More broadly, you may want to consider reducing the number of commits in your code, maybe even to one at the end. This depends on a number of things, including how you want to deal with errors and whether they effect validity of the data as a whole, as well as performance.
Finally, have a read through https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnection-autocommit.html . The way you're using commits now, you're better off just turning on autocommit.
Combined inserts
Your insert into member_project
is inefficient. You should not insert in a loop. Read https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-executemany.html
Add a prompt to your input()
Otherwise, the user doesn't know why the program has suddenly hung.
-
\$\begingroup\$ Thank you. I will look into it and improve my code. Much appreciated. \$\endgroup\$Wee Hong KOH– Wee Hong KOH2019年08月28日 04:06:12 +00:00Commented Aug 28, 2019 at 4:06
workbook = xlrd.open_workbook(location)
Where isxlrd
defined? Please include your imports. \$\endgroup\$