\$\begingroup\$

I have an Excel file with 20+ separate sheets containing tables of data. My script iterates through each sheet, manipulates the data into the format I want, and then saves it to a final output file. I think it can be improved, and I've flagged sections in the code with "REVIEW THIS" where I think I've done more work than I needed to. Any feedback or criticism would be awesome!

import openpyxl
import pandas as pd

path = 'C:/Desktop/Python/Excel Manipulation/'
wb = openpyxl.load_workbook(path + 'inputfile.xlsx')
sheets = wb.get_sheet_names()
CSVList = []

for sheet in sheets:
    #get the current active sheet
    active_sheet = wb.get_sheet_by_name(sheet)
    #count numbers of rows
    row_count = active_sheet.get_highest_row() - 1
    #count number of columns
    column_count = active_sheet.get_highest_column()
    count = 0
    values = []
    #write each row to a list, stop when reached max rows (REVIEW THIS - would have thought there was a better way than using a counter)
    while count <= row_count:
        for i in active_sheet.rows[count]:
            values.append(i.value)
        count = count + 1
    #split values list into tuples based on number of columns
    split_rows = zip(*[iter(values)]*column_count)
    #convert list of tuples to list of lists (REVIEW THIS - creating a tuple and then converting to list seems like extra work?!?)
    rows = [list(elem) for elem in split_rows]
    #get elements of file and store (REVIEW THIS - looks messy?)
    title = rows.pop(0)[0]
    headers = rows.pop(0)
    headers[1] = 'Last Year'
    rows.pop(0)
    #create pandas dataframe
    df = pd.DataFrame(rows, columns=headers)
    #take header_id and remove to normalise the data
    header_id = headers.pop(2)
    normalise_data = pd.melt(df, id_vars=header_id, value_vars=headers, var_name='Measure', value_name='Value')
    normalise_data.insert(0, 'Subject', title)
    CSVList.append(normalise_data)

frame = pd.concat(CSVList)
frame.to_csv(path + 'CSV Outputs/' + 'final.csv', sep=',', index=False)
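
For readers unfamiliar with pd.melt, the reshaping step in the script above can be illustrated on made-up data. This is only a minimal sketch: the column names, the values and the 'Sales' title are assumptions chosen for illustration, not taken from the real workbook. As in the script, the id column sits at index 2 of the header row.

```python
import pandas as pd

# Made-up rows shaped like one sheet once the title row has been removed.
headers = ['This Year', 'Last Year', 'Region']
rows = [[10, 8, 'North'], [7, 9, 'South']]
df = pd.DataFrame(rows, columns=headers)

# Keep the id column, turn every other column into (Measure, Value) pairs.
id_col = headers[2]
value_cols = [h for h in headers if h != id_col]
long_df = pd.melt(df, id_vars=id_col, value_vars=value_cols,
                  var_name='Measure', value_name='Value')

# Tag every row with the (hypothetical) sheet title.
long_df.insert(0, 'Subject', 'Sales')
print(long_df)
```

Each wide row becomes one long row per measure column, which is why the frames from all the sheets can then be stacked with pd.concat.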
asked May 14, 2015 at 11:27
\$\endgroup\$

1 Answer

\$\begingroup\$

Found a better way to iterate over the rows (although I still feel like I'm repeating myself!):

for idx, row in enumerate(active_sheet.rows):
    for i in active_sheet.rows[idx]:
        values.append(i.value)

And instead of converting the tuple to a list:

rows = [list(t) for t in zip(*[iter(values)]*column_count)]
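
The grouping idiom in that line can be hard to read at first, so here is a small sketch of how it behaves on made-up values (the data and the name shared are assumptions for illustration). The same iterator object is repeated column_count times, so zip pulls consecutive items from it to build each tuple.

```python
# Made-up flat list of cell values, three columns per row.
values = ['a', 1, 2, 'b', 3, 4, 'c', 5, 6]
column_count = 3

# One iterator, referenced column_count times: zip advances it
# once per position, so each tuple holds consecutive values.
shared = iter(values)
rows = [list(t) for t in zip(*[shared] * column_count)]

# A trailing partial row is silently dropped, because zip stops
# as soon as the shared iterator runs out.
ragged = [list(t) for t in zip(*[iter(values[:7])] * column_count)]
```

The silent truncation is worth keeping in mind if a sheet can ever yield a number of values that is not a multiple of column_count.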
answered May 14, 2015 at 15:33
\$\endgroup\$
  • Why not for i in row on your second for loop there? Commented Aug 28, 2015 at 16:31
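
The suggestion in the comment can be sketched without a workbook. In the snippet below, Cell is a hypothetical stand-in for an openpyxl cell (only its .value attribute is used) and sheet_rows plays the role of active_sheet.rows; both are assumptions for illustration. Iterating over each row directly makes enumerate and the index lookup unnecessary.

```python
from collections import namedtuple

# Hypothetical stand-in for an openpyxl cell; only .value is needed here.
Cell = namedtuple('Cell', 'value')

# Plays the role of active_sheet.rows: a sequence of rows of cells.
sheet_rows = [
    (Cell('Sales'), Cell(None), Cell(None)),
    (Cell('This Year'), Cell('Prior'), Cell('Region')),
    (Cell(10), Cell(8), Cell('North')),
]

# The comment's suggestion: loop over the row itself, no index required.
values = []
for row in sheet_rows:
    for cell in row:
        values.append(cell.value)

# Going one step further, each row can become a list straight away,
# which would skip the flatten-then-regroup step entirely.
rows = [[cell.value for cell in row] for row in sheet_rows]
```

Whether the second form is preferable depends on whether the flat values list is needed anywhere else in the script; in the code shown, it is only used to rebuild the rows.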
