Split excel file with multiple sheets, manipulate the data and create final out file

Question 1

I have an excel file with 20+ separate sheets containing tables of data. My script iterates through each sheet, manipulates the data into the format I want it and then saves it to a final output file. I think it can be improved, and I've flagged sections in the code with "review this" which I think I've done more work than I've needed to. Any feedback or criticism would be awesome!

import openpyxl
import pandas as pd
path = 'C:/Desktop/Python/Excel Manipulation/'
wb = openpyxl.load_workbook(path + 'inputfile.xlsx')
sheets = wb.get_sheet_names()
CSVList = []
for sheet in sheets:
 #get the current active sheet
 active_sheet = wb.get_sheet_by_name(sheet)
 #count numbers of rows
 row_count = active_sheet.get_highest_row() - 1
 #count number of columns
 column_count = active_sheet.get_highest_column()
 count = 0
 values = []
 #write each row to a list, stop when reached max rows (REVIEW THIS - would have thought there was a better way than using a counter)
 while count <= row_count: 
 for i in active_sheet.rows[count]:
 values.append(i.value) 
 count = count + 1
 #split values list into tuples based on number of columns 
 split_rows = zip(*[iter(values)]*column_count)
 #convert list of tuples to list of lists (REVIEW THIS - creating a tuple and then converting to list seems like extra work?!?)
 rows = [list(elem) for elem in split_rows]
 #get elements of file and store (REVIEW THIS - looks messy?)
 title = rows.pop(0)[0]
 headers = rows.pop(0)
 headers[1] = 'Last Year'
 rows.pop(0) 
 #create pandas dataframe
 df = pd.DataFrame(rows, columns=headers)
 #take header_id and remove to normalise the data
 header_id = headers.pop(2)
 normalise_data = pd.melt(df, id_vars=header_id, value_vars=headers, var_name='Measure', value_name='Value') 
 normalise_data.insert(0, 'Subject', title) 
 CSVList.append(normalise_data)
frame = pd.concat(CSVList)
frame.to_csv(path + 'CSV Outputs/' + 'final.csv', sep=',', index=False)

Question 2

Found a better way to iterate over the rows (although I still feel like I'm repeating myself!):

for idx, row in enumerate(active_sheet.rows): 
 for i in active_sheet.rows[idx]:
 values.append(i.value)

And instead of converting the tuple to a list:

rows = [list(t) for t in zip(*[iter(values)]*column_count)]

Question 3

Why not for i in row on your second for loop there?

ashleh ashleh 1972 gold badges3 silver badges8 bronze badges · Answer 1 · 2015-05-14 15:33:37Z

Found a better way to iterate over the rows (although I still feel like I'm repeating myself!):

for idx, row in enumerate(active_sheet.rows): 
 for i in active_sheet.rows[idx]:
 values.append(i.value)

And instead of converting the tuple to a list:

rows = [list(t) for t in zip(*[iter(values)]*column_count)]

2

\$\begingroup\$ Why not for i in row on your second for loop there? \$\endgroup\$

SuperBiasedMan
– SuperBiasedMan

2015年08月28日 16:31:31 +00:00
Commented Aug 28, 2015 at 16:31

Stack Exchange Network

Split excel file with multiple sheets, manipulate the data and create final out file

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Split excel file with multiple sheets, manipulate the data and create final out file

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions