I noticed that I've a HUGE bottleneck in this following function of mine. I can't see how to make it faster.
These are the profiling test results(keep in mind that I'm using a PyQt GUI so times can be stretched):
def write_workbook_to_file(self, dstfilename):
self.populaterownumstodelete()
# actual column in the new file
col_write = 0
# actual row in the new file
row_write = 0
for row in (rows for rows in range(self.sheet.nrows) if rows not in self.row_nums_to_delete):
for col in (cols for cols in range(self.sheet.ncols) if cols not in self.col_indexes_to_delete):
self.wb_sheet.write(row_write, col_write, self.parseandgetcellvalue(row, col))
col_write += 1
row_write += 1
col_write = 0
I ran a cProfile
profiling test and write_workbook_to_file()
resulted the slowest function in all my application. parseandgetcellvalue()
isn't a problem at all.
-
\$\begingroup\$ I don't know Python, but it looks like you're iterating through every cell (row & column) in the sheet. That will definitely be slow. Just for the sheer number of iterations. What are trying to accomplish exactly? \$\endgroup\$RubberDuck– RubberDuck2014年12月12日 23:16:52 +00:00Commented Dec 12, 2014 at 23:16
-
1\$\begingroup\$ I have some columns / rows in the original excel file that does not have to be showed in the new file. These informations are stored in col_indexes_to_delete and in row_nums_to_delete. \$\endgroup\$peperunas– peperunas2014年12月12日 23:24:24 +00:00Commented Dec 12, 2014 at 23:24
-
\$\begingroup\$ Why not copy everything (all at once via a Range) and then delete what needs to be deleted, instead of only copying the final dataset? \$\endgroup\$RubberDuck– RubberDuck2014年12月12日 23:26:46 +00:00Commented Dec 12, 2014 at 23:26
-
\$\begingroup\$ I should need to "move" the rows / columns then.. I don't think it's a good idea. I've updated my code now (I'm so tired right now ahah) \$\endgroup\$peperunas– peperunas2014年12月12日 23:34:21 +00:00Commented Dec 12, 2014 at 23:34
-
\$\begingroup\$ I've replaced the newest code block with the original one. There's no need for multiple of them. \$\endgroup\$Jamal– Jamal2014年12月12日 23:39:25 +00:00Commented Dec 12, 2014 at 23:39
1 Answer 1
To make it faster you could move this
(cols for cols in range(self.sheet.ncols) if cols not in self.col_indexes_to_delete)
out of the loop, and use enumerate
instead of explicitly incrementing variables.
Revised code:
def write_workbook_to_file(self, dstfilename):
self.populaterownumstodelete()
rows = [row for row in xrange(self.sheet.nrows) if row not in self.row_nums_to_delete]
cols = [col for col in xrange(self.sheet.ncols) if col not in self.col_indexes_to_delete]
for row_write, row in enumerate(rows):
for col_write, col in enumerate(cols):
self.wb_sheet.write(row_write, col_write, self.parseandgetcellvalue(row, col))
In fact you could even move enumerate
out of the loop, but I like how the code looks quite clean now.
-
\$\begingroup\$ Fantastic solution! Thanks for showing me enumerate()! It's just PERFECT used here! \$\endgroup\$peperunas– peperunas2014年12月14日 17:27:01 +00:00Commented Dec 14, 2014 at 17:27
Explore related questions
See similar questions with these tags.