1
\$\begingroup\$

I noticed that I've a HUGE bottleneck in this following function of mine. I can't see how to make it faster.

These are the profiling test results(keep in mind that I'm using a PyQt GUI so times can be stretched):

cProfile Results

def write_workbook_to_file(self, dstfilename):
 self.populaterownumstodelete()
 # actual column in the new file
 col_write = 0
 # actual row in the new file
 row_write = 0
 for row in (rows for rows in range(self.sheet.nrows) if rows not in self.row_nums_to_delete):
 for col in (cols for cols in range(self.sheet.ncols) if cols not in self.col_indexes_to_delete):
 self.wb_sheet.write(row_write, col_write, self.parseandgetcellvalue(row, col))
 col_write += 1
 row_write += 1
 col_write = 0

I ran a cProfile profiling test and write_workbook_to_file() resulted the slowest function in all my application. parseandgetcellvalue() isn't a problem at all.

200_success
146k22 gold badges190 silver badges479 bronze badges
asked Dec 12, 2014 at 23:10
\$\endgroup\$
6
  • \$\begingroup\$ I don't know Python, but it looks like you're iterating through every cell (row & column) in the sheet. That will definitely be slow. Just for the sheer number of iterations. What are trying to accomplish exactly? \$\endgroup\$ Commented Dec 12, 2014 at 23:16
  • 1
    \$\begingroup\$ I have some columns / rows in the original excel file that does not have to be showed in the new file. These informations are stored in col_indexes_to_delete and in row_nums_to_delete. \$\endgroup\$ Commented Dec 12, 2014 at 23:24
  • \$\begingroup\$ Why not copy everything (all at once via a Range) and then delete what needs to be deleted, instead of only copying the final dataset? \$\endgroup\$ Commented Dec 12, 2014 at 23:26
  • \$\begingroup\$ I should need to "move" the rows / columns then.. I don't think it's a good idea. I've updated my code now (I'm so tired right now ahah) \$\endgroup\$ Commented Dec 12, 2014 at 23:34
  • \$\begingroup\$ I've replaced the newest code block with the original one. There's no need for multiple of them. \$\endgroup\$ Commented Dec 12, 2014 at 23:39

1 Answer 1

3
\$\begingroup\$

To make it faster you could move this

(cols for cols in range(self.sheet.ncols) if cols not in self.col_indexes_to_delete)

out of the loop, and use enumerate instead of explicitly incrementing variables.

Revised code:

def write_workbook_to_file(self, dstfilename):
 self.populaterownumstodelete()
 rows = [row for row in xrange(self.sheet.nrows) if row not in self.row_nums_to_delete]
 cols = [col for col in xrange(self.sheet.ncols) if col not in self.col_indexes_to_delete]
 for row_write, row in enumerate(rows):
 for col_write, col in enumerate(cols):
 self.wb_sheet.write(row_write, col_write, self.parseandgetcellvalue(row, col))

In fact you could even move enumerate out of the loop, but I like how the code looks quite clean now.

answered Dec 13, 2014 at 10:29
\$\endgroup\$
1
  • \$\begingroup\$ Fantastic solution! Thanks for showing me enumerate()! It's just PERFECT used here! \$\endgroup\$ Commented Dec 14, 2014 at 17:27

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.