5
\$\begingroup\$

I am trying to write multiple dataframes into an excel file one after another with the same logic. Nothing changes for any of the data frame, except the number of columns or number of records. The functions are still the same.

For example,

writer = pd.ExcelWriter(OutputName) 
Emp_ID_df.to_excel(writer,'Sheet1',index = False)
Visa_df.to_excel(writer,'Sheet2',index = False)
custom_df_1.to_excel(writer,'Sheet3',index = False)
writer.save()

Now then, once written I am trying to highlight a row if any of the Boolean column has False value in it. There is no library in anacondas that I am aware of can highlight cells in excel. So I am going for the native one.

from win32com.client import Dispatch #to work with excel files
Pre_Out_df_ncol = Emp_ID_df.shape[1]
Pre_Out_df_nrow = Emp_ID_df.shape[0]
RequiredCol_let = colnum_num_string(Pre_Out_df_ncol)
arr = (Emp_ID_df.select_dtypes(include=[bool])).eq(False).any(axis=1).values
ReqRows = np.arange(1, len(Emp_ID_df)+ 1)[arr].tolist()
Pre_Out_df_ncol_2 = Visa_df.shape[1]
Pre_Out_df_nrow_2 = Visa_df.shape[0]
RequiredCol_let_2 = colnum_num_string(Pre_Out_df_ncol_2)
arr_2 = (Visa_df.select_dtypes(include=[bool])).eq(False).any(axis=1).values
ReqRows_2 = np.arange(1, len(Visa_df)+ 1)[arr_2].tolist()
Pre_Out_df_ncol_3 = custom_df_1.shape[1]
Pre_Out_df_nrow_3 = custom_df_1.shape[0]
RequiredCol_let_3 = colnum_num_string(Pre_Out_df_ncol_3)
arr_3 = (custom_df_1.select_dtypes(include=[bool])).eq(False).any(axis=1).values
ReqRows_3 = np.arange(1, len(custom_df_1)+ 1)[arr_3].tolist()
xlApp = Dispatch("Excel.Application")
xlwb1 = xlApp.Workbooks.Open(OutputName)
xlApp.visible = False
print ("\n...Highlighting the Output File at " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
for i in range(len(ReqRows)):
 j = ReqRows[i] + 1
 xlwb1.sheets('Sheet1').Range('A' + str(j) + ":" + RequiredCol_let + str(j)).Interior.ColorIndex = 6
xlwb1.sheets('Sheet1').Columns.AutoFit() 
for i in range(len(ReqRows_2)):
 j = ReqRows_2[i] + 1
 xlwb1.sheets('Sheet2').Range('A' + str(j) + ":" + RequiredCol_let_2 + str(j)).Interior.ColorIndex = 6
xlwb1.sheets('Sheet2').Columns.AutoFit()
for i in range(len(ReqRows_3)):
 j = ReqRows_3[i] + 1
 xlwb1.sheets('Sheet3').Range('A' + str(j) + ":" + RequiredCol_let_3 + str(j)).Interior.ColorIndex = 6
xlwb1.sheets('Sheet3').Columns.AutoFit()

At last, I am changing the name of the sheet

xlwb1.Sheets("Sheet1").Name = "XXXXA"
xlwb1.Sheets("Sheet2").Name = "XXXXASDAD"
xlwb1.Sheets("Sheet3").Name = "SADAD"
xlwb1.Save()

Now there are a few problems here

1) My number of dataframe increases and which means I am writing up the same code again and again.

2) The highlighting process works but it is too slow. Sometimes 90 % of the rows needs to be highlighted. There are 1 million rows and doing them one after another takes 35 minutes.

Kindly help me with this.

301_Moved_Permanently
29.4k3 gold badges48 silver badges98 bronze badges
asked Feb 22, 2019 at 14:45
\$\endgroup\$
1

2 Answers 2

6
\$\begingroup\$

First, starting from your code, you should realize that you are repeating yourself, three times. This goes against the principle Don't repeat Yourself (DRY).

The only real difference between processing your three sheets are their name and the underlying dataframe, so you could make this into two functions:

from win32com.client import Dispatch
import pandas as pd
def highlight_false(df):
 arr = (df.select_dtypes(include=[bool])).eq(False).any(axis=1).values
 return np.arange(1, len(df) + 1)[arr].tolist()
def color_rows(sheet, rows, col):
 for row in rows:
 cells = f"A{row+1}:{col}{row+1}"
 sheet.Range(cells).Interior.ColorIndex = 6
 sheet.Columns.AutoFit()
if __name__ == "__main__":
 Emp_ID_df = ...
 writer = pd.ExcelWriter(OutputName)
 Emp_ID_df.to_excel(writer, 'Sheet1', index=False)
 excel_app = Dispatch("Excel.Application")
 workbook = excel_app.Workbooks.Open(OutputName)
 excel_app.visible = False
 sheet_names = ["Sheet1"]
 dfs = [Emp_ID_df]
 for sheet_name, df in zip(sheet_names, dfs):
 sheet = workbook.Sheets(sheet)
 rows = highlight_false(df)
 col = colnum_num_string(df.shape[1])
 color_rows(sheet, rows, col)

However, there is an even easier method using xlsxwriter:

import pandas as pd
Emp_ID_df = ...
writer = pd.ExcelWriter(OutputName, engine='xlsxwriter')
Emp_ID_df.to_excel(writer, sheet_name="Sheet1", index=False)
workbook = writer.book
format1 = workbook.add_format({'bg_color': '#FFC7CE',
 'font_color': '#9C0006'})
dfs = [Emp_ID_df]
for df, sheet in zip(dfs, writer.sheets.values()):
 nrow, ncol = df.shape
 col_letter = colnum_num_string(ncol + 1)
 cells = f"A1:{col_letter}{nrow+1}"
 sheet.conditional_format(cells, {"type": "cell",
 "criteria": "==",
 "value": 0,
 "format": format1})
writer.save()

You might have to ensure that the sheets do not get out of snyc from the data frames, or just keep track of what name you save each dataframe to.

In addition I used Python's official style-guide, PEP8, which recommends using lower_case for functions and variables as well as renaming your variables so they are a lot clearer.

answered Feb 22, 2019 at 16:00
\$\endgroup\$
3
\$\begingroup\$

1) My number of dataframe increases and which means I am writing up the same code again and again

You're repeating a lot of code which is essentially doing the same thing with just a couple of variables. I think for this you should wrap it in a function. I've taken your code and put it inside a function:

def highlight_false_cells(sheetName, dataFrame, OutputName):
 Pre_Out_df_ncol = dataFrame.shape[1]
 Pre_Out_df_nrow = dataFrame.shape[0] # Is this required? It doesn't look to be used.
 RequiredCol_let = colnum_num_string(Pre_Out_df_ncol)
 arr = (dataFrame.select_dtypes(include=[bool])).eq(False).any(axis=1).values
 ReqRows = np.arange(1, len(dataFrame) + 1)[arr].tolist()
 xlApp = Dispatch("Excel.Application")
 xlwb1 = xlApp.Workbooks.Open(OutputName)
 xlApp.visible = False
 print("\n...Highlighting the Output File at " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
 for i in range(len(ReqRows)):
 j = ReqRows[i] + 1
 xlwb1.sheets(sheetName).Range('A' + str(j) + ":" + RequiredCol_let + str(j)).Interior.ColorIndex = 6
 xlwb1.sheets(sheetName).Columns.AutoFit()
 xlwb1.Save()

To call this for your dataframes:

highlight_false_cells("XXXXA", Emp_ID_df, OutputName)
highlight_false_cells("XXXXASDAD", Visa_df, OutputName)
highlight_false_cells("SADAD", custom_df_1, OutputName)

I'm not really familiar with the packages you're using, so there may be a mistake in logic within there. However hopefully it gives you a good example how to take your work and put it into a function.

I'd also recommend looking into a programming principle called "DRY", which stands for "Don't Repeat Yourself". If you can learn to spot areas like this where you have a lot of repeated lines then it will make stuff easier I believe.

answered Feb 22, 2019 at 16:18
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.