Python to write multiple dataframes and highlight rows inside an excel file

Question 1

I am trying to write multiple dataframes into an excel file one after another with the same logic. Nothing changes for any of the data frame, except the number of columns or number of records. The functions are still the same.

For example,

writer = pd.ExcelWriter(OutputName) 
Emp_ID_df.to_excel(writer,'Sheet1',index = False)
Visa_df.to_excel(writer,'Sheet2',index = False)
custom_df_1.to_excel(writer,'Sheet3',index = False)
writer.save()

Now then, once written I am trying to highlight a row if any of the Boolean column has False value in it. There is no library in anacondas that I am aware of can highlight cells in excel. So I am going for the native one.

from win32com.client import Dispatch #to work with excel files
Pre_Out_df_ncol = Emp_ID_df.shape[1]
Pre_Out_df_nrow = Emp_ID_df.shape[0]
RequiredCol_let = colnum_num_string(Pre_Out_df_ncol)
arr = (Emp_ID_df.select_dtypes(include=[bool])).eq(False).any(axis=1).values
ReqRows = np.arange(1, len(Emp_ID_df)+ 1)[arr].tolist()
Pre_Out_df_ncol_2 = Visa_df.shape[1]
Pre_Out_df_nrow_2 = Visa_df.shape[0]
RequiredCol_let_2 = colnum_num_string(Pre_Out_df_ncol_2)
arr_2 = (Visa_df.select_dtypes(include=[bool])).eq(False).any(axis=1).values
ReqRows_2 = np.arange(1, len(Visa_df)+ 1)[arr_2].tolist()
Pre_Out_df_ncol_3 = custom_df_1.shape[1]
Pre_Out_df_nrow_3 = custom_df_1.shape[0]
RequiredCol_let_3 = colnum_num_string(Pre_Out_df_ncol_3)
arr_3 = (custom_df_1.select_dtypes(include=[bool])).eq(False).any(axis=1).values
ReqRows_3 = np.arange(1, len(custom_df_1)+ 1)[arr_3].tolist()
xlApp = Dispatch("Excel.Application")
xlwb1 = xlApp.Workbooks.Open(OutputName)
xlApp.visible = False
print ("\n...Highlighting the Output File at " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
for i in range(len(ReqRows)):
 j = ReqRows[i] + 1
 xlwb1.sheets('Sheet1').Range('A' + str(j) + ":" + RequiredCol_let + str(j)).Interior.ColorIndex = 6
xlwb1.sheets('Sheet1').Columns.AutoFit() 
for i in range(len(ReqRows_2)):
 j = ReqRows_2[i] + 1
 xlwb1.sheets('Sheet2').Range('A' + str(j) + ":" + RequiredCol_let_2 + str(j)).Interior.ColorIndex = 6
xlwb1.sheets('Sheet2').Columns.AutoFit()
for i in range(len(ReqRows_3)):
 j = ReqRows_3[i] + 1
 xlwb1.sheets('Sheet3').Range('A' + str(j) + ":" + RequiredCol_let_3 + str(j)).Interior.ColorIndex = 6
xlwb1.sheets('Sheet3').Columns.AutoFit()

At last, I am changing the name of the sheet

xlwb1.Sheets("Sheet1").Name = "XXXXA"
xlwb1.Sheets("Sheet2").Name = "XXXXASDAD"
xlwb1.Sheets("Sheet3").Name = "SADAD"
xlwb1.Save()

Now there are a few problems here

1) My number of dataframe increases and which means I am writing up the same code again and again.

2) The highlighting process works but it is too slow. Sometimes 90 % of the rows needs to be highlighted. There are 1 million rows and doing them one after another takes 35 minutes.

Kindly help me with this.

Question 2

Did you have a look at xlsxwriter.readthedocs.io/example_pandas_conditional.html and xlsxwriter.readthedocs.io/working_with_conditional_formats.html ?

Question 3

First, starting from your code, you should realize that you are repeating yourself, three times. This goes against the principle Don't repeat Yourself (DRY).

The only real difference between processing your three sheets are their name and the underlying dataframe, so you could make this into two functions:

from win32com.client import Dispatch
import pandas as pd
def highlight_false(df):
 arr = (df.select_dtypes(include=[bool])).eq(False).any(axis=1).values
 return np.arange(1, len(df) + 1)[arr].tolist()
def color_rows(sheet, rows, col):
 for row in rows:
 cells = f"A{row+1}:{col}{row+1}"
 sheet.Range(cells).Interior.ColorIndex = 6
 sheet.Columns.AutoFit()
if __name__ == "__main__":
 Emp_ID_df = ...
 writer = pd.ExcelWriter(OutputName)
 Emp_ID_df.to_excel(writer, 'Sheet1', index=False)
 excel_app = Dispatch("Excel.Application")
 workbook = excel_app.Workbooks.Open(OutputName)
 excel_app.visible = False
 sheet_names = ["Sheet1"]
 dfs = [Emp_ID_df]
 for sheet_name, df in zip(sheet_names, dfs):
 sheet = workbook.Sheets(sheet)
 rows = highlight_false(df)
 col = colnum_num_string(df.shape[1])
 color_rows(sheet, rows, col)

However, there is an even easier method using xlsxwriter:

import pandas as pd
Emp_ID_df = ...
writer = pd.ExcelWriter(OutputName, engine='xlsxwriter')
Emp_ID_df.to_excel(writer, sheet_name="Sheet1", index=False)
workbook = writer.book
format1 = workbook.add_format({'bg_color': '#FFC7CE',
 'font_color': '#9C0006'})
dfs = [Emp_ID_df]
for df, sheet in zip(dfs, writer.sheets.values()):
 nrow, ncol = df.shape
 col_letter = colnum_num_string(ncol + 1)
 cells = f"A1:{col_letter}{nrow+1}"
 sheet.conditional_format(cells, {"type": "cell",
 "criteria": "==",
 "value": 0,
 "format": format1})
writer.save()

You might have to ensure that the sheets do not get out of snyc from the data frames, or just keep track of what name you save each dataframe to.

In addition I used Python's official style-guide, PEP8, which recommends using lower_case for functions and variables as well as renaming your variables so they are a lot clearer.

Question 4

1) My number of dataframe increases and which means I am writing up the same code again and again

You're repeating a lot of code which is essentially doing the same thing with just a couple of variables. I think for this you should wrap it in a function. I've taken your code and put it inside a function:

def highlight_false_cells(sheetName, dataFrame, OutputName):
 Pre_Out_df_ncol = dataFrame.shape[1]
 Pre_Out_df_nrow = dataFrame.shape[0] # Is this required? It doesn't look to be used.
 RequiredCol_let = colnum_num_string(Pre_Out_df_ncol)
 arr = (dataFrame.select_dtypes(include=[bool])).eq(False).any(axis=1).values
 ReqRows = np.arange(1, len(dataFrame) + 1)[arr].tolist()
 xlApp = Dispatch("Excel.Application")
 xlwb1 = xlApp.Workbooks.Open(OutputName)
 xlApp.visible = False
 print("\n...Highlighting the Output File at " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
 for i in range(len(ReqRows)):
 j = ReqRows[i] + 1
 xlwb1.sheets(sheetName).Range('A' + str(j) + ":" + RequiredCol_let + str(j)).Interior.ColorIndex = 6
 xlwb1.sheets(sheetName).Columns.AutoFit()
 xlwb1.Save()

To call this for your dataframes:

highlight_false_cells("XXXXA", Emp_ID_df, OutputName)
highlight_false_cells("XXXXASDAD", Visa_df, OutputName)
highlight_false_cells("SADAD", custom_df_1, OutputName)

I'm not really familiar with the packages you're using, so there may be a mistake in logic within there. However hopefully it gives you a good example how to take your work and put it into a function.

I'd also recommend looking into a programming principle called "DRY", which stands for "Don't Repeat Yourself". If you can learn to spot areas like this where you have a lot of repeated lines then it will make stuff easier I believe.

Graipher GraipherGraipher 41.6k7 gold badges70 silver badges134 bronze badges · Accepted Answer · 2019-02-22 16:00:47Z

First, starting from your code, you should realize that you are repeating yourself, three times. This goes against the principle Don't repeat Yourself (DRY).

The only real difference between processing your three sheets are their name and the underlying dataframe, so you could make this into two functions:

from win32com.client import Dispatch
import pandas as pd
def highlight_false(df):
 arr = (df.select_dtypes(include=[bool])).eq(False).any(axis=1).values
 return np.arange(1, len(df) + 1)[arr].tolist()
def color_rows(sheet, rows, col):
 for row in rows:
 cells = f"A{row+1}:{col}{row+1}"
 sheet.Range(cells).Interior.ColorIndex = 6
 sheet.Columns.AutoFit()
if __name__ == "__main__":
 Emp_ID_df = ...
 writer = pd.ExcelWriter(OutputName)
 Emp_ID_df.to_excel(writer, 'Sheet1', index=False)
 excel_app = Dispatch("Excel.Application")
 workbook = excel_app.Workbooks.Open(OutputName)
 excel_app.visible = False
 sheet_names = ["Sheet1"]
 dfs = [Emp_ID_df]
 for sheet_name, df in zip(sheet_names, dfs):
 sheet = workbook.Sheets(sheet)
 rows = highlight_false(df)
 col = colnum_num_string(df.shape[1])
 color_rows(sheet, rows, col)

However, there is an even easier method using xlsxwriter:

import pandas as pd
Emp_ID_df = ...
writer = pd.ExcelWriter(OutputName, engine='xlsxwriter')
Emp_ID_df.to_excel(writer, sheet_name="Sheet1", index=False)
workbook = writer.book
format1 = workbook.add_format({'bg_color': '#FFC7CE',
 'font_color': '#9C0006'})
dfs = [Emp_ID_df]
for df, sheet in zip(dfs, writer.sheets.values()):
 nrow, ncol = df.shape
 col_letter = colnum_num_string(ncol + 1)
 cells = f"A1:{col_letter}{nrow+1}"
 sheet.conditional_format(cells, {"type": "cell",
 "criteria": "==",
 "value": 0,
 "format": format1})
writer.save()

You might have to ensure that the sheets do not get out of snyc from the data frames, or just keep track of what name you save each dataframe to.

In addition I used Python's official style-guide, PEP8, which recommends using lower_case for functions and variables as well as renaming your variables so they are a lot clearer.

Stack Exchange Network

Python to write multiple dataframes and highlight rows inside an excel file

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Python to write multiple dataframes and highlight rows inside an excel file

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions