12

I have always used win32com module in my development server to easily convert from xlsx to pdf:

o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
o.DisplayAlerts = False
wb = o.Workbooks.Open("test.xlsx")))
wb.WorkSheets("sheet1").Select()
wb.ActiveSheet.ExportAsFixedFormat(0, "test.pdf")
o.Quit()

However, I have deployed my Django app in production server where I don't have Excel application installed and it raises the following error:

File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\__init__.p
y", line 95, in Dispatch
 dispatch, userName = dynamic._GetGoodDispatchAndUserName(dispatch,userName,c
lsctx)
 File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\dynamic.py
", line 114, in _GetGoodDispatchAndUserName
 return (_GetGoodDispatch(IDispatch, clsctx), userName)
 File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\dynamic.py
", line 91, in _GetGoodDispatch
 IDispatch = pythoncom.CoCreateInstance(IDispatch, None, clsctx, pythoncom.II
D_IDispatch)
com_error: (-2147221005, 'Invalid class string', None, None)

Is there any good alternative to convert from xlsx to PDF in Python?

I have tested xtopdf with PDFWriter, but with this solution you need to read and iterate the range and write lines one by one. I wonder if there is a more direct solution similar to win32com.client.

Thanks!

asked Sep 14, 2018 at 7:17
5
  • Possible duplicate of .xlsx and xls(Latest Versions) to pdf using python Commented Sep 14, 2018 at 7:27
  • 1
    It's not, the exception is different. By the way, that post has not been solved. Commented Sep 14, 2018 at 7:32
  • yeah it is... you have a portability issue, and that thread lists all viable options for converting xlsx to pdf. btw OP answered their own question. Commented Sep 14, 2018 at 7:46
  • Yes, he asnwered with the same solution that didn't work for me. And as it's not the same problem that's why I opened a new thread. Commented Sep 14, 2018 at 8:04
  • Also very related to convert excel to pdf in python. Commented May 2, 2020 at 11:12

4 Answers 4

12

As my original answer was deleted and is eventually a bit useful, I repost it here.

You could do it in 3 steps:

  1. excel to pandas: pandas.read_excel
  2. pandas to HTML: pandas.DataFrame.to_html
  3. HTML to pdf: python-pdfkit (git), python-pdfkit (pypi.org)
import pandas as pd
import pdfkit
df = pd.read_excel("file.xlsx")
df.to_html("file.html")
pdfkit.from_file("file.html", "file.pdf")

install:

sudo pip3.6 install pandas xlrd pdfkit
sudo apt-get install wkhtmltopdf 
answered May 4, 2020 at 7:25
Sign up to request clarification or add additional context in comments.

4 Comments

This is very useful. Is it possible to do this in windows.? How can I install wkhtmltopdf in windows.?
Would this work on an excel file with images, formulas and formatting?
@SAndrew yes it works on windows >> wkhtmltopdf.org/downloads.html
FYI: This shows all empty cells as NaN and empty column names as Unnamed: 1, 2,3.. in pandas==2.0.3. Use df.to_html(na_rep="") to remove the NaN and pattern = r'Unnamed:\s\d+' regex to remove the Unnamed
3

This is a far more efficient method than trying to load a redundant script that is hard to find and was wrtten in Python 2.7.

  1. Load excel spread sheet into a DataFrame
  2. Write the DataFrame to a HTML file
  3. Convert the html file to an image.

 dirname, fname = os.path.split(source)
 basename = os.path.basename(fname)
 data = pd.read_excel(source).head(6)
 css = """
 """
 text_file = open(f"{basename}.html", "w")
 # write the CSS
 text_file.write(css)
 # write the HTML-ized Pandas DataFrame
 text_file.write(data.to_html())
 text_file.close()
 imgkitoptions = {"format": "jpg"}
 imgkit.from_file(f"{basename}.html", f'{basename}.png', options=imgkitoptions)
 try:
 os.remove(f'{basename}.html')
 except Exception as e:
 print(e)
 return send_from_directory('./', f'{basename}.png')

Taken from here https://medium.com/@andy.lane/convert-pandas-dataframes-to-images-using-imgkit-5da7e5108d55

Works really well, I have XLSX files converting on the fly and displaying as image thumbnails on my application.

miken32
42.5k16 gold badges127 silver badges177 bronze badges
answered Jun 11, 2019 at 12:39

1 Comment

If posting code in an answer you should include the imports.
3

I'm using Linux, so I couldn't try pywin32. So I found unoserver with LibreOffice

import subprocess
def convert_xlsx_to_pdf(xlsx_file):
 try:
 subprocess.run(["libreoffice", "--headless", "--convert-to", "pdf", xlsx_file])
 print("Done!")
 except Exception as e:
 print("Error:", e)
convert_xlsx_to_pdf("file.xlsx")
answered Jul 4, 2023 at 16:58

2 Comments

This also worked for me. My fear is that my deployment package might be quite large due to the size of libreoffice. But this works absolutely and the images and image positions are maintained. Thank you @Omar
this wont work on VPS, thats why people abandon SO because of answer like this
2
from openpyxl import load_workbook
from PDFWriter import PDFWriter
workbook = load_workbook('fruits2.xlsx', guess_types=True, data_only=True)
worksheet = workbook.active
pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')
ws_range = worksheet.iter_rows('A1:H13')
for row in ws_range:
 s = ''
 for cell in row:
 if cell.value is None:
 s += ' ' * 11
 else:
 s += str(cell.value).rjust(10) + ' '
 pw.writeLine(s)
pw.savePage()
pw.close()

I have been using this and it works fine

answered Sep 14, 2018 at 7:33

5 Comments

Yeah, I have tested it and it works fine. The things is that you need to iterate the range to write lines in PDF. I was looking for a equivalent solution to win32com where you can choose the sheet you want to export.
does the library PDFWriter still exist?
Anybody knows how to install this PDFWriter? I am not able to find any links. Is pdfrw the same module?
@ArindamRoychowdhury bitbucket.org/vasudevram/xtopdf/downloads download the repo and its in there but i still cant get it going...
@ArindamRoychowdhury Nice, I ended up using Pandas, read_excel(), then converted the dataframe to HTML, then used imgkit to turn the html table into a PNG

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.