I'm trying to pull data from SQL Server using pyodbc, load it into a pandas DataFrame, and then export it to an HTML file, but I keep getting the following Unicode error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 15500: ordinal not in range(128)
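The error itself does not depend on SQL Server or pandas. It appears whenever a unicode string that contains a non-ASCII character such as u'\u2019' (a right single quotation mark, i.e. a curly apostrophe) is encoded with the ASCII codec. A minimal reproduction, written in Python 3 syntax as an assumption (the question itself uses Python 2):

```python
# Python 3 syntax; on Python 2 the literal would be written u"It\u2019s a test".
text = "It\u2019s a test"  # U+2019 RIGHT SINGLE QUOTATION MARK (a curly apostrophe)

ascii_error = None
try:
    # The ASCII codec only covers code points 0-127, so U+2019 cannot be encoded.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

# Codecs that do cover U+2019 encode it without complaint:
utf8_bytes = text.encode("utf-8")     # U+2019 -> three bytes E2 80 99
cp1252_bytes = text.encode("cp1252")  # U+2019 -> one byte 0x92

print(ascii_error)
```

The same character is representable in both UTF-8 and cp1252; only the ASCII codec rejects it.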
Here is my current setup (encoding settings taken from the pyodbc docs):
import pandas as pd
import pyodbc
cnxn = pyodbc.connect('DSN=Planning;UID=USER;PWD=PASSWORD;')
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='cp1252', to=unicode)
cnxn.setencoding(str, encoding='utf-8')
cnxn.setencoding(unicode, encoding='utf-8')
cursor = cnxn.cursor()
with open('Initial Dataset.sql') as f:
    initial_query = f.read()
cursor.execute(initial_query)
columns = [column[0] for column in cursor.description]
initial_data = cursor.fetchall()
i_df = pd.DataFrame.from_records(initial_data, columns=columns)
i_df.to_html('initial.html')
An odd but useful point to note is that when I try to export a CSV:
i_df.to_csv('initial.csv')
I get the same error; however, if I pass an explicit encoding:
i_df.to_csv('initial.csv', encoding='utf-8')
It works. Can someone help me understand this encoding issue?
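The CSV behaviour fits the same pattern: writing text through an ASCII codec fails, while naming a codec that can represent U+2019 succeeds. A stdlib-only sketch of that pattern, assuming Python 3 (where io.open is the built-in open); the row content and file name are hypothetical, and this does not claim to reproduce pandas' internal write path:

```python
import io
import os
import tempfile

row = "O\u2019Brien,Planning\n"  # hypothetical CSV row containing U+2019
path = os.path.join(tempfile.mkdtemp(), "initial.csv")

# Writing through the ASCII codec raises the same kind of UnicodeEncodeError.
try:
    with io.open(path, "w", encoding="ascii") as f:
        f.write(row)
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Naming a codec that can represent U+2019 makes the same write succeed.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(row)

# Read the file back with the same codec to confirm the text survived.
with io.open(path, encoding="utf-8") as f:
    round_trip = f.read()
```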
Side note: I've also tried using a SQLAlchemy connection with pandas.read_sql(), and the same error persists.
Answer:
The second answer on this question seems to be an acceptable workaround, except that Python 2.x users must use io.open (Python 2's built-in open() has no encoding argument), so:
import io
html = df.to_html()
with io.open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)
It was not included in the latest release at the time of writing, but it looks like the next version of pandas will add an encoding option to to_html(); see the docs (line 2228).
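For pandas 1.0 and later, which added an encoding parameter to DataFrame.to_html(), the HTML can be written directly to a path with an explicit encoding. A minimal sketch assuming Python 3 and pandas >= 1.0 (not the Python 2 setup from the question); the DataFrame is a hypothetical stand-in for i_df:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for i_df; the second row contains U+2019.
df = pd.DataFrame({"name": ["Smith", "O\u2019Brien"]})

path = os.path.join(tempfile.mkdtemp(), "initial.html")

# pandas >= 1.0: the encoding is used when pandas opens the file at `path`.
df.to_html(path, encoding="utf-8")

# Read the file back with the same encoding to check the character survived.
with open(path, encoding="utf-8") as f:
    html = f.read()
```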
Comments:
to_html() seems to enforce ASCII encoding. It appears they will be fixing that issue in an upcoming release.
"to_html() seems to enforce ASCII encoding": No, it's more likely that to_html uses the default encoding for the file when you only pass it a string (file path) for buf=, and the default string encoding for Python 2 is ASCII. … buf argument that is a StringIO-like object instead of just a (string) path.
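The buf suggestion in the comment above might look like the following sketch: render into an in-memory text buffer, then choose the file encoding yourself when writing to disk. It assumes Python 3 and a recent pandas rather than the Python 2 environment from the question; the DataFrame and output file are hypothetical stand-ins for i_df and initial.html:

```python
import io
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Smith", "O\u2019Brien"]})  # hypothetical data

# Render into an in-memory text buffer instead of a file path;
# to_html returns None when buf is given.
buf = io.StringIO()
df.to_html(buf=buf)
html = buf.getvalue()

# Choose the file encoding explicitly when writing to disk.
path = os.path.join(tempfile.mkdtemp(), "initial.html")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(html)

with io.open(path, encoding="utf-8") as f:
    written = f.read()
```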
… utf-8 encoded. The docs for pandas.to_html are rather scant. Why would it try to convert to ASCII when generating the HTML?
… setencoding/setdecoding calls at all when working with SQL Server, especially not encoding to UTF-8, which SQL Server ODBC does not use (it uses UTF-16, and that is the default encoding for pyodbc).
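The mismatch described in the last comment can be illustrated with the standard library alone. This sketch assumes, as the comment states, that SQL Server returns wide-character (NCHAR/NVARCHAR) data as UTF-16 (taken here as little-endian); it only simulates the bytes and does not go through pyodbc:

```python
# Simulated bytes for a wide-character value, assuming UTF-16LE on the wire.
wire_bytes = "O\u2019Brien".encode("utf-16-le")

# Decoding with the matching codec recovers the original text.
correct = wire_bytes.decode("utf-16-le")

# Decoding the same bytes as cp1252 (roughly what setdecoding(pyodbc.SQL_WCHAR,
# encoding='cp1252') requests) yields garbage: every 2-byte UTF-16 code unit
# becomes two unrelated single-byte characters.
wrong = wire_bytes.decode("cp1252")
```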