2

I have been trying to export a large pandas dataframe using DataFrame.to_sql to a MySQL database, but the dataframe has unicode characters in some columns, some of which cause warnings during export and are converted to ?.

I managed to reproduce the issue with this example (database login removed):

import pandas as pd
import sqlalchemy
import pymysql
engine = sqlalchemy.create_engine('mysql+pymysql://{}:{}@{}/{}?charset=utf8'.format(*login_info), encoding='utf-8')
df_test = pd.DataFrame([[u'\u010daj',2], \
 ['čaj',2], \
 ['špenát',4], \
 ['květák',7], \
 ['kuře',1]], \
 columns = ['a','b'])
df_test.to_sql('test', engine, if_exists = 'replace', index = False, dtype={'a': sqlalchemy.types.UnicodeText()})

The first two rows of the dataframe should be the same, just defined differently.

I get the following warning, and the problematic characters (č, ě, ř) are rendered as ?:

/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC4\\x8Daj' for column 'a' at row 1")
 result = self._query(query)
/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC4\\x8Daj' for column 'a' at row 2")
 result = self._query(query)
/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC4\\x9Bt\\xC3\\xA1k' for column 'a' at row 4")
 result = self._query(query)
/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC5\\x99e' for column 'a' at row 5")
 result = self._query(query)

with the resulting database table test looking like this:

a b
?aj 2
?aj 2
špenát 4
kv?ták 7
ku?e 1

Curiously, the ž, š and á characters (and others in my full dataset) are processed correctly, so it seems to only affect a subset of unicode characters. As you can see above, I also tried setting utf-8 wherever I could (engine, DataFrame.to_sql) with no effect.

asked Aug 10, 2017 at 16:13

2 Answers 2

2

pymysql:

import pymysql
con = pymysql.connect(host='127.0.0.1', port=3306,
 user='root', passwd='******',
 charset="utf8mb4")

sqlalchemy:

 db_url = sqlalchemy.engine.url.URL(drivername='mysql', host=foo.db_host,
 database=db_schema,
 query={ 'read_default_file' : foo.db_config, 'charset': 'utf8mb4' })

See "Best practice" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored Explanation of ?:

  • The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
  • The column in the database is CHARACTER SET utf8 (or utf8mb4). Fix this.
  • Also, check that the connection during reading is UTF-8.

(Note: The CHARACTER SETs utf8 and utf8mb4 are interchangeable for European languages.)

These are Czech characters?

answered Aug 18, 2017 at 0:54
Sign up to request clarification or add additional context in comments.

Comments

0

I haved met the same problem,use pymysql drive as well.

I change mysql drive to mysql-connector,1366 Warning disappear

install mysql-connector drive

pip install mysql-connector

sqlalchemy engine setting like this

create_engine('mysql+mysqlconnector://root:tj1996@localhost:3306/new?charset=utf8mb4')

answered Jul 12, 2018 at 14:54

2 Comments

LookupError: unknown encoding: utf8mb4
@Pyd - utf8mb4 was introduced in MySQL 5.5; upgrade. Or maybe the "connector" needs to be upgraded.@Pyd

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.