I need to call MySQL stored procedure from my python script. As one of parameters I'm passing a unicode string (Russian language), but I get an error;
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)
My script:
self.db=MySQLdb.connect("localhost", "usr", "pass", "dbName")
self.cursor=self.db.cursor()
args=("какой-то текст") #this is string in russian
self.cursor.callproc('pr_MyProc', args)
self.cursor.execute('SELECT @_pr_MyProc_2') #getting result from sp
result=self.cursor.fetchone()
self.db.commit()
I've read that setting charset='utf8' shuld resolve this problem, but when I use string:
self.db=MySQLdb.connect("localhost", "usr", "pass", "dbName", charset='utf8')
This gives me another error;
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd1' in position 20: surrogates not allowed
Also I've trying to set parametr use_unicode=True, that's not working.
8 Answers 8
More things to check on: http://mysql.rjweb.org/doc.php/charcoll#python
Likely items:
- Start code file with
# -*- coding: utf-8 -*--- (for literals in code) - Literals should be u'...'
Can you extract the HEX? какой-то текст should be this in utf8: D0BA D0B0 D0BA D0BE D0B9 2D D182 D0BE D182 20 D0B5 D0BA D181 D182
2 Comments
Here are some thoughts. Maybe not a response. I've been playing with python/mysql/utf-8/unicode in the past and this is the things i remember:
Looking at Saltstack mysql module's comment :
https://github.com/saltstack/salt/blob/develop/salt/modules/mysql.py#L314-L322
# MySQLdb states that this is required for charset usage
# but in fact it's more than it's internally activated
# when charset is used, activating use_unicode here would
# retrieve utf8 strings as unicode() objects in salt
# and we do not want that.
#_connarg('connection_use_unicode', 'use_unicode')
connargs['use_unicode'] = False
_connarg('connection_charset', 'charset')
We see that to avoid altering the result string the use_unicode is set to False, while the charset (which could be utf-8) is set as a parameter. use_unicode is more a 'request' to get responses as unicode strings.
You can check real usage in the tests, here: https://github.com/saltstack/salt/blob/develop/tests/integration/modules/test_mysql.py#L311-L361 with a database named '標準語'.
Now about the message UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd1' **. You are using **unicode but you tell the module it is utf-8. It is not utf-8 until you encode your unicode string in utf-8.
Maybe you should try with:
args=(u"какой-то текст".encode('utf-8'))
At least in python3 this is required, because your "какой-то текст" is not in utf-8 by default.
1 Comment
The MySQLdb module is not compatible with python 3. That might be why you are getting problems. I would advise to use a different connector, like PyMySQL or mysqlclient.
Related: 23376103.
1 Comment
Maybe you can reload your sys in utf-8 and try to decode the string into utf-8 as following :
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
...
stringUtf8 = u''.join(string_original).decode('utf-8')
Comments
I had a similar problem very recently but with PostgreSQL. After trying tons of suggestions from SO/ internet, I realized the issue was with my database. I had to drop my database and reinstall Postgres, because for some reason it was not allowing me to change the database's default collation. I was in a hurry so couldn't find a better solution, but would recommend the same, since I was only starting my application in the deployment environment. All the best.
Comments
What's your database's charset?
use :
show variables like "characetr%";
or see your database's charset
Comments
I see here two problems.
You have unicode but you try to define it as utf-8 by setting parameter "charset". You should first encode your unicode to utf-8 or another encoding system.
If it however doesn't work, try to do so with init_command='SET NAMES UTF8' parameter.
So it will look like:
conn = MySQLdb.connect(charset='utf8', init_command='SET NAMES UTF8')
You can try also this:
cursor = db.cursor()
cursor.execute("SET NAMES UTF8;")
Comments
I encountered a similar issue, which was caused by invalid utf-8 data in the database; it seems that MySQL doesn't care about that, but Python does, because it's following the UTF-8 spec, which says that:
- surrogate pairs are not allowed in utf-8
- unpaired surrogates are not allowed in utf-8
If you want to "make it work", you'll have to intercept the MySQL packet and use your own converter which will perform ad-hoc replacements.
Here's one way to "handle" invalid data containing surrogates:
def borked_utf8_decode(data):
"""
Work around input with unpaired surrogates or surrogate pairs,
replacing by XML char refs: look for "&#\d+;" after.
"""
return data.decode("utf-8", "surrogatepass") \
.encode("utf-8", "xmlcharrefreplace") \
.decode("utf-8")
Note that the proper way to handle that is context-dependent, but there are some common replacement scenarios, like this one.
And here's one way of plugging this into pymysql (another way is to monkey-patch field processing, see eg. https://github.com/PyMySQL/PyMySQL/issues/631):
import pymysql.converters
# use this in your connection
pymysql_use_unicode = False
conversions = pymysql.converters.conversions
conversions[pymysql.converters.FIELD_TYPE.STRING] = borked_utf8_decode
conversions[pymysql.converters.FIELD_TYPE.VAR_STRING] = borked_utf8_decode
conversions[pymysql.converters.FIELD_TYPE.VARCHAR] = borked_utf8_decode
Comments
Explore related questions
See similar questions with these tags.
args=(u"какой-то текст")?