I have a MySQL database whose table uses the utf8 charset:

...
 PRIMARY KEY (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
...

I connect to the database from Python with MySQLdb:

import MySQLdb

conn = MySQLdb.connect(host="localhost",
                       passwd="12345",
                       db="db",
                       charset='utf8',
                       use_unicode=True)

When I execute a query, the result comes back as if it had been decoded with "windows-1254". Example:

curr = conn.cursor(MySQLdb.cursors.DictCursor)
select_query = 'SELECT * FROM users'
curr.execute(select_query)
for ret in curr.fetchall():
    username = ret["username"]
    print "repr-username; ", repr(username)
    print "username; ", username.encode("utf-8")
...

The output is:

repr-username; u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
username; ÅŸÃ¼krÃ¼Ã§aÄŸlÃ¼li

When I encode username as "windows-1254" before printing, it displays correctly:

...
print "repr-username; ", repr(username)
print "username; ", username.encode("windows-1254")
...

The output is:

repr-username; u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
username; şükrüçağlüli

When I try other characters, such as the Cyrillic alphabet, the decoding changes dynamically. How can I prevent this?

asked Aug 28, 2014 at 12:50
  • To be clear, "şükrüçağlüli" is the output you want? Commented Aug 28, 2014 at 12:55
  • Yes. This text has some Turkish special characters like "şüçğ". Commented Aug 28, 2014 at 12:57
  • Is that the charset of the table as well? Commented Aug 28, 2014 at 13:08
  • Terminal encoding? Another idea: could you modify your test case to both INSERT and SELECT from Python? Does the problem persist? Commented Aug 28, 2014 at 13:12
  • On my UTF-8 system u"şükrüçağlüli" == u'\u015f\xfckr\xfc\xe7a\u011fl\xfcli'. This is not what you have. Are you certain the data was properly encoded at INSERT time? Commented Aug 28, 2014 at 13:19

1 Answer

I think the items were encoded incorrectly when they were INSERTed into the database.
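
One way to check this hypothesis is to encode the expected text as UTF-8 and decode those bytes as windows-1254 (Python codec name `cp1254`). As far as I can tell, that reproduces the repr from the question exactly, which suggests the UTF-8 bytes were misread as windows-1254 at some point before they were stored. A minimal sketch in Python 3 syntax (the question uses Python 2); the variable names are only for illustration, and it does not tell you where in the pipeline the mis-decoding happened:

```python
# Python 3 sketch: reproduce the garbled value from the question.
expected = "şükrüçağlüli"
stored = '\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'

# UTF-8 bytes misread as windows-1254 give the garbled string.
garbled = expected.encode("utf-8").decode("cp1254")
print(garbled == stored)

# Reversing the two steps recovers the intended text.
repaired = stored.encode("cp1254").decode("utf-8")
print(repaired == expected)
```

If both comparisons print True, the rows are already double-encoded in the table, so the connection settings used for the SELECT are probably not the cause.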

I recommend python-ftfy (https://github.com/LuminosoInsight/python-ftfy), which helped me with a similar problem:

import ftfy
username = u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
print ftfy.fix_text(username) # outputs şükrüçağlüli
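
If installing ftfy is not an option, a rough equivalent for this one case can be sketched with the standard codecs. `fix_mojibake` is a hypothetical helper, not part of ftfy or MySQLdb, and it assumes the wrong codec was windows-1254 (`cp1254`). It is written in Python 3 syntax and returns the input unchanged when the round trip fails:

```python
def fix_mojibake(text, wrong_codec="cp1254"):
    """Undo UTF-8 text that was mis-decoded as `wrong_codec` (hypothetical helper)."""
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text cannot be expressed in the wrong codec, or the
        # recovered bytes are not valid UTF-8: assume the text was fine.
        return text


broken = '\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
print(fix_mojibake(broken))          # garbled value from the question
print(fix_mojibake("şükrüçağlüli"))  # already-correct text is left alone
```

This only handles a single known mis-decoding, and a correct string that happens to survive the round trip would be altered, so ftfy's heuristics are likely the safer choice for mixed data. Fixing the INSERT path is still needed to stop new rows from being stored garbled.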
answered Aug 28, 2014 at 13:43