I'm working on a Python application and having some problems handling strings.
I have the string "She’s Out of My League" (without quotes) stored in a variable, and I tried to insert it into an sqlite3 database. But I get this error:
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
So, I tried to convert the string to unicode. I tried both of these:
new_str = unicode(old_str)
new_str = old_str.encode("utf8")
But this gives me another error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 49: unexpected code byte
I'm stuck here. What am I doing wrong?
1 Answer
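Simple. You're assuming that it's UTF-8. Byte 0x92 is not valid there, but in cp1252 (Windows-1252) it is the right single quotation mark (’), so decode with that codec instead: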
>>> print 'She\x92s Out of My League'.decode('cp1252')
She’s Out of My League
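To make the difference between the two codecs concrete, here is a minimal sketch. It assumes the bytes really are cp1252 (which the 0x92 byte suggests but does not prove), and the variable names `raw`, `utf8_ok` and `text` are made up for illustration.

```python
# Raw bytes as they might arrive from a Windows-1252 source (assumed encoding).
raw = b"She\x92s Out of My League"

# Decoding as UTF-8 fails: 0x92 is a continuation byte and cannot start a character.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as cp1252 maps 0x92 to U+2019 (right single quotation mark).
text = raw.decode("cp1252")
```

After this runs, `utf8_ok` is False and `text` is a unicode string containing U+2019 where the 0x92 byte was.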
5 Comments
sys.getfilesystemencoding() returns a guess about the filesystem encoding of the current system, and all path functions (e.g. os.path.join, os.listdir) return unicode (using this guessed encoding) if you give them unicode arguments. Also, if you're using cp1252 on a Unix system, you might consider switching to utf8 to avoid bigger issues.
Use .decode instead of .encode: old_str.decode(encoding). You don't need to (in fact, you can't) encode it back to a bytestring for use with sqlite; sqlite requires unicode.
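A rough sketch of the decode-then-insert flow described in this comment. The in-memory database, the `movies` table, and the cp1252 source encoding are assumptions for illustration only; they are not taken from the question.

```python
import sqlite3

# Bytes in the (assumed) source encoding, decoded once into a unicode string.
raw = b"She\x92s Out of My League"
title = raw.decode("cp1252")

# Throwaway in-memory database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT)")

# Pass the unicode string as a bound parameter; no re-encoding is needed.
conn.execute("INSERT INTO movies (title) VALUES (?)", (title,))
conn.commit()

# With the default text_factory, TEXT columns come back as unicode.
stored = conn.execute("SELECT title FROM movies").fetchone()[0]
conn.close()
```

The string is decoded once at the boundary and stays unicode all the way into the database, so the ProgrammingError about 8-bit bytestrings should not come up.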