I am trying to work with SQLite in Python:
from pysqlite2 import dbapi2 as sqlite
con = sqlite.connect('/home/argon/super.db')
cur = con.cursor()
cur.execute('select * from notes')
for i in cur.fetchall():
    print i[2]
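To reproduce this without /home/argon/super.db, here is a minimal sketch that uses Python 3's built-in sqlite3 module (descended from pysqlite, so connect, cursor, execute and fetchall behave the same way). The three-column notes schema and the stored value are hypothetical, chosen only so that column 2 holds text written as numeric character references.

```python
import sqlite3

# In-memory stand-in for /home/argon/super.db.
# Hypothetical schema: (id, title, body); body is column index 2.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE notes (id INTEGER, title TEXT, body TEXT)")

# Hypothetical stored value: text written as decimal &#NNN; references.
stored = "&#208;&#158;&#209;&#130;&#208;&#178;&#208;&#181;&#209;&#130;"
con.execute("INSERT INTO notes VALUES (?, ?, ?)", (1, "example", stored))

cur = con.cursor()
cur.execute("SELECT * FROM notes")
rows = cur.fetchall()
for row in rows:
    print(row[2])  # prints the raw references, not Cyrillic text
con.close()
```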
Sometimes I get output like this (I am from Russia):
&#208;&#158;&#209;&#130;&#208;&#178;&#208;&#181;&#209;&#130; etc...
If I pass this string to the following function (it has helped me in other projects):
import re
import htmlentitydefs

def unescape(text):
    def fixup(m):
        text = m.group(0)
        if text[:2] == "&#":
            # character reference (&#NNN; or &#xHH;)
            try:
                if text[:3] == "&#x":
                    return unichr(int(text[3:-1], 16))
                else:
                    return unichr(int(text[2:-1]))
            except ValueError:
                pass
        else:
            # named entity (&amp;, &nbsp;, ...)
            try:
                text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
            except KeyError:
                pass
        return text  # leave as is
    return re.sub(r"&#?\w+;", fixup, text)
I get an even weirder result:
ÐÑÐ²ÐµÑÐ¸ÑÑ Ñ ÑÐ¸ÑÐ¸ÑÐ¾Ð²аÐ½Ð¸ÐµÐ¼ etc
What should I do to get proper Cyrillic characters?
asked Oct 13, 2012 at 20:52
scythargon
1 Answer
&#208;&#158; looks like a UTF-8 byte pair for \xD0\x9E, which encodes U+041E (decimal 1054), better known as the Cyrillic character О (Capital O).
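The byte pair can be checked by hand. This is a small Python 3 sketch (the question uses Python 2, where unichr plays the role of Python 3's chr): it applies the UTF-8 two-byte rule to 208 and 158, then shows what a code-point-based unescape such as the one in the question produces instead, and why a Latin-1 round trip recovers the bytes.

```python
# The two references from the question, as integers.
lead, cont = 208, 158  # 0xD0, 0x9E

# UTF-8 two-byte form: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
code_point = ((lead & 0x1F) << 6) | (cont & 0x3F)
print(code_point, hex(code_point), chr(code_point))  # 1054 0x41e О

# Treating each reference as a code point (what unichr(208) does) gives two
# Latin-1 characters instead: U+00D0 'Ð' and the invisible control U+009E.
as_code_points = chr(lead) + chr(cont)
print(repr(as_code_points))  # 'Ð\x9e'

# Both code points are below 256, so Latin-1 maps them 1:1 back to the
# original bytes, which then decode correctly as UTF-8.
repaired = as_code_points.encode("latin-1").decode("utf-8")
assert repaired == chr(code_point)
```

So 1054 is the decimal value of the code point; as a Python escape it is written \u041e.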
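In other words, you have UTF-8 data stored in an unusual way: every byte is written as a separate numeric character reference. Your unescape function maps each reference straight to a Unicode code point (unichr(208) is u'\xd0', the Latin-1 character Ð), which is why you get garbled output. Instead, turn the &#...; numbers into bytes (in Python 2, chr(208) does that), then decode from UTF-8: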
>>> (chr(208) + chr(158)).decode('utf-8')
u'\u041e'
>>> print (chr(208) + chr(158)).decode('utf-8')
О
>>> print (chr(208) + chr(158) + chr(209) + chr(130) + chr(208) + chr(178)).decode('utf-8')
Отв
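The interpreter session decodes a fixed handful of bytes. A minimal Python 3 sketch generalizing it follows; decode_utf8_refs is a hypothetical helper name, not part of any library. It collects every run of consecutive decimal references into bytes and decodes that run as UTF-8, leaving other text (spaces, plain ASCII) untouched. It assumes every reference in a run is a byte value from 0 to 255: a genuine code-point reference such as &#1054; makes bytes() raise ValueError, and a run that is not valid UTF-8 raises UnicodeDecodeError.

```python
import re

# A maximal run of consecutive decimal references, e.g. "&#208;&#158;&#209;&#130;".
_REF_RUN = re.compile(r"(?:&#\d+;)+")

def decode_utf8_refs(text):
    """Replace each run of &#NNN; byte references with its UTF-8 decoding."""
    def fix_run(match):
        # Each &#NNN; in the run holds one byte of UTF-8-encoded text.
        data = bytes(int(n) for n in re.findall(r"\d+", match.group(0)))
        return data.decode("utf-8")
    return _REF_RUN.sub(fix_run, text)

print(decode_utf8_refs("&#208;&#158;&#209;&#130;&#208;&#178;&#208;&#181;&#209;&#130;"))
```

In a Python 3 port of the question's loop, this would be print(decode_utf8_refs(i[2])). Decoding whole runs, rather than single references, matters because one Cyrillic character spans two references.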
answered Oct 13, 2012 at 20:58
Martijn Pieters