How can I get b'\xe3\x81\x82' from '\xe3\x81\x82'?
Ultimately I want u'\u3042', which is the Japanese letter 'あ'.
b'\xe3\x81\x82'.decode('utf-8') gives u'\u3042', but
'\xe3\x81\x82'.decode('utf-8') raises the following error:
AttributeError: 'str' object has no attribute 'decode'
because b'\xe3\x81\x82' is bytes and '\xe3\x81\x82' is str.
I have a database that contains data like '\xe3\x81\x82'.
1 Answer
If you have bytes disguised as Unicode code points, encode to Latin-1:
'\xe3\x81\x82'.encode('latin1').decode('utf-8')
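Latin-1 (ISO-8859-1) maps the Unicode code points U+0000 through U+00FF one-to-one to the bytes 0x00 through 0xFF, so encoding recovers the original byte values unchanged: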
>>> '\xe3\x81\x82'.encode('latin1').decode('utf-8')
'あ'
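If many database values need this treatment, the round trip could be wrapped in a small helper. This is only a sketch: the name fix_mojibake is hypothetical, and it assumes that any value which either contains characters above U+00FF or does not decode as valid UTF-8 is already correct and should be returned untouched.

```python
def fix_mojibake(text: str) -> str:
    """Reinterpret a str whose code points are really UTF-8 byte values.

    fix_mojibake is a hypothetical helper name, not part of any library.
    """
    try:
        # Latin-1 maps code points U+0000..U+00FF to the bytes 0x00..0xFF,
        # so this recovers the original byte sequence unchanged.
        raw = text.encode('latin1')
        # Decode those bytes as the UTF-8 they were meant to be.
        return raw.decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption: text with characters above U+00FF, or bytes that are
        # not valid UTF-8, was never mis-decoded, so leave it as it is.
        return text


print(fix_mojibake('\xe3\x81\x82'))  # repaired value
print(fix_mojibake('あ'))             # already correct, returned unchanged
```

Note that a genuine Latin-1 string whose byte values happen to form valid UTF-8 would also be converted, so it is safer to apply this only to columns known to hold mis-decoded data.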
answered Dec 1, 2014 at 12:36
Martijn Pieters