What is the correct way to convert '\xbb' into a unicode string? I have tried the following, but I only get a UnicodeDecodeError:
unicode('\xbb', 'utf-8')
'\xbb'.decode('utf-8')
It is part of a file that someone pasted from Word (so it's a str). If you type print u'\xbb' you get the double arrow (») character. – Jason Christa, Mar 21, 2011 at 21:50
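As a rough illustration of why both attempts fail, here is a Python 3 sketch (an assumption: Python 3 is used for the examples, where the Python 2 str '\xbb' corresponds to the bytes b'\xbb'). The helper name utf8_error_reason is hypothetical, made up for this example. The byte 0xBB is 0b10111011, a UTF-8 continuation byte (10xxxxxx) that can never start a character, so the UTF-8 decoder rejects it.

```python
def utf8_error_reason(data: bytes) -> str:
    """Hypothetical helper: return why `data` is not valid UTF-8, or '' if it decodes."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        # exc.reason is the short explanation from the codec
        return exc.reason
    return ''

# A lone 0xBB is a continuation byte, so UTF-8 decoding fails on it.
print(utf8_error_reason(b'\xbb'))
```

In real UTF-8 the same character » is stored as the two bytes C2 BB, which is why the single byte from the Word paste cannot be UTF-8.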
3 Answers
Since it comes from Word, it's probably CP1252 (Windows-1252).
>>> print '\xbb'.decode('cp1252')
»
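A minimal Python 3 translation of the same idea, assuming the data is held as bytes rather than a Python 2 str. Decoding with the cp1252 codec turns the byte 0xBB into the code point U+00BB; unicodedata is only used here to print an ASCII-safe name for the result.

```python
import unicodedata

raw = b'\xbb'                  # Python 3 bytes equivalent of the Python 2 str '\xbb'
text = raw.decode('cp1252')    # Windows-1252 maps byte 0xBB to U+00BB
print(f'U+{ord(text):04X}', unicodedata.name(text))
```

This should print U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, the » character from the question.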
It looks like Latin-1 encoding. You should use:
unicode('\xbb', 'Latin-1')
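For the single byte in the question, Latin-1 and CP1252 give the same answer, so both suggestions work. The sketch below (Python 3; the helper decode_both is hypothetical) shows where the two codecs differ: in the range 0x80 to 0x9F, Latin-1 yields invisible C1 control characters while CP1252 yields printable punctuation such as the "smart" quotes and dashes Word tends to insert. Note that CP1252 leaves a few bytes in that range undefined (for example 0x81), and strict decoding raises an error on those.

```python
import unicodedata

def decode_both(data: bytes) -> tuple:
    """Hypothetical helper: decode `data` as Latin-1 and as CP1252."""
    return data.decode('latin-1'), data.decode('cp1252')

# Identical for 0xBB: both give U+00BB.
print(decode_both(b'\xbb'))

# Different in 0x80-0x9F: control characters vs. punctuation.
for byte in (b'\x85', b'\x93', b'\x94', b'\x96'):
    latin1, cp1252 = decode_both(byte)
    # C1 controls have no Unicode name, hence the default value
    print(byte, unicodedata.name(latin1, '<control>'), '|', unicodedata.name(cp1252))
```

If the file really came from Word, CP1252 is usually the safer choice for exactly this reason.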
Not sure what you are trying to do, but in Python 3 all strings are Unicode by default. In Python 2.x you have to use u'my unicode string \xbb' (or the double- or triple-quoted equivalents) to get Unicode strings. When you want to print Unicode strings, you have to encode them in a character set that the output device (e.g. the terminal) supports, for instance u'my unicode string \xbb'.encode('iso-8859-1').
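A short Python 3 sketch of the encoding step described above. In Python 3 a plain literal is already Unicode, so no u'' prefix is needed; the resulting bytes depend on the target character set. The errors='replace' argument is one way to handle characters the target set cannot represent; which strategy suits a given terminal is an assumption left to the reader.

```python
s = 'my unicode string \xbb'          # Python 3: str literals are Unicode by default

latin1 = s.encode('iso-8859-1')       # » becomes the single byte 0xBB
utf8 = s.encode('utf-8')              # » becomes the two bytes 0xC2 0xBB
print(latin1)
print(utf8)

# A character outside ISO-8859-1 (a curly quote, U+201C) cannot be encoded
# strictly; errors='replace' substitutes '?' instead of raising.
print('\u201c'.encode('iso-8859-1', errors='replace'))
```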