How would I properly encode the following:
# -*- coding: utf-8 -*-
>>> 'What\x80\x99s Up: Balloon to the Rescue!'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 4: ordinal not in range(128)
>>> 'What\x80\x99s Up: Balloon to the Rescue!'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 4: invalid start byte
asked Feb 27, 2012 at 20:17
David542
2 Answers
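You've got two issues here. First, your UTF-8 byte sequence is wrong; it should be \xe2\x80\x99 (the right single quotation mark, U+2019). Second, you are using the wrong function: the value is already a byte string, so you need to decode it from UTF-8 rather than encode it: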
>>> print 'What\xe2\x80\x99s Up: Balloon to the Rescue!'.decode('utf-8')
What’s Up: Balloon to the Rescue!
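For anyone trying this outside a Python 2 prompt, here is a minimal sketch of the same decode step, written with an explicit bytes literal so it should behave the same on Python 2.7 and Python 3. The variable names are only illustrative, and the \xe2 lead byte is still the guess discussed in this answer, not something recovered from the original data.

```python
# -*- coding: utf-8 -*-
# Corrected byte sequence: \xe2\x80\x99 is the UTF-8 encoding of U+2019.
# The \xe2 byte is an assumption about what the original data should have been.
good = b'What\xe2\x80\x99s Up: Balloon to the Rescue!'

# Bytes -> text: decode with the codec the bytes were written in.
text = good.decode('utf-8')

# Text -> bytes: encoding the decoded text gives back the same bytes.
round_trip = text.encode('utf-8')

print(repr(text))
```

The point of the round trip is that decode goes from bytes to text and encode goes from text to bytes; calling encode on something that is already bytes is what triggered the first traceback in the question.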
answered Feb 27, 2012 at 20:21
spencercw
3 Comments
beerbajay
Did you just guess which character OP meant?
spencercw
@beerbajay Yes. Given two of the bytes are the same and it makes perfect sense in the context, I think I'm probably right. ;)
David542
@spencercw Yea, this seems to be a problem upstream. That was the value I had in the database, so I think prior to that I need to encode to utf.
>>> type('What\x80\x99s Up: Balloon to the Rescue!')
<type 'str'>
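So you can't encode it, since it is a byte string (str), not a Unicode object. Calling .encode() on a str makes Python 2 first decode it implicitly with the ASCII codec, which is why you get a UnicodeDecodeError mentioning 'ascii'.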
What is your Unicode input?
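If the bytes really are coming out of a database in this state, a rough way to check them before use is sketched below. The helper name decode_or_none is hypothetical, and the 'replace' fallback only marks the damage; it does not recover the missing character.

```python
# The byte string exactly as posted in the question (invalid as UTF-8).
broken = b'What\x80\x99s Up: Balloon to the Rescue!'


def decode_or_none(raw, encoding='utf-8'):
    """Return the decoded text, or None if raw is not valid in that encoding.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Strict decoding fails, because 0x80 cannot start a UTF-8 sequence.
strict = decode_or_none(broken)

# Lossy decoding substitutes U+FFFD for the bytes that cannot be decoded.
lossy = broken.decode('utf-8', 'replace')

print(strict)
print(repr(lossy))
```

A None result here would suggest that the data was already damaged before it was stored, so the fix probably belongs wherever the value is written to the database.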
answered Feb 27, 2012 at 20:21
user647772