This seems to be a common question among international developers, but I haven't found a straight answer yet. I'm getting the following string from a feed: "Carlos e Carlos mostram o que há de melhor na internet"

When I try to decode it as UTF-8, the following error is printed to the console: UnicodeDecodeError: 'utf8' codec can't decode bytes in position 31-33: invalid data

thanks in advance,

fbr

asked Feb 15, 2011 at 20:06
We're unable to see the code you're using, so it's really hard to give a "straight" answer. Also, it's hard to know where you found this "string" and what encoding it used when you found it. Without any code or any data, there can't be a straight answer. Commented Feb 15, 2011 at 20:12

1 Answer

You can't just decode using some random encoding, even if it is UTF-8; you must decode using the encoding returned in the HTTP headers or an equivalent within the document (such as within the META element of HTML).

If the encoding isn't available or is incorrect, specify in the decode operation what should happen on an invalid byte sequence; 'replace' usually suffices for this.

>>> print u'Carlos e Carlos mostram o que há de melhor na internet'.encode('latin1').decode('utf-8', 'replace')
Carlos e Carlos mostram o que h�e melhor na internet
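
To make the "use the declared encoding, then fall back to 'replace'" advice concrete, here is a minimal sketch. It is written for Python 3, whereas the session above is Python 2, and the helper names (charset_from_content_type, decode_body) and the sample Content-Type value are made up for illustration; the feed in the question may declare something different. Note that Python 3 substitutes only the single bad byte, so its 'replace' output keeps the " d" that the Python 2 output above swallowed.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None
    return msg.get_content_charset()


def decode_body(raw, content_type=None, fallback='utf-8'):
    """Decode bytes with the declared charset; otherwise fall back leniently."""
    charset = charset_from_content_type(content_type)
    if charset:
        try:
            return raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name, or the declaration does not match the bytes
            pass
    # Last resort: invalid byte sequences become U+FFFD instead of raising
    return raw.decode(fallback, errors='replace')


# Simulate a feed that is really Latin-1, as in the question
text = 'Carlos e Carlos mostram o que há de melhor na internet'
raw = text.encode('latin-1')

print(decode_body(raw, 'text/xml; charset=ISO-8859-1'))  # declared charset is honoured
print(decode_body(raw))  # no declaration: lossy UTF-8 decode with 'replace'
```

The fallback is lossy by design: it keeps the program running, but the replaced characters cannot be recovered, so finding the real encoding is still preferable.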
answered Feb 15, 2011 at 20:11