UTF-8 in Python

Asked 14 years, 10 months ago

This seems to be a common question among international developers, but I haven't found a straight answer yet. I'm getting the following string from a feed: "Carlos e Carlos mostram o que há de melhor na internet"

The following error is returned to the console: UnicodeDecodeError: 'utf8' codec can't decode bytes in position 31-33: invalid data

thanks in advance,

fbr

asked Feb 15, 2011 at 20:06

ForeignerBR

We're unable to see the code you're using, so it's really hard to give a "straight" answer. Also, it's hard to know where you find this "string" and what encoding it uses when you found it. Without any code or any data, there can't be a straight answer.

– S.Lott

Commented Feb 15, 2011 at 20:12

1 Answer

You can't just decode using some random encoding, even if it is UTF-8; you must decode using the encoding returned in the HTTP headers or an equivalent within the document (such as within the META element of HTML).
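As a rough sketch of that idea, assuming Python 3 and that you already have the raw body bytes plus the value of the Content-Type header: the helper name `decode_body` and the `fallback` parameter are hypothetical, made up for illustration. It reads the charset from the header and only falls back to a guess when the header gives nothing usable.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, fallback: str = "utf-8") -> str:
    """Decode bytes using the charset declared in a Content-Type header value.

    Hypothetical helper: the name and the fallback policy are assumptions.
    """
    # Let the standard library parse the header parameters for us.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback

    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: decode lossily instead of raising.
        return body.decode(fallback, errors="replace")


latin1_bytes = "há de melhor".encode("latin-1")
declared = decode_body(latin1_bytes, "text/xml; charset=ISO-8859-1")
undeclared = decode_body(latin1_bytes, "text/xml")
```

Here `declared` comes back intact because the header names the real encoding, while `undeclared` falls through to the lossy path and contains a replacement character.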

If the encoding isn't available or is incorrect, then you should specify in the decode operation what should happen on an invalid byte sequence; usually 'replace', which substitutes U+FFFD for the undecodable bytes, suffices for this.

>>> print u'Carlos e Carlos mostram o que há de melhor na internet'.encode('latin1').decode('utf-8', 'replace')
Carlos e Carlos mostram o que h�e melhor na internet
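The session above is Python 2. For comparison, here is a minimal Python 3 sketch of the same experiment. It assumes, as the example above does, that the feed bytes are really Latin-1; that is a guess about the feed, not something the question confirms.

```python
# Assumption: Python 3. The feed bytes are simulated by encoding as Latin-1.
text = "Carlos e Carlos mostram o que há de melhor na internet"
raw = text.encode("latin-1")

# A strict UTF-8 decode fails on the lone 0xE1 byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' never raises, but the accented character is lost.
lossy = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were actually written in is lossless.
recovered = raw.decode("latin-1")
```

Note that 'replace' only hides the error; getting the real text back requires knowing the encoding the bytes were written in.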

answered Feb 15, 2011 at 20:11

Ignacio Vazquez-Abrams
