Python decoding Unicode is not supported

Question

I am having a problem with my encoding in Python. I have tried different methods but I can't seem to find the best way to encode my output to UTF-8.

This is what I am trying to do:

result = unicode(google.searchGoogle(param), "utf-8").encode("utf-8")

searchGoogle returns the first Google result for param.

This is the error I get:

exceptions.TypeError: decoding Unicode is not supported

Does anyone know how I can make Python encode my output in UTF-8 to avoid this error?

Accepted answer · yak · score 102 · 2011-10-03 12:09:04Z

Looks like google.searchGoogle(param) already returns unicode:

>>> unicode(u'foo', 'utf-8')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    unicode(u'foo', 'utf-8')
TypeError: decoding Unicode is not supported

So what you want is:

result = google.searchGoogle(param).encode("utf-8")

As a side note, your code expects it to return a UTF-8 encoded string, so what was the point of decoding it (with unicode()) and then encoding it back (with .encode()) using the same encoding?
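A minimal sketch of the point above. It uses the bytes/str decode() and encode() methods instead of the Python-2-only unicode() builtin, so it also runs on Python 3. The sample value is an assumption standing in for a search result; google.searchGoogle() itself is not called.

```python
raw = b'caf\xc3\xa9'  # hypothetical UTF-8 bytes for the text u'caf\xe9'

# Decoding then encoding with the same codec gives back the original bytes,
# so the unicode(...).encode(...) pair in the question did no useful work.
text = raw.decode('utf-8')         # bytes -> text, like unicode(raw, 'utf-8') in Python 2
round_trip = text.encode('utf-8')  # text -> bytes again
print(round_trip == raw)           # True

# If searchGoogle() already returns text, encoding once is all that is needed,
# which is the fix suggested in the answer.
already_text = u'caf\xe9'          # stands in for a unicode search result
result = already_text.encode('utf-8')
print(result == raw)               # True
```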

5 Comments

simonbs · 2011-10-04 06:25:54Z

Honestly, the unicode() call was just me fooling around, trying to understand what was happening. Thank you very much :-)

simonbs · 2011-10-04 09:05:54Z

Now I sometimes get "'ascii' codec can't decode byte 0xc3 in position ...". Do you know why that is?

yak · 2011-10-05 10:37:53Z

In the line I suggested? Then searchGoogle() must have returned a byte string (str) containing a 0xC3 byte. Calling .encode() on a str makes Python 2 first convert it to unicode using the default ASCII codec, and 0xC3 is not a valid ASCII byte. I don't know why searchGoogle() would sometimes return unicode and sometimes str; maybe it depends on what you pass in param? Try to stick to one type.
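The mechanism yak describes can be reproduced with a short sketch. Python 2 performs the ASCII decode silently inside str.encode(); Python 3 removed that implicit step (bytes has no .encode() method), so the code below performs the hidden decode explicitly, which produces the same message on either version. The sample bytes are an assumption, not actual searchGoogle() output.

```python
raw = b'caf\xc3\xa9'  # hypothetical UTF-8 bytes; 0xC3 sits at index 3

# Python 2's raw.encode('utf-8') first runs the equivalent of raw.decode('ascii'),
# which fails because 0xC3 is outside the ASCII range (0x00-0x7F).
message = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)
# 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

# Sticking to one type: decode the bytes once, with the codec they were
# actually written in (assumed UTF-8 here), and keep working with text.
text = raw.decode('utf-8')
print(text == u'caf\xe9')  # True
```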

Eric Walker · 2014-10-21 00:45:21Z

I wish there were a safe, simple way to cast to unicode.

Leonid · 2017-12-13 05:34:11Z

@EricWalker You could write an awkward helper function like def uors2u(object, encoding=..., errors=...) that returns its object parameter unchanged if it is already unicode and decodes it if it is a str. However, this is a code smell. You should convert all input to unicode as soon as you receive it from the outside (e.g. from the file system) and, if needed, encode it back just before sending it out. There should be only one place where you convert str to unicode, so a helper like the one I described should not be needed.
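One way Leonid's hypothetical helper could look, written to run on both Python 2 and 3. Only the name uors2u comes from the comment; the body, the text_type shim, and making encoding a required argument are assumptions. errors defaults to 'strict', which is also the default of the built-in decode(). The handle() function is an equally hypothetical illustration of the approach the comment actually recommends: decode once on input, work with text, encode once on output.

```python
try:
    text_type = unicode  # Python 2: Unicode text has the type `unicode`
except NameError:
    text_type = str      # Python 3: `str` is already Unicode text


def uors2u(obj, encoding, errors='strict'):
    """Hypothetical helper: return obj unchanged if it is already text,
    otherwise decode it from bytes using the given encoding."""
    if isinstance(obj, text_type):
        return obj
    return obj.decode(encoding, errors)


def handle(raw_bytes):
    """Hypothetical pipeline showing the recommended pattern."""
    text = raw_bytes.decode('utf-8')  # decode once, where the data enters
    text = text.upper()               # work only with text inside the program
    return text.encode('utf-8')       # encode once, where the data leaves


print(uors2u(u'caf\xe9', 'utf-8') == u'caf\xe9')      # True: already text, returned as-is
print(uors2u(b'caf\xc3\xa9', 'utf-8') == u'caf\xe9')  # True: bytes decoded as UTF-8
print(handle(b'caf\xc3\xa9') == b'CAF\xc3\x89')       # True: u'CAF\xc9' encoded as UTF-8
```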
