Python decode and encode with utf-8

Question 1

I am trying to encode and decode with utf-8. What is weird is that I get an error traceback saying that I am using gbk.

oneword.decode("utf-8")

Below is the error traceback.

UnicodeEncodeError: 'gbk' codec can't encode character '\u2769' in position 1: illegal multibyte sequence

Can anyone tell me what to do? It seems that the decode parameter has no effect.

Question 2

What is oneword? Please update your post with the result of print(oneword).

Question 3

Actually repr(oneword) might be more useful. The UnicodeEncodeError makes it look like it's trying to first encode oneword before decoding it, as if oneword is a bytes object. I haven't seen this behaviour before in Python 3 though.
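
To make the encode-versus-decode distinction concrete, here is a minimal sketch. It assumes Python 3 and uses a made-up value for oneword (the original value was never posted); only the character '\u2769' and the codec name 'gbk' come from the traceback above.

```python
# Hypothetical stand-in for the asker's variable; the real value is unknown.
oneword = "a\u2769b"

# type() and repr() show whether we hold text (str) or bytes.
print(type(oneword), repr(oneword))

# Encoding str -> bytes with gbk fails, because gbk has no mapping for U+2769.
try:
    oneword.encode("gbk")
    gbk_error = None
except UnicodeEncodeError as exc:
    gbk_error = exc
    print(exc)

# Encoding with utf-8 works, and decoding the bytes gives the text back.
utf8_bytes = oneword.encode("utf-8")
decoded = utf8_bytes.decode("utf-8")
```

The point of the sketch is that the error is raised on the way out (text to bytes), which is why changing the argument to decode would not make it go away.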

Answer

I got it solved. Actually, I intended to output to a file instead of the console. In that situation, I have to explicitly specify the encoding of the output file. Instead of using open I used codecs.open.

import codecs
f = codecs.open(filename, mode='w', encoding='utf-8')

Thanks to @Bakuriu from the comments:

If you are using Python 3 you no longer need to import the codecs module. Just pass the encoding parameter to the built-in open function.
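
A minimal sketch of that suggestion, assuming Python 3. The file name and sample text are made up for illustration, and a temporary directory is used so the example does not touch real files.

```python
import os
import tempfile

# Hypothetical sample text containing the character from the traceback.
text = "a\u2769b"

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Passing encoding= makes the file use utf-8 regardless of the locale default
# (which can be gbk on some Windows setups).
with open(path, mode="w", encoding="utf-8") as f:
    f.write(text)

# Read it back with the same encoding to confirm the round trip.
with open(path, mode="r", encoding="utf-8") as f:
    round_tripped = f.read()
```

Without the encoding argument, open falls back to a platform-dependent default, which is presumably where the gbk in the traceback came from.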

Comment by Bakuriu (2014-01-11)

In Python 3 there is no need to import the codecs module. Just pass the encoding parameter to the built-in open function. You can achieve the same behaviour in Python 2 using io.open.

Comment by ShadowRanger (2022-05-11)

Adding to @Bakuriu: The codecs module actually has several annoying bugs that will likely never be fixed, and performs worse than the io module components (open is io.open on Python 3); you basically never want to use codecs.open unless your code needs to run unmodified all the way back to Python 2.5 (as of 2.6, io.open exists).
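
As a rough illustration of the io.open route, here is a sketch that runs on Python 3. The path and sample text are hypothetical; the u prefix on the literal is there only because io.open in text mode on Python 2 expects unicode rather than a plain str.

```python
import io
import os
import tempfile

# On Python 3 the built-in open and io.open are the same function object.
same_function = io.open is open

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# io.open accepts encoding= in the same way as the Python 3 built-in open.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"a\u2769b")

# Read the raw bytes back to see what was actually written to disk.
with io.open(path, "rb") as f:
    raw = f.read()
```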

flexwang · Accepted Answer · 2014-01-11 08:28:59Z
