I am trying to encode and decode with UTF-8. What is weird is that I get an error traceback saying that I am using gbk.
oneword.decode("utf-8")]
Below is the error traceback.
UnicodeEncodeError: 'gbk' codec can't encode character '\u2769' in position 1: illegal multibyte sequence
Can anyone tell me what to do? It seems that the decode parameter does not have any effect.
1 Answer
I got it solved.
Actually, I intended to output to a file instead of the console. In that situation, I have to explicitly specify the encoding of the output file. Instead of using open, I used codecs.open.
import codecs
f = codecs.open(filename, mode='w', encoding='utf-8')
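A minimal sketch of why the traceback mentions gbk, assuming the output stream on the asker's machine defaulted to gbk (for example, a Chinese-locale Windows console or a file opened without an explicit encoding). The sample string and the temporary file path are made up for illustration; only the character '\u2769' comes from the traceback.

```python
import codecs
import os
import tempfile

# Hypothetical sample text: U+2769 sits at position 1, as in the traceback.
text = "a\u2769"

# Encoding with gbk fails, because gbk has no mapping for U+2769.
try:
    text.encode("gbk")
    gbk_failed = False
except UnicodeEncodeError as exc:
    gbk_failed = True
    print(exc)

# Writing through a stream opened with an explicit UTF-8 encoding succeeds.
path = os.path.join(tempfile.mkdtemp(), "out.txt")  # illustrative path
f = codecs.open(path, mode="w", encoding="utf-8")
f.write(text)
f.close()

# Read the raw bytes back to confirm they are UTF-8.
with open(path, "rb") as raw:
    data = raw.read()
```

The point is that the codec named in the error belongs to the output target, not to anything passed to decode.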
Thanks to @Bakuriu from the comments:
If you are using Python 3 you no longer need to import the
codecs module. Just pass the encoding parameter to the built-in open function.
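As a rough illustration of that suggestion on Python 3, the sketch below writes the problematic character with the built-in open and reads it back. The file path and the sample string are assumptions made for the example.

```python
import os
import tempfile

# Illustrative output path; any writable location would do.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# The encoding parameter makes the file UTF-8 regardless of the locale default.
with open(path, mode="w", encoding="utf-8") as f:
    f.write("a\u2769")

# Reading with the same encoding returns the original text.
with open(path, mode="r", encoding="utf-8") as f:
    read_back = f.read()
```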
2 Comments
If you are using Python 3 you no longer need to import the codecs module. Just pass the encoding parameter to the built-in open function. You can achieve the same behaviour in python2 using io.open.
The codecs module actually has several annoying bugs that will likely never be fixed, and performs worse than the io module components (open is io.open on Python 3); you basically never want to use codecs.open unless your code needs to run unmodified all the way back to Python 2.5 (as of 2.6, io.open exists).
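A small sketch of the io.open variant mentioned in the comments, which should behave the same on Python 2.6+ and Python 3. The path and sample text are illustrative assumptions.

```python
import io
import os
import tempfile

# True on Python 3, where the built-in open is io.open;
# on Python 2 the built-in open is a different function.
same_function = io.open is open

path = os.path.join(tempfile.mkdtemp(), "out.txt")  # illustrative path

# io.open accepts an encoding parameter on both Python 2.6+ and Python 3.
with io.open(path, mode="w", encoding="utf-8") as f:
    f.write(u"a\u2769")

with io.open(path, mode="r", encoding="utf-8") as f:
    read_back = f.read()
```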
oneword? Please update your post with the result of print(oneword).
repr(oneword) might be more useful. The UnicodeEncodeError makes it look like it's trying to first encode oneword before decoding it, as if oneword is a bytes object. I haven't seen this behaviour before in Python 3 though.