I am trying to write some data to a file. Depending on the data, I sometimes get a UnicodeEncodeError:

    UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f622' in position 141: character maps to <undefined>

I did some research and found out that I can encode the data I am writing with the encode function.

This is the code prior to modifying it (not supporting Unicode):

    scriptDir = os.path.dirname(__file__)
    path = os.path.join(scriptDir, filename)
    with open(path, 'w') as fp:
        for sentence in iobTriplets:
            fp.write("\n".join("{} {} {}".format(triplet[0],triplet[1],triplet[2]) for triplet in sentence))
            fp.write("\n")
            fp.write("\n")

So I thought maybe I could just add the encoding when writing, like this:

    fp.write("\n".join("{} {} {}".format(triplet[0],triplet[1],triplet[2]).encode('utf8') for triplet in sentence))

But that doesn't work as I am getting the following error: TypeError: sequence item 0: expected str instance, bytes found

I also tried opening the file in byte mode with adding a b behind the w. However that didn't yield any results.

Does anybody know how to fix this? Btw: I am using Python 3.

asked Mar 27, 2019 at 19:40
  • Please post the original error. Also, I assume this is Python 3? Commented Mar 27, 2019 at 19:42
  • the mode should be encode('utf-8') Commented Mar 27, 2019 at 19:43
  • Probably, your system has a default encoding that is being picked up by open, maybe something like ASCII. So, try using open(path, 'w', encoding='utf-8') Commented Mar 27, 2019 at 19:44
  • @C.Nivs either works there. In fact, if it's Python 3, then you can just do encode(). I am curious as to why opening the file in binary mode isn't working. What does "I also tried opening the file in byte mode with adding a b behind the w. However that didn't yield any results." mean exactly? Commented Mar 27, 2019 at 19:45
  • Anyway, you need to use b'\n'.join(...) if you are going to be joining bytes. That is likely the source of your error, but then you will have to use binary mode when opening the file. Commented Mar 27, 2019 at 19:46
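
For illustration, the bytes-based route from the last comment might look like the sketch below. The sample data in iob_triplets and the temporary output path are made up for the example and are not from the question. It only shows that the separator, the joined items, and the file mode all have to agree on bytes.

```python
import os
import tempfile

# Hypothetical sample data: each sentence is a list of 3-item triplets.
iob_triplets = [
    [("I", "PRP", "O"), ("cried", "VBD", "O"), ("\U0001f622", "SYM", "O")],
]

# Hypothetical output location, chosen only so the example can run anywhere.
path = os.path.join(tempfile.gettempdir(), "iob_binary_demo.txt")

# 'wb' opens the file in binary mode, so write() accepts bytes only.
with open(path, "wb") as fp:
    for sentence in iob_triplets:
        # Each formatted line is encoded to bytes before joining.
        lines = ("{} {} {}".format(*triplet).encode("utf-8") for triplet in sentence)
        # The separator must be bytes as well; "\n".join(...) on bytes items
        # raises "expected str instance, bytes found".
        fp.write(b"\n".join(lines))
        fp.write(b"\n\n")

# Read the raw bytes back to check what was written.
with open(path, "rb") as fp:
    written = fp.read()
```

This works, but it means encoding by hand at every write; letting open() handle the encoding in text mode is usually less error-prone.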

1 Answer

You have already opened the file in text mode, so the file object encodes every string for you as it is written. There is no need to encode anything manually unless you are writing to a binary file.
You can specify any supported encoding in open():

    with open(path, 'w', encoding='utf-16be') as fp:

Unless the file is opened in binary mode, remove the str.encode() call from the argument to fp.write():

    fp.write("\n".join("{} {} {}".format(triplet[0],triplet[1],triplet[2]) for triplet in sentence))
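
Put together, a minimal runnable sketch of this approach could look as follows. It swaps in encoding='utf-8' (the choice suggested in the comments) for 'utf-16be'. The write_triplets helper, the sample data, and the temporary path are assumptions made for the example, not part of the question. The 'charmap' wording in the original error suggests the platform default was a legacy code page such as cp1252, which cannot represent the emoji, although the question does not confirm this.

```python
import os
import tempfile

# Hypothetical sample data standing in for iobTriplets from the question.
iob_triplets = [
    [("I", "PRP", "O"), ("cried", "VBD", "O"), ("\U0001f622", "SYM", "O")],
    [("Fine", "JJ", "O")],
]


def write_triplets(path, sentences, encoding="utf-8"):
    """Write one triplet per line, followed by a blank line after each sentence."""
    # Text mode plus an explicit encoding: write() takes str and encodes it.
    # newline="\n" disables platform newline translation for the output.
    with open(path, "w", encoding=encoding, newline="\n") as fp:
        for sentence in sentences:
            fp.write("\n".join("{} {} {}".format(*triplet) for triplet in sentence))
            fp.write("\n\n")


# Hypothetical output location, chosen only so the example can run anywhere.
path = os.path.join(tempfile.gettempdir(), "iob_text_demo.txt")
write_triplets(path, iob_triplets)

# Read back with the same encoding; newline="" returns line endings untranslated.
with open(path, encoding="utf-8", newline="") as fp:
    content = fp.read()
```

Whoever reads the file later has to open it with the same encoding, so it is worth passing encoding explicitly on both sides instead of relying on the platform default.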
answered Mar 27, 2019 at 19:49