3

I'm trying to read some files using Python 3.2. Some of the files may contain Unicode while others do not.

When I try:

file = open(item_path + item, encoding="utf-8")
for line in file:
    print(repr(line))

I get the error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 13-16: ordinal not in range(128)

I am following the documentation here: http://docs.python.org/release/3.0.1/howto/unicode.html

Why would Python be trying to encode to ascii at any point in this code?

asked Apr 25, 2012 at 9:28
2
  • 2
    To be clear: when you write Unicode here, you mean UTF-8? Also, it sounds like all the files are UTF-8, but some might only contain the subset that is also ASCII. Commented Apr 25, 2012 at 9:30
  • stackoverflow.com/a/983752/680372 Commented Sep 13, 2015 at 17:32

2 Answers

3

The problem is that in Python 3, repr(line) also returns a Unicode string. It does not convert characters above code point 127 to ASCII escape sequences.

Use ascii(line) instead if you want to see the escape sequences.

Strictly speaking, repr(line) is expected to return a string that, if placed in source code, would produce an object with the same value. In that sense the Python 3 behaviour is fine: source files do not need ASCII escape sequences to express a string containing non-ASCII characters, and it is quite natural to use UTF-8 or some other Unicode encoding these days. Python 2, by contrast, produced escape sequences for such characters.
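To illustrate the difference, here is a minimal sketch comparing the two built-ins on a made-up sample string (the value of line is an assumption for demonstration, not taken from the asker's files):

```python
# Hypothetical sample line containing non-ASCII characters.
line = "caf\u00e9 \u4e2d\u6587\n"

# repr() keeps printable non-ASCII characters as they are in Python 3.
as_repr = repr(line)

# ascii() escapes every non-ASCII character, so the result is pure ASCII.
as_ascii = ascii(line)

# Safe to print even when the output encoding is ASCII.
print(as_ascii)  # 'caf\xe9 \u4e2d\u6587\n'
```

Printing as_repr instead would still require an output encoding that can represent the non-ASCII characters.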

answered Apr 25, 2012 at 11:49

2

What's your output encoding? If you remove the call to print(), does it start working?

I suspect you've got a non-UTF-8 locale, so Python is trying to encode repr(line) as ASCII as part of printing it.
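One way to check this is to ask Python which encodings it is using. The following is only a sketch; the helper can_encode is a hypothetical name introduced here for illustration:

```python
import locale
import sys

def can_encode(text, encoding):
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Encoding that print() uses for stdout (None if stdout was replaced
# by an object without an encoding attribute).
stdout_encoding = getattr(sys.stdout, "encoding", None)
print("stdout encoding:", stdout_encoding)

# Encoding derived from the locale settings.
print("locale encoding:", locale.getpreferredencoding(False))

# A string with a non-ASCII character fails under ASCII but not UTF-8.
sample = "caf\u00e9"
print("ascii ok:", can_encode(sample, "ascii"))
print("utf-8 ok:", can_encode(sample, "utf-8"))
```

If the stdout encoding is reported as ASCII (or an alias such as ANSI_X3.4-1968), that would explain the UnicodeEncodeError raised by print().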

To resolve the issue, either encode the string yourself and write the resulting bytes, or set your default output encoding to something that can handle your strings (UTF-8 being the obvious choice).
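Both options can be sketched roughly as follows. The helper names print_utf8 and make_utf8_writer are hypothetical, and an in-memory io.BytesIO stands in for the real binary stream (in a script you would most likely pass sys.stdout.buffer instead):

```python
import io

def print_utf8(text, binary_stream):
    """Option 1: encode explicitly and write bytes, bypassing the locale."""
    binary_stream.write(text.encode("utf-8") + b"\n")

def make_utf8_writer(binary_stream):
    """Option 2: wrap a binary stream in a text layer that always uses UTF-8."""
    return io.TextIOWrapper(
        binary_stream, encoding="utf-8", newline="\n", line_buffering=True
    )

# Hypothetical sample line; BytesIO is used here only so the result
# can be inspected. Replace demo with sys.stdout.buffer for real output.
line = "caf\u00e9\n"
demo = io.BytesIO()

print_utf8(repr(line), demo)

writer = make_utf8_writer(demo)
print(repr(line), file=writer)
writer.flush()

# Both approaches produce the same UTF-8 bytes.
print(ascii(demo.getvalue()))
```

Alternatively, setting the PYTHONIOENCODING environment variable to utf-8 before starting the interpreter changes the encoding of the standard streams without touching the code.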

answered Apr 25, 2012 at 9:32

1 Comment

Yes, you are correct. Only printing causes the issue; when I pass the lines to something else (e.g., QListView) they appear fine.
