Python encoding issue, can't seem to figure it out

Question 1

Hey, I am having a major issue with encoding in Python. I am not too familiar with Python and have been stuck on this bug for weeks. I feel like I've tried everything, but I can't seem to fix it.

I am reading files in to work with and am getting the following error on some files that contain Chinese characters.

 'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)
Traceback (most recent call last):
 File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 112, in get_response
 response = wrapped_callback(request, *callback_args, **callback_kwargs)
 File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 154, in reviewrequest_recent_cc
 prev_reviewrequest_ccdata = _reviewrequest_recent_cc(request, review_request_id, False, revision_offset=1)
 File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 140, in _reviewrequest_recent_cc
 filename, comparison_data = _download_comparison_data(request, review_request_id, revision, filediff_id, modified)
 File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 89, in _download_comparison_data
 revision, filediff_id, local_site, modified)
 File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 68, in _download_analysis
 temp_file.write(working_file)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)

My code in this area looks like this:

working_file = get_original_file(filediff, request, encoding_list)
if modified:
 working_file = get_patched_file(working_file, filediff, request)
working_file = convert_to_unicode(working_file, encoding_list)[1]
logging.debug("Encoding List: %s", encoding_list)
logging.debug("Source File: " + filediff.source_file)
temp_file_name = "cctempfile_" + filediff.source_file.replace("/","_")
logging.debug("temp_file_name: " + temp_file_name)
source_file = os.path.join(HOMEFOLDER, temp_file_name)
logging.debug("File contents" + working_file)
#temp_file = codecs.open(source_file, encoding='utf-8')
#temp_file.write(working_file.encode('utf-8'))
temp_file = open(source_file, 'w')
temp_file.write(working_file)
temp_file.close()

Notice the commented-out lines. working_file is never empty. The logged encoding list is:

Encoding List: [u'iso-8859-15']

Anything to help would be soooo appreciated. I have to take a break from this after 8 straight hours of debugging this + the previous two weeks.

Answer (accepted)

The error indicates working_file is a Unicode string, but is being written to a file that was opened to expect a byte string. Python 2 uses the default ascii codec to implicitly convert the Unicode string to a byte string, and non-ASCII characters trigger the UnicodeEncodeError.
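The implicit conversion can be reproduced explicitly. This is a minimal sketch, assuming a short sample string that stands in for the contents of working_file (the sample text and variable names are illustrative, not taken from the original code):

```python
# -*- coding: utf-8 -*-
# Illustrative sample; stands in for the Unicode contents of working_file.
sample = u"diff header \u4e2d\u6587"

# Python 2 performs the equivalent of this step implicitly when a Unicode
# string is written to a file opened with the built-in open(..., 'w').
try:
    sample.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly with a codec that covers the characters succeeds.
utf8_bytes = sample.encode("utf-8")
```

The ascii codec rejects the two Chinese characters, while utf-8 encodes each of them as three bytes.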

The commented lines are close to correct, but the write will expect Unicode strings with codecs.open, so no need to explicitly encode, and the file needs to be opened for writing:

temp_file = codecs.open(source_file, 'w', encoding='utf-8')
temp_file.write(working_file)
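A self-contained version of the same idea, as a sketch: the temporary directory, the file name, and the sample contents below are assumptions for illustration, and only the codecs.open call mirrors the snippet above.

```python
import codecs
import os
import tempfile

# Illustrative stand-ins for source_file and working_file.
source_file = os.path.join(tempfile.mkdtemp(), "cctempfile_example.txt")
working_file = u"line one\n\u4e2d\u6587 line two\n"

# codecs.open returns a writer that encodes Unicode strings on write,
# so the string is passed as-is, without calling .encode() first.
temp_file = codecs.open(source_file, "w", encoding="utf-8")
try:
    temp_file.write(working_file)
finally:
    temp_file.close()

# Read the raw bytes back and decode them to confirm the round trip.
with open(source_file, "rb") as f:
    round_trip = f.read().decode("utf-8")
```

Closing the file in a finally block (or using a with statement) makes sure the data is flushed even if the write raises.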

Comment

Shouldn't we be suggesting io.open for its proper newline support?
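For comparison, a sketch of the io.open variant the comment refers to. io.open is available from Python 2.6 and is the same function as the built-in open in Python 3; in text mode it encodes Unicode strings on write and handles newlines according to its newline argument. The file name and contents are illustrative assumptions.

```python
import io
import os
import tempfile

# Illustrative stand-ins for source_file and working_file.
source_file = os.path.join(tempfile.mkdtemp(), "cctempfile_example.txt")
working_file = u"line one\n\u4e2d\u6587 line two\n"

# newline="\n" disables newline translation, so "\n" is written unchanged
# on every platform; the default (None) would write os.linesep instead.
with io.open(source_file, "w", encoding="utf-8", newline="\n") as temp_file:
    temp_file.write(working_file)

# Inspect the raw bytes that ended up on disk.
with io.open(source_file, "rb") as f:
    raw = f.read()
```

In Python 2, a file opened this way accepts only Unicode strings, so passing a byte string raises a TypeError instead of triggering a silent ascii conversion.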

Answer

What is the return type of your convert_to_unicode function?

If it returns bytes, you should probably change temp_file = open(source_file, 'w') to temp_file = open(source_file, 'wb'), which opens the file for writing bytes.
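A sketch of the bytes case, assuming the data has already been encoded before it reaches the write call (the file name and the sample bytes are illustrative). Note that the traceback in the question is an encode error, which points to a Unicode string rather than bytes, so this only applies if the helper really does return bytes.

```python
import os
import tempfile

# Illustrative stand-in for source_file.
source_file = os.path.join(tempfile.mkdtemp(), "cctempfile_example.bin")

# Illustrative: already-encoded bytes, as a conversion helper might return.
encoded = u"\u4e2d\u6587".encode("utf-8")

# Binary mode writes the bytes untouched; no codec is involved.
with open(source_file, "wb") as temp_file:
    temp_file.write(encoded)

with open(source_file, "rb") as f:
    written = f.read()
```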

Accepted answer by Mark Tolonen · 2015-12-08 03:07:47Z

