
I am having a major issue with encoding in Python. I am not very familiar with Python and have been stuck on this bug for weeks. I feel like I've tried everything, but I can't get it to work.

I am reading in files to work with, and I get the following error on some files that contain Chinese characters.

 'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 154, in reviewrequest_recent_cc
    prev_reviewrequest_ccdata = _reviewrequest_recent_cc(request, review_request_id, False, revision_offset=1)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 140, in _reviewrequest_recent_cc
    filename, comparison_data = _download_comparison_data(request, review_request_id, revision, filediff_id, modified)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 89, in _download_comparison_data
    revision, filediff_id, local_site, modified)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 68, in _download_analysis
    temp_file.write(working_file)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)

My code in this area looks like this:

working_file = get_original_file(filediff, request, encoding_list)
if modified:
    working_file = get_patched_file(working_file, filediff, request)
working_file = convert_to_unicode(working_file, encoding_list)[1]
logging.debug("Encoding List: %s", encoding_list)
logging.debug("Source File: " + filediff.source_file)
temp_file_name = "cctempfile_" + filediff.source_file.replace("/","_")
logging.debug("temp_file_name: " + temp_file_name)
source_file = os.path.join(HOMEFOLDER, temp_file_name)
logging.debug("File contents" + working_file)
#temp_file = codecs.open(source_file, encoding='utf-8')
#temp_file.write(working_file.encode('utf-8'))
temp_file = open(source_file, 'w')
temp_file.write(working_file)
temp_file.close()

Notice the commented-out lines, which are an earlier attempt. working_file is never empty. The logged "encoding list" is:

Encoding List: [u'iso-8859-15']

Any help would be greatly appreciated. I have to take a break after 8 straight hours of debugging this, on top of the previous two weeks.

asked Dec 8, 2015 at 2:08

2 Answers


The error indicates that working_file is a Unicode string, but it is being written to a file opened with the built-in open(), which expects a byte string. Python 2 implicitly converts the Unicode string to a byte string using the default ascii codec, and any non-ASCII character triggers the UnicodeEncodeError.
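The implicit conversion can be reproduced explicitly. This is a minimal sketch, assuming a made-up sample string rather than the asker's file contents: encoding non-ASCII text with the ascii codec raises the same error, while UTF-8 succeeds.

```python
# Hypothetical sample text: three ASCII letters plus two Chinese characters.
text = u"abc \u4e2d\u6587"

try:
    # This is the conversion Python 2 performs implicitly when a unicode
    # object is written to a file opened with the built-in open().
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent every Unicode code point, so this encode succeeds.
utf8_bytes = text.encode("utf-8")
```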

The commented-out lines are close to correct, with two changes. A file opened with codecs.open expects Unicode strings on write and encodes them itself, so there is no need to call encode explicitly. The file also needs to be opened for writing ('w'):

temp_file = codecs.open(source_file, 'w', encoding='utf-8')
temp_file.write(working_file)
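A slightly fuller sketch of the same idea, using a with block so the file is closed even if the write fails. The helper name write_unicode, the sample text, and the temporary path are all hypothetical stand-ins, not part of the asker's code.

```python
import codecs
import os
import tempfile


def write_unicode(path, text):
    """Write a unicode string to path, encoded as UTF-8."""
    # codecs.open returns a wrapper that encodes unicode on write,
    # so text is passed through without calling .encode() first.
    with codecs.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Hypothetical stand-ins for working_file and source_file.
sample_text = u"caf\u00e9 \u4e2d\u6587"
sample_path = os.path.join(tempfile.mkdtemp(), "cctempfile_demo")
write_unicode(sample_path, sample_text)

# Read the file back with the same encoding to check the round trip.
with codecs.open(sample_path, "r", encoding="utf-8") as handle:
    round_trip = handle.read()
```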

answered Dec 8, 2015 at 3:07

1 Comment

Shouldn't we be suggesting io.open for its proper newline support?
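For comparison, a minimal sketch of the io.open variant this comment refers to. The sample text and path are hypothetical. Passing newline="\n" is one possible choice that writes line endings unchanged on every platform; leaving it out lets io.open translate "\n" to the platform default.

```python
import io
import os
import tempfile

# Hypothetical sample text and output path.
sample_text = u"line one\nline two \u4e2d\u6587\n"
sample_path = os.path.join(tempfile.mkdtemp(), "io_open_demo.txt")

# io.open in text mode accepts unicode and encodes it on write.
with io.open(sample_path, "w", encoding="utf-8", newline="\n") as handle:
    handle.write(sample_text)

# Re-open in binary mode to inspect the bytes that reached the disk.
with io.open(sample_path, "rb") as handle:
    raw_bytes = handle.read()
```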

What is the return type of your convert_to_unicode function?

If it returns bytes, you should probably change temp_file = open(source_file, 'w') to temp_file = open(source_file, 'wb'), which opens the file in binary mode so the bytes are written unchanged.
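A hedged sketch of the binary-mode approach, assuming the return type is not known in advance. The helper write_bytes, its utf-8 default, and the sample data are hypothetical: the helper encodes Unicode text first and writes byte strings as they are.

```python
import os
import tempfile


def write_bytes(path, data, encoding="utf-8"):
    """Write data to path in binary mode, encoding it first if it is text."""
    # On Python 2, bytes is an alias for str, so unicode objects fail
    # this check and are encoded before being written.
    if not isinstance(data, bytes):
        data = data.encode(encoding)
    with open(path, "wb") as handle:
        handle.write(data)


# Hypothetical sample data: one unicode string and one byte string.
demo_dir = tempfile.mkdtemp()
text_path = os.path.join(demo_dir, "from_text")
bytes_path = os.path.join(demo_dir, "from_bytes")
write_bytes(text_path, u"\u4e2d\u6587")
write_bytes(bytes_path, b"\xe4\xb8\xad\xe6\x96\x87")

with open(text_path, "rb") as handle:
    text_result = handle.read()
with open(bytes_path, "rb") as handle:
    bytes_result = handle.read()
```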

answered Dec 8, 2015 at 2:41
