I'm trying to encode non-ASCII characters in Python using UTF-16-LE, and here's the relevant snippet of the code:
import os
import sys

def run():
    print sys.getdefaultencoding()
    reload(sys)
    sys.setdefaultencoding('utf-16-le')
    print sys.getdefaultencoding()
    test_dir = unit_test_utils.get_test_dir("utkarsh")
    dir_name_1 = '東京'
    ....
    ....

if __name__ == '__main__':
    run()
When this code is run, this is the error seen:
# /u/bin/python-qs /root/python/tests/abc.py -c /root/test.conf
File "/root/python/tests/abc.py", line 27
SyntaxError: Non-ASCII character '\xe6' in file /root/python/tests/abc.py on line 27, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
How can this be fixed? I tried adding this line to the beginning of the file, but to no avail:
# -*- coding: utf-16-le -*-
The error this time around was:
# /u/bin/python-qs /root/python/tests/abc.py -c /root/test.conf
File "/root/python/tests/abc.py", line 2
import os
import sys
...
...
if __name__ == '__main__':
run()
^
SyntaxError: invalid syntax
Edit:
Line 27: dir_name_1 = '東京'
1 Answer
All is (almost) fine in the code you show. You have a source file encoded in UTF-8 (as stated by your comment on the result of the file command), so the line
dir_name_1 = '東京'
is in fact (since you are using Python 2.x):
dir_name_1 = '\xe6\x9d\xb1\xe4\xba\xac' # utf8 for 東京
The only problem is that on line 27 (which you did not show) you are doing something with that UTF-8 encoded string, probably trying to convert it (explicitly or implicitly) to unicode without specifying an encoding, so ASCII is used as the default. The error is then expected, since \xe6 is not in the ASCII range. You should explicitly decode the string with dir_name_1.decode('utf8').
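To make the decode step concrete, here is a minimal sketch of going from the UTF-8 byte string to UTF-16-LE bytes. The variable names other than dir_name_1 are just illustrative, and the bytes are written with \xhh escapes so the snippet itself stays pure ASCII and should run the same on Python 2.7 and Python 3:

```python
# The UTF-8 bytes of the directory name, spelled with escapes (ASCII-only source).
dir_name_1 = b'\xe6\x9d\xb1\xe4\xba\xac'

# Decode explicitly; relying on the implicit default codec (ASCII) is what fails.
text = dir_name_1.decode('utf8')

# Encode to UTF-16-LE only where those bytes are actually needed.
utf16_dirname_1 = text.encode('utf-16-le')

# repr() keeps the output ASCII regardless of the terminal encoding.
print(repr(text))
print(repr(utf16_dirname_1))
```

The point is that the bytes-to-text step names its encoding, so nothing depends on sys.getdefaultencoding().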
10 Comments
dir_name_1 = '東京'. I've updated the post with this.
dir_name_1 = '東京'; utf16_dirname_1 = dir_name_1.decode('utf8').encode('utf-16-le')
# -*- coding: utf8 -*- or, if the first line declares a shell (such as #! /usr/bin/env python), the coding line must be the second one.
file abc.py.
sys.setdefaultencoding(). You are trying to auto-set broken bones there rather than not break your bones in the first place. Read nedbatchelder.com/text/unipain.html instead and handle Unicode properly.
\xhh or \uhhhh escape sequences in those literals instead. A source code encoding declaration won't help with encoding and decoding data in your program.
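For what it's worth, here is a rough sketch of the two options mentioned in these comments, assuming the file is saved as UTF-8. The declaration describes how the source file itself is stored (so it should say utf-8 here, not utf-16-le); it does not convert any data. The variable names are made up for illustration:

```python
# -*- coding: utf-8 -*-
# The line above must be line 1, or line 2 if line 1 is a shebang.
# It has to name the encoding the file is really saved in.

# Option 1: a unicode literal typed directly (Python 2 needs the declaration).
dir_name_literal = u'東京'

# Option 2: \uhhhh escapes, which keep the source file pure ASCII.
dir_name_escaped = u'\u6771\u4eac'

# Both spell the same two code points.
same = (dir_name_literal == dir_name_escaped)

# Encoding is still an explicit step; no sys.setdefaultencoding() involved.
utf8_bytes = dir_name_literal.encode('utf-8')
print(same)
print(repr(utf8_bytes))
```

Either way the literal ends up as text, and any conversion to bytes is done explicitly at the point where bytes are needed.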