Issue 22701: Write unescaped unicode characters (Japanese, Chinese, etc) in JSON module when "ensure_ascii=False"



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/66890

classification

Title:	Write unescaped unicode characters (Japanese, Chinese, etc) in JSON module when "ensure_ascii=False"
Type:	enhancement	Stage:
Components:	Extension Modules, Unicode	Versions:	Python 3.3, Python 2.7

process

Dependencies:	Superseder:
Status:	closed	Resolution:	works for me
Assigned To:	Nosy List:	Michael.Kuss, ezio.melotti, r.david.murray, serhiy.storchaka, vstinner
Priority:	normal	Keywords:

Created on 2014-10-22 18:55 by Michael.Kuss, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (6)
msg229830 - (view)	Author: Michael Kuss (Michael.Kuss)	Date: 2014-10-22 18:55
When running the following:

    >> json.dump(['name': "港区"], myfile.json, indent=4, separators=(',', ': '), ensure_ascii=False)

the function escapes the unicode, even though I have explicitly asked to not force to ascii:

    \u6E2F\u533A

By changing "__init__.py" such that the fp.write call encodes the text as utf-8, the output json file displays the human-readable text required (see below).

OLD (starting line 167):

    if (not skipkeys and ensure_ascii and
        check_circular and allow_nan and
        cls is None and indent is None and separators is None and
        encoding == 'utf-8' and default is None and not kw):
        iterable = _default_encoder.iterencode(obj)
    else:
        if cls is None:
            cls = JSONEncoder
        iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
            check_circular=check_circular, allow_nan=allow_nan, indent=indent,
            separators=separators, encoding=encoding,
            default=default, **kw).iterencode(obj)
    for chunk in iterable:
        fp.write(chunk)

NEW:

    if (not skipkeys and ensure_ascii and
        check_circular and allow_nan and
        cls is None and indent is None and separators is None and
        encoding == 'utf-8' and default is None and not kw):
        iterable = _default_encoder.iterencode(obj)
        for chunk in iterable:
            fp.write(chunk)
    else:
        if cls is None:
            cls = JSONEncoder
        iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
            check_circular=check_circular, allow_nan=allow_nan, indent=indent,
            separators=separators, encoding=encoding,
            default=default, **kw).iterencode(obj)
        for chunk in iterable:
            fp.write(chunk.encode('utf-8'))
msg229834 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2014-10-22 19:39
If I fix your example so it runs:

    json.dump({'name': "港区"}, open('myfile.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)

I get the expected output:

    rdmurray@pydev:~/python/p34>cat myfile.json
    {
        "name": "港区"
    }

That example won't work in python2, of course, so you'd have to show us your actual code there.
msg230365 - (view)	Author: Ezio Melotti (ezio.melotti) * (Python committer)	Date: 2014-10-31 18:29
The example works for me with both python 2 and 3. I'm going to close this in a while if OP doesn't reply.

    $ python2 -c "import json; json.dump({'name': '港区'}, open('py2.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)" && cat py2.json
    {
        "name": "港区"
    }
    $ python3 -c "import json; json.dump({'name': '港区'}, open('py3.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)" && cat py3.json
    {
        "name": "港区"
    }
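The behaviour confirmed in the two replies above can be reproduced with a short Python 3 sketch. It is only an illustration: the temporary file name `example.json` and the explicit `encoding="utf-8"` argument are assumptions added here so the result does not depend on the platform's default encoding; they are not part of the commands quoted in this thread.

```python
import json
import os
import tempfile

data = {"name": "港区"}

# ensure_ascii=True (the default) escapes every non-ASCII character.
escaped = json.dumps(data)

# ensure_ascii=False leaves non-ASCII characters untouched in the output string.
unescaped = json.dumps(data, ensure_ascii=False)

# Write to a text-mode file. The file name and the explicit encoding are
# assumptions of this sketch, chosen so the result is platform independent.
path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w", encoding="utf-8") as fp:
    json.dump(data, fp, indent=4, separators=(",", ": "), ensure_ascii=False)

with open(path, encoding="utf-8") as fp:
    written = fp.read()

print(escaped)
print(unescaped)
print(written)
```

With `ensure_ascii=False` the encoder hands unescaped text to the file object, so whether the characters survive depends on how that file was opened.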
msg230417 - (view)	Author: Michael Kuss (Michael.Kuss)	Date: 2014-11-01 00:50
Pardon the delay - this json dump function is embedded in a much larger script, so it took some untangling to get it running on Python 3.3, and to scrub some personally identifying info from it. This script also does not work in Python 3.3:

      File "C:/Users/mkuss/PycharmProjects/TestJSON\dump_list_to_json_file.py", line 319, in dump_list_to_json_file
        json.dump(addresses, outfile, indent=4, separators=(',', ': '))
      File "C:\Python33\lib\json\__init__.py", line 184, in dump
        fp.write(chunk)
    TypeError: 'str' does not support the buffer interface

In python 2.7, I still get escaped unicode when I try writing this dictionary using json.dump, so the work-around that I pasted originally is how I'm choosing to accomplish the task for now. If you'd like, I can spend more time debugging the issue I'm running into when running the script in python 3.3, but it may be next week before I have sufficient time to solve it.

THANKS --mike
msg230421 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2014-11-01 01:47
That error message indicates you've opened the output file in binary mode instead of text mode.
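A minimal Python 3 sketch of that diagnosis follows. The temporary file name is a placeholder, not the reporter's actual script, and the exact wording of the `TypeError` differs between Python versions, so the sketch only captures the exception type.

```python
import json
import os
import tempfile

data = {"name": "港区"}
path = os.path.join(tempfile.mkdtemp(), "example.json")  # placeholder file name

# Binary mode: json.dump writes str chunks, which a bytes-only file rejects.
binary_mode_error = None
try:
    with open(path, "wb") as fp:
        json.dump(data, fp, ensure_ascii=False)
except TypeError as exc:
    binary_mode_error = exc

# Text mode: the file object encodes each str chunk itself.
with open(path, "w", encoding="utf-8") as fp:
    json.dump(data, fp, ensure_ascii=False)

with open(path, "rb") as fp:
    raw = fp.read()

print(type(binary_mode_error).__name__)
print(raw.decode("utf-8"))
```

Opening the file in text mode with an explicit encoding avoids the need to patch `fp.write` inside the `json` module.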
msg231994 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2014-12-02 13:19
It looks like either you have opened the file with the backslashreplace error handler, or you ran Python with PYTHONIOENCODING set so that it selects the backslashreplace error handler.
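How an error handler can reintroduce escapes even with `ensure_ascii=False` can be sketched as follows. This is an assumed setup for illustration only (an in-memory ASCII stream with `errors="backslashreplace"`), not a reconstruction of the reporter's environment.

```python
import io
import json

data = {"name": "港区"}

# Assumed setup: a text stream that can only encode ASCII and that falls
# back to backslash escapes for anything it cannot represent.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="ascii", errors="backslashreplace")

# The json module emits the characters unescaped ...
json.dump(data, stream, ensure_ascii=False)
stream.flush()

# ... but the stream's error handler escapes them while encoding to bytes.
raw = buffer.getvalue()
print(raw)
```

In this setup the escaping happens in the I/O layer rather than in the `json` module, which is why changing `ensure_ascii` alone does not remove it.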

History
Date	User	Action	Args
2022-04-11 14:58:09	admin	set	github: 66890
2015-02-10 08:43:39	serhiy.storchaka	set	status: pending -> closed
2014-12-02 13:19:23	serhiy.storchaka	set	status: open -> pending nosy: + serhiy.storchaka messages: + msg231994
2014-11-01 01:47:03	r.david.murray	set	messages: + msg230421
2014-11-01 00:50:19	Michael.Kuss	set	status: pending -> open messages: + msg230417
2014-10-31 18:29:07	ezio.melotti	set	status: open -> pending resolution: works for me messages: + msg230365
2014-10-22 19:39:34	r.david.murray	set	nosy: + r.david.murray messages: + msg229834
2014-10-22 18:55:23	Michael.Kuss	create
