This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.
Created on 2011年01月21日 19:01 by hhas, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| json.diff | hhas, 2011年01月21日 19:01 | |
| Messages (28) | |||
|---|---|---|---|
| msg126772 - (view) | Author: (hhas) | Date: 2011年01月21日 19:01 | |
json.loads() accepts strings but errors on bytes objects. Documentation and API indicate that both should work. Review of json/__init__.py code shows that the loads() function's 'encoding' arg is ignored and no decoding takes place before the object is passed to JSONDecoder.decode()
Tested on Python 3.1.2 and Python 3.2rc1; fails on both.
Example:
#################################################
#!/usr/local/bin/python3.2
import json
print(json.loads('123'))
# 123
print(json.loads(b'123'))
# /Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/json/decoder.py:325:
# TypeError: can't use a string pattern on a bytes-like object
print(json.loads(b'123', encoding='utf-8'))
# /Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/json/decoder.py:325:
# TypeError: can't use a string pattern on a bytes-like object
#################################################
Patch attached.
|
|||
| msg126782 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011年01月21日 20:35 | |
Hmm. According to issue 4136, all bytes support was supposed to have been removed. |
|||
| msg126785 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年01月21日 20:46 | |
Indeed, the documentation (and function docstring) needs fixing instead. It's a pity we didn't remove the useless `encoding` parameter. |
|||
| msg126786 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2011年01月21日 20:54 | |
Georg: Is it still time to deprecate the encoding parameter in 3.2? |
|||
| msg126788 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年01月21日 21:38 | |
I've committed a doc fix in r88137. |
|||
| msg126831 - (view) | Author: (hhas) | Date: 2011年01月22日 12:28 | |
Doc fix works for me. |
|||
| msg126986 - (view) | Author: Anthony Long (antlong) | Date: 2011年01月25日 03:38 | |
Works for me, py2.7 on snow leopard. |
|||
| msg126997 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011年01月25日 11:42 | |
Anthony: this is a Python 3-only problem. |
|||
| msg133645 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2011年04月13日 07:23 | |
Now it's too late for 3.2, should this be done for 3.3? |
|||
| msg133672 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2011年04月13日 15:40 | |
If you’re talking about deprecating the obsolete encoding argument (maybe it’s time for a new bug report), +1. |
|||
| msg145343 - (view) | Author: Barry A. Warsaw (barry) * (Python committer) | Date: 2011年10月11日 13:44 | |
I'll just mention that the elimination of bytes handling is a bit unfortunate, since this idiom which works in Python 2 no longer works:
fp = urlopen(url)
json_data = json.load(fp)
/me sad |
|||
| msg145345 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年10月11日 13:51 | |
> I'll just mention that the elimination of bytes handling is a bit
> unfortunate, since this idiom which works in Python 2 no longer works:
>
> fp = urlopen(url)
> json_data = json.load(fp)
What if the returned JSON uses a charset other than utf-8? |
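One way to keep that idiom working on Python 3 is to decode the byte stream explicitly before parsing. The following is a minimal sketch, assuming the response behaves like a binary file-like object. `load_json_stream` and its `charset` parameter are hypothetical names, and `io.BytesIO` stands in for the object returned by `urlopen()`:

```python
import io
import json

def load_json_stream(fp, charset="utf-8"):
    """Parse JSON from a binary file-like object (hypothetical helper)."""
    # TextIOWrapper decodes the bytes on the fly, so json.load() sees str.
    text_stream = io.TextIOWrapper(fp, encoding=charset)
    return json.load(text_stream)

# io.BytesIO stands in for the object returned by urlopen(url).
fake_response = io.BytesIO('{"name": "caf\u00e9"}'.encode("utf-8"))
data = load_json_stream(fake_response)
```

The caller still has to choose the charset (for example from the HTTP headers), which is exactly the question raised in the reply above.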
|||
| msg159359 - (view) | Author: Balthazar Rouberol (Balthazar.Rouberol) | Date: 2012年04月26日 08:20 | |
I know this does not fix anything at the core, but it would allow you to use json.loads() with python 3.2 (maybe 3.1?):
Replace
json.loads(raw_data)
by
raw_data = raw_data.decode('utf-8') # or whichever encoding the data actually uses
json.loads(raw_data)
|
|||
| msg159360 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月26日 08:34 | |
> What if the returned JSON uses a charset other than utf-8 ?
According to RFC 4627: "JSON text SHALL be encoded in Unicode. The default encoding is UTF-8." RFC 4627 also offers a way to autodetect other Unicode encodings. |
|||
| msg159364 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月26日 13:03 | |
Well, adding support for bytes objects using the spec from RFC 4627 (or at least with utf-8 as a default) may be an enhancement for 3.3. |
|||
| msg159366 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月26日 14:07 | |
Things are a little more complicated. '123' is not a valid JSON text according to RFC 4627 (the top-level element can only be an object or an array). This means that the autodetection algorithm will not always work for such non-standard data. If we can parse binary data, then there must be a way to generate binary data in at least one of the Unicode encodings. By the way, the documentation should give a link to RFC 4627 and explain that the current implementation differs from it. |
|||
| msg159368 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月26日 14:21 | |
> Things are a little more complicated. '123' is not a valid JSON
> according to RFC 4627 (the top-level element can only be an object or
> an array). This means that the autodetection algorithm will not always
> work for such non-standard data.
The autodetection algorithm needn't examine all 4 first bytes. If the 2 first bytes are non-zero, you have UTF-8 data. Otherwise, the JSON text will be at least 4 bytes long (since it's either UTF-16 or UTF-32). |
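The null-byte heuristic described here can be sketched as follows. This is only an illustration of the idea, not the code that was eventually committed: `guess_json_encoding` is a hypothetical name, and the sketch assumes there is no BOM and that the text starts with an ASCII character.

```python
import json

def guess_json_encoding(data):
    """Guess the encoding of JSON bytes from the zero bytes at the start.

    Hypothetical helper: assumes no BOM and an ASCII first character.
    """
    if len(data) < 2 or (data[0] and data[1]):
        # First two bytes are non-zero (or too short to tell): UTF-8.
        return "utf-8"
    if data[0] == 0:
        # 00 00 .. .. is UTF-32BE, 00 xx .. .. is UTF-16BE.
        return "utf-32-be" if data[1] == 0 else "utf-16-be"
    # xx 00 00 00 is UTF-32LE, anything else starting xx 00 is UTF-16LE.
    return "utf-32-le" if data[2:4] == b"\x00\x00" else "utf-16-le"

sample = '{"k": [1, 2]}'
codecs = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
# Encode the same text five ways and record what the heuristic guesses.
guesses = {codec: guess_json_encoding(sample.encode(codec)) for codec in codecs}
decoded = json.loads(sample.encode("utf-16-be").decode(guesses["utf-16-be"]))
```

Note that UTF-8 input whose second byte is a raw NUL, such as `b'"\x00"'`, would be guessed as UTF-16LE by this sketch.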
|||
| msg159388 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月26日 15:48 | |
I mean a string that starts with '\u0000'. b'"\x00...'. |
|||
| msg159391 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月26日 16:12 | |
Le jeudi 26 avril 2012 à 15:48 +0000, Serhiy Storchaka a écrit :
>
> I mean a string that starts with '\u0000'. b'"\x00...'.
According to the RFC, that should be escaped:
All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
And indeed:
>>> json.loads('"\u0000"')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/antoine/opt/lib/python3.2/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/home/antoine/opt/lib/python3.2/json/decoder.py", line 351, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/antoine/opt/lib/python3.2/json/decoder.py", line 367, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 1 (char 1)
>>> json.loads('"\\u0000"')
'\x00'
|
|||
| msg159395 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月26日 16:21 | |
According to current implementation this is acceptable.
>>> json.loads('"\u0000"', strict=False)
'\x00'
|
|||
| msg159454 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月27日 14:06 | |
> According to current implementation this is acceptable.
Then perhaps auto-detection can be restricted to strict mode? Non-strict mode would always use utf-8. Or we can just skip auto-detection altogether (I don't think many people produce utf-16 or utf-32 JSON; that would be a waste of bandwidth for no obvious benefit). |
|||
| msg159469 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月27日 15:28 | |
A related question is how to report errors. How should the user be informed if decoding with the detected encoding fails? Leave the UnicodeDecodeError as is, or convert it to ValueError? And if there is a syntax error in the JSON, the exception will refer to a position in the decoded string; should we translate it to the position in the original binary string? |
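For context, `UnicodeDecodeError` is already a subclass of `ValueError`, so callers that catch `ValueError` see both kinds of failure. Translating a character position back to a byte offset could look roughly like the sketch below. It assumes Python 3.5 or later (for `json.JSONDecodeError`), and `loads_bytes_with_byte_offset` is a hypothetical helper, not part of the json module:

```python
import json

def loads_bytes_with_byte_offset(data, encoding="utf-8"):
    """Hypothetical helper: report JSON syntax errors as byte offsets."""
    # A bad byte sequence raises UnicodeDecodeError, itself a ValueError.
    text = data.decode(encoding)
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos indexes the decoded str; re-encode the prefix to find
        # the matching offset in the original bytes.
        byte_pos = len(text[:exc.pos].encode(encoding))
        raise ValueError("%s (byte offset %d)" % (exc.msg, byte_pos)) from exc

ok = loads_bytes_with_byte_offset(b"[1, 2]")
```

Re-encoding the prefix is simple but costs an extra pass over the text, and it only gives the right offset when encoding the prefix reproduces the original bytes, as it does for UTF-8.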
|||
| msg204810 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2013年11月30日 14:06 | |
Issue 19837 is the complementary problem on the serialisation side - users migrating from Python 2 are accustomed to being able to use the json module directly as a wire protocol module, but the strict Python 3 interpretation as a text transform means that isn't possible - you have to apply the text encoding step separately.
What appears to have happened is that the way JSON is used in practice has diverged from JSON as a formal spec.
Formal spec (this is what the Py3k JSON module implements, and Py2 implements with ensure_ascii=False): JSON is a Unicode text transform, which may optionally be serialised as UTF-8, UTF-16 or UTF-32.
Practice (what the Py2 JSON module implements with ensure_ascii=True, and what is covered in RFC 4627): JSON is a UTF-8 encoded wire protocol
So now we're left with the options:
- try to tweak the existing json APIs to handle both the str<->str and str<->bytes use cases (ugly)
- add new APIs within the existing json module
- add a new "jsonb" module, which dumps to UTF-8 encoded bytes, and reads from UTF-8, UTF-16 or UTF-32 encoded bytes in accordance with RFC 4627 (but being more tolerant in terms of what is allowed at the top level)
I'm currently leaning towards the "jsonb" module option, and deprecating the "encoding" argument in the pure text version. It's not pretty, but I think it's better than the alternatives. |
|||
| msg204937 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2013年12月01日 15:39 | |
Bike-shedding: instead of jsonb, make it json.bytes. Else, it may get confused with other protocols, such as "JSONP" or "BSON". |
|||
| msg204959 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2013年12月01日 20:57 | |
json.bytes would also work for me. It wouldn't need to replicate the full main module API, just combine the text transform with UTF-8 encoding and decoding (as well as autodetected UTF-16 and UTF-32 decoding) for the main 4 functions (dump[s], load[s]). If people want UTF-16 and UTF-32 *en*coding (which seem to be rarely used in combination with JSON), then they can invoke the text transform version directly, and then do a separate encoding step. |
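To make the proposal concrete, a bytes-oriented wrapper over the text API might look like this. It is a rough sketch under stated assumptions, not an API that exists: `dumps_bytes` and `loads_bytes` are hypothetical names, and the UTF-16/UTF-32 autodetection mentioned above is left out.

```python
import json

def dumps_bytes(obj, **kwargs):
    """Serialise obj to UTF-8 encoded JSON bytes (hypothetical API)."""
    return json.dumps(obj, **kwargs).encode("utf-8")

def loads_bytes(data, **kwargs):
    """Parse UTF-8 encoded JSON bytes (hypothetical API).

    UTF-16/UTF-32 autodetection is deliberately omitted from this sketch.
    """
    return json.loads(data.decode("utf-8"), **kwargs)

# ensure_ascii=False keeps non-ASCII characters as UTF-8 on the wire.
payload = dumps_bytes({"msg": "h\u00e9llo"}, ensure_ascii=False)
roundtrip = loads_bytes(payload)
```

Anyone needing UTF-16 or UTF-32 output would, as the message says, call `json.dumps()` directly and encode the result in a separate step.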
|||
| msg215529 - (view) | Author: Hanxue Lee (Hanxue.Lee) | Date: 2014年04月04日 15:23 | |
This seems to be an issue (bug?) for Python 3.3.
When calling json.loads() with a byte array, this is the error:
json.loads(response.data, 'latin-1')
TypeError: can't use a string pattern on a bytes-like object
When I decode the byte array to string
json.loads(response.data.decode(), 'latin-1')
I get this error:
TypeError: bytes or integer address expected instead of str instance |
|||
| msg229973 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2014年10月25日 01:10 | |
Issue 17909 (auto-detecting JSON encoding) looks like it has a patch which would probably satisfy this issue |
|||
| msg275615 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2016年09月10日 10:21 | |
As Martin noted, Serhiy has implemented the autodetection option for json.loads in #17909 so closing this one as out of date - UTF-8, UTF-16 and UTF-32 encoded JSON data will be deserialised automatically in 3.6, while other text encodings aren't officially supported by the JSON RFCs. |
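The resulting behaviour can be checked with a short snippet, assuming Python 3.6 or later (on earlier 3.x versions the `bytes` calls raise the `TypeError` from the original report):

```python
import json

text = '{"answer": 42}'
# Encode the same document in each of the three supported encodings
# and let json.loads() detect which one was used.
results = [json.loads(text.encode(codec)) for codec in ("utf-8", "utf-16", "utf-32")]
# The bytes literal from the original report now parses as well.
number = json.loads(b"123")
```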
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:11 | admin | set | github: 55185 |
| 2016年09月10日 10:21:47 | ncoghlan | set | status: open -> closed; superseder: Autodetecting JSON encoding; resolution: out of date; messages: + msg275615 |
| 2016年08月17日 12:28:10 | pitrou | set | nosy: - pitrou |
| 2016年08月17日 12:20:36 | vstinner | link | issue27765 superseder |
| 2016年08月17日 12:20:19 | vstinner | set | versions: + Python 3.6, - Python 3.5 |
| 2016年08月17日 12:20:15 | vstinner | set | title: json.loads() raises TypeError on bytes object -> accept bytes in json.loads() |
| 2014年10月25日 01:10:47 | martin.panter | set | nosy: + martin.panter; messages: + msg229973 |
| 2014年04月04日 15:23:44 | Hanxue.Lee | set | nosy: + Hanxue.Lee; messages: + msg215529 |
| 2014年03月29日 01:45:18 | cvrebert | set | nosy: + cvrebert |
| 2013年12月01日 20:57:46 | ncoghlan | set | messages: + msg204959 |
| 2013年12月01日 15:39:43 | loewis | set | nosy: + loewis; messages: + msg204937 |
| 2013年11月30日 14:06:01 | ncoghlan | set | messages: + msg204810; versions: + Python 3.5, - Python 3.3 |
| 2013年11月30日 11:07:07 | pitrou | set | nosy: + ncoghlan |
| 2013年10月18日 07:08:23 | kousu | set | nosy: + kousu |
| 2013年09月20日 15:28:08 | jleedev | set | nosy: + jleedev |
| 2012年04月27日 15:28:06 | serhiy.storchaka | set | messages: + msg159469 |
| 2012年04月27日 14:06:12 | pitrou | set | messages: + msg159454 |
| 2012年04月26日 16:21:34 | serhiy.storchaka | set | messages: + msg159395 |
| 2012年04月26日 16:12:44 | pitrou | set | messages: + msg159391 |
| 2012年04月26日 15:48:23 | serhiy.storchaka | set | messages: + msg159388 |
| 2012年04月26日 15:09:07 | eric.araujo | set | title: json.loads() throws TypeError on bytes object -> json.loads() raises TypeError on bytes object |
| 2012年04月26日 14:21:40 | pitrou | set | messages: + msg159368 |
| 2012年04月26日 14:07:45 | serhiy.storchaka | set | messages: + msg159366 |
| 2012年04月26日 13:03:55 | pitrou | set | versions: + Python 3.3, - Python 3.2; messages: + msg159364; assignee: docs@python ->; components: + Library (Lib), - Documentation; type: behavior -> enhancement; stage: needs patch |
| 2012年04月26日 08:34:31 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg159360 |
| 2012年04月26日 08:20:56 | Balthazar.Rouberol | set | nosy: + Balthazar.Rouberol; messages: + msg159359 |
| 2011年10月11日 13:51:37 | pitrou | set | messages: + msg145345 |
| 2011年10月11日 13:44:47 | barry | set | nosy: + barry; messages: + msg145343 |
| 2011年04月13日 15:40:46 | eric.araujo | set | messages: + msg133672; versions: - Python 3.1 |
| 2011年04月13日 07:23:28 | ezio.melotti | set | nosy: + ezio.melotti; messages: + msg133645 |
| 2011年01月25日 11:42:30 | r.david.murray | set | nosy: georg.brandl, hhas, pitrou, eric.araujo, r.david.murray, docs@python, antlong; messages: + msg126997 |
| 2011年01月25日 03:38:49 | antlong | set | nosy: + antlong; messages: + msg126986 |
| 2011年01月22日 12:28:33 | hhas | set | nosy: georg.brandl, hhas, pitrou, eric.araujo, r.david.murray, docs@python; messages: + msg126831 |
| 2011年01月21日 21:38:06 | pitrou | set | nosy: georg.brandl, hhas, pitrou, eric.araujo, r.david.murray, docs@python; messages: + msg126788 |
| 2011年01月21日 20:54:35 | eric.araujo | set | nosy: + eric.araujo, georg.brandl; messages: + msg126786 |
| 2011年01月21日 20:46:48 | pitrou | set | nosy: + docs@python; messages: + msg126785; assignee: docs@python; components: + Documentation, - Library (Lib) |
| 2011年01月21日 20:35:32 | r.david.murray | set | nosy: + r.david.murray, pitrou; messages: + msg126782 |
| 2011年01月21日 19:01:47 | hhas | create | |