homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: JSONDecoder.raw_decode breaks on leading whitespace
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Bayard Randel, Mariatta, aalien, danielnoord, ezio.melotti, jcea, r.david.murray, rbcollins
Priority: normal Keywords: easy, patch

Created on 2012年07月19日 10:09 by aalien, last changed 2022年04月11日 14:57 by admin.

Files
File name Uploaded Description Edit
json-raw-decoder-fix-whitespace.diff aalien, 2012年07月19日 11:40 Patch for Python 3.3.0 review
json-raw-decoder-fix-whitespace.diff aalien, 2012年07月20日 12:41 Patch for Python 3.3.0, take two review
issue15393_3.6.patch Bayard Randel, 2016年09月13日 00:28 review
Messages (14)
msg165829 - (view) Author: Antti Laine (aalien) Date: 2012年07月19日 10:09
raw_decode on json.JSONDecoder does not handle leading whitespace. According to RFC 4627, section 2, whitespace can precede an object. With json.loads leading whitespace is handled just fine.
d = json.JSONDecoder()
d.raw_decode(' {}')
ValueError: No JSON object could be decoded
Tested with versions 2.6.7, 2.7.3, 3.1.3 and 3.2.3
msg165830 - (view) Author: Antti Laine (aalien) Date: 2012年07月19日 10:16
My coworker just submitted a pull request for a possible fix to simplejson on github.
https://github.com/simplejson/simplejson/pull/38 
msg165831 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年07月19日 10:37
Please, post a patch for 2.7 and 3.2/3.3, with a test. Seems quite easy.
If you hurry, this could go in 3.3.0.
msg165832 - (view) Author: Antti Laine (aalien) Date: 2012年07月19日 11:40
Suggestion for a patch for 3.3.0. I wasn't quite sure how the testcases were supposed to be loaded. Sorry if I made a mess ;)
msg165834 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年07月19日 11:54
Antti, you are changing the signature of "decode()" and that would be a compatibility problem. Can you rewrite the patch to be more "compatible"?
In your test, please, use a bit more complicated object than "{}" :-)
msg165835 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012年07月19日 11:56
IMO this is not a bug, according to the current documentation it is working as designed. raw_decode says it decodes a string that *starts with* a json document.
To my understanding, raw_decode is designed to be used when parsing a stream containing more than just one json document, including possibly non-json text, in which case it makes sense that the calling application would be responsible for (and want to) handle the whitespace around any such document.
I recommend closing this as invalid. If the consensus is otherwise and a change is made, I think it would be an enhancement, not a bug fix.
msg165836 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012年07月19日 12:03
Ah, I see, you are thinking that "json document" includes the possibility of leading whitespace. That is a reasonable interpretation...just make sure that we don't break backward compatibility.
msg165918 - (view) Author: Antti Laine (aalien) Date: 2012年07月20日 12:41
> you are changing the signature of "decode()" and that would be a compatibility problem
I was changing the signature of raw_decode(), by adding a(n almost) private keyword. I really don't see how that would affect compatibility.
> you are thinking that "json document" includes the possibility of leading whitespace
Yes. While JSON does not have a real standard to follow, I believe that going by the RFC is the best thing to do, and the RFC very clearly states, that objects can be surrounded with whitespace.
decode() depends on the functionality of raw_decode(), and its own parameter _w. I have no idea why _w is a parameter. It is not used anywhere in CPython, and, beginning with an underscore, it shouldn't be used from anywhere else. Still it offers the users a possibility to break the behaviour of decode(). If it's a performance hack, making the variable being initialized only once, then I think it's a very poor one.
Now the patch leaves decode() as it is, and checks for whitespace in raw_decode(), which leads to checking for whitespace twice when using decode().
I think that a more extensive cleanup would be in order.
msg168936 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012年08月23日 12:03
I think skipping preceding whitespace in raw_decode() wouldn't hurt, but skipping following whitespace would be wrong.
David: Any examples of how this could break backwards compatibility?
msg168939 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012年08月23日 12:56
I didn't have anything specific in mind, just making a general comment about the care that needs to be taken in crafting the fix.
msg276143 - (view) Author: Bayard Randel (Bayard Randel) * Date: 2016年09月13日 00:28
I've provided an updated patch for 3.6. Additionally I've updated the docstring for decoder.raw_decode to explain that whitespace at the beginning of the document will be ignored. I believe the concerns around the backwards incompatible change to the function signature were addressed by aalien's second patch.
msg276161 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2016年09月13日 04:16
I think the patch should either be rejected, or also handle trailing spaces: if we're taking the RFC definition of whitespace not being structural then we should also eat trailing space, which will change the check for extra data in decode to just checking end == s.len().
Obviously a test for the trailing case needs to be added as well.
msg276291 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016年09月13日 15:24
Robert: as noted, skipping trailing whitespace would be breaking raw_decode's contract. It is designed to parse a json document and return the position of the last character of that document. If there is then another json document in the stream, parsing the remainder of the stream would skip that whitespace. If it is some other data, then the whitespace should be preserved for whatever is parsing the remainder of the stream (if anything).
In any case that behavior cannot be changed because it would break backward compatibility.
I would not object to rejecting the change, but it does make raw_decode slightly more awkward to use if it does not skip the leading whitespace. So I guess I'm +0.5 on accepting it as is.
msg415523 - (view) Author: Daniël van Noord (danielnoord) * Date: 2022年03月18日 20:05
@Bayard Randel, do you want to make this patch into a GitHub PR? If not, I could help by doing so and try to get this landed.
History
Date User Action Args
2022年04月11日 14:57:33adminsetgithub: 59598
2022年03月18日 20:05:49danielnoordsetnosy: + danielnoord
messages: + msg415523
2016年10月04日 10:56:05Mariattasetnosy: + Mariatta
2016年09月13日 15:24:40r.david.murraysetversions: + Python 3.5, Python 3.6, Python 3.7, - Python 2.7, Python 3.2, Python 3.3, Python 3.4
2016年09月13日 15:24:08r.david.murraysetmessages: + msg276291
2016年09月13日 07:12:07petri.lehtinensetnosy: - petri.lehtinen
2016年09月13日 04:16:10rbcollinssetnosy: + rbcollins
messages: + msg276161
2016年09月13日 00:28:28Bayard Randelsetfiles: + issue15393_3.6.patch
nosy: + Bayard Randel
messages: + msg276143

2012年08月23日 12:56:01r.david.murraysetmessages: + msg168939
2012年08月23日 12:03:38petri.lehtinensetnosy: + petri.lehtinen
messages: + msg168936
2012年07月20日 12:41:59aaliensetfiles: + json-raw-decoder-fix-whitespace.diff

messages: + msg165918
2012年07月19日 12:03:06r.david.murraysetmessages: + msg165836
2012年07月19日 11:56:17r.david.murraysetnosy: + r.david.murray
messages: + msg165835
2012年07月19日 11:54:19jceasetmessages: + msg165834
2012年07月19日 11:40:55aaliensetfiles: + json-raw-decoder-fix-whitespace.diff
keywords: + patch
messages: + msg165832
2012年07月19日 10:37:45jceasetnosy: + jcea
messages: + msg165831

keywords: + easy
stage: needs patch
2012年07月19日 10:25:21ezio.melottisetnosy: + ezio.melotti

versions: + Python 3.3, Python 3.4, - Python 2.6, Python 3.1
2012年07月19日 10:16:14aaliensetmessages: + msg165830
2012年07月19日 10:09:55aaliencreate

AltStyle によって変換されたページ (->オリジナル) /