Issue 12134: json.dump much slower than dumps

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/56343

classification

Title:	json.dump much slower than dumps
Type:	performance	Stage:	needs patch
Components:	Documentation, Library (Lib)	Versions:	Python 3.2, Python 3.3, Python 2.7

process

Dependencies:	Superseder:
Status:	closed	Resolution:	rejected
Assigned To:	rhettinger	Nosy List:	docs@python, eric.araujo, ezio.melotti, pitrou, poq, rhettinger, terry.reedy
Priority:	normal	Keywords:	easy

Created on 2011-05-21 13:38 by poq, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (12)
msg136439 - (view)	Author: (poq)	Date: 2011-05-21 13:38
import json, timeit

obj = [[1,2,3]*10]*10

class writable(object):
    def write(self, buf): pass

w = writable()

print('dumps: %.3f' % timeit.timeit(lambda: json.dumps(obj), number=10000))
print('dump: %.3f' % timeit.timeit(lambda: json.dump(obj,w), number=10000))

On my machine this outputs:
dumps: 0.391
dump: 4.501

I believe this is mostly caused by dump using JSONEncoder.iterencode without _one_shot=True, resulting in c_make_encoder not being used.
msg136440 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2011-05-21 13:57
This is indeed the case. And the solution is obvious: call dumps() and then write() the result yourself. If dump() used _one_shot=True, it would defeat the purpose of minimizing memory consumption by not buffering the whole result.
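The workaround described above can be sketched as follows. This is a minimal sketch, not part of the json module: the helper name `dump_one_shot` is hypothetical. It builds the whole document in memory before writing, which is exactly the memory cost that dump() is designed to avoid.

```python
import io
import json


def dump_one_shot(obj, fp, **kwargs):
    """Serialize obj with json.dumps() and emit it in a single write() call.

    Hypothetical helper, not part of the json module. The entire encoded
    string is held in memory before it is written to fp.
    """
    fp.write(json.dumps(obj, **kwargs))


# Usage: any object with a write(str) method works as fp.
buf = io.StringIO()
dump_one_shot([[1, 2, 3] * 10] * 10, buf)
```

The output is the same text that json.dump() would produce; only the number of write() calls and the peak memory use differ.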
msg136641 - (view)	Author: Éric Araujo (eric.araujo) * (Python committer)	Date: 2011-05-23 14:15
I believe Antoine is saying "works as intended", i.e. not a bug. I’m not sure it’s worth a doc change.
msg136691 - (view)	Author: (poq)	Date: 2011-05-23 19:51
Alright. I wouldn't mind a little note in the docs; I certainly did not expect that these two functions would perform so differently. Would it be very difficult though to add buffering support to the C encoder?
msg137146 - (view)	Author: Terry J. Reedy (terry.reedy) * (Python committer)	Date: 2011-05-28 19:42
From just reading the docs, it appears that json.dump(obj,fp) == fp.write(json.dumps(obj)) and it is easy to wonder why .dump even exists, as it seems a trivial abbreviation (and why not .dump and .dumpf instead). Since '_one_shot' and 'c_make_encoder' are not mentioned in the doc, there is no hint from these either. So I think a doc addition is needed.

The benchmark is not completely fair as the .dumps timing omits the write call. For the benchmark, that would be trivial. But in real use on multitasking systems with slow (compared to cpu speed) output channels, the write time might dominate. I can even imagine .dump sometimes winning by getting chunks into a socket buffer and possibly out on the wire, almost immediately, instead of waiting to compute the entire output, possibly interrupted by task swaps. So I presume this is at least part of the reason for the incremental .dump.

I changed 'pass' to 'print(buf)' in class writable and verified that .dump is very incremental. Even '[' and ']' are separate outputs.

DOC suggestion: (limited to CPython since spec does not prohibit naive implementation of .dump given above) After current .dumps line, add
"In CPython, json.dumps(o), by itself, is faster than json.dump(o,f), at the expense of using more space, because it creates the entire string at once, instead of incrementally writing each piece of o to f. However, f.write(json.dumps(o)) may not be faster."
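The check described above (replacing 'pass' in the writable class so that each chunk becomes visible) can be reproduced with a writer that records its input. This is a sketch only: the class name `RecordingWriter` is hypothetical, and the exact way the output is split into chunks is an implementation detail that may differ between Python versions.

```python
import json


class RecordingWriter:
    """File-like object that records every chunk passed to write()."""

    def __init__(self):
        self.chunks = []

    def write(self, buf):
        self.chunks.append(buf)


obj = [[1, 2, 3] * 10] * 10

w = RecordingWriter()
json.dump(obj, w)

# json.dump() calls write() once per chunk produced by the encoder.
print("write() calls made by json.dump:", len(w.chunks))
print("first few chunks:", w.chunks[:5])
```

Joining the recorded chunks gives the same string as json.dumps(obj), so the difference lies only in how the output is delivered.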
msg137166 - (view)	Author: Ezio Melotti (ezio.melotti) * (Python committer)	Date: 2011-05-29 02:07
The name dump and dumps exist to match the same API provided by pickle and marshal. I agree that a note marked as CPython implementation detail should be added.
msg137169 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2011-05-29 07:20
Please don't add notes of this sort to the docs.
msg137170 - (view)	Author: Ezio Melotti (ezio.melotti) * (Python committer)	Date: 2011-05-29 07:29
Are you against the proposed wording or the note itself? Stating that in CPython json.dump doesn't use the C accelerations is a reasonable thing to do imho.
msg137691 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2011-06-05 11:29
> "In CPython, json.dumps(o), by itself, is faster than json.dump(o,f),
> at the expense of using more space, because it creates the entire
> string at once, instead of incrementally writing each piece of o to f.
> However, f.write(json.dumps(o)) may not be faster."

Uh, talking about "CPython" is not very helpful here and only muddies the waters IMO. Something like "typical implementations of dump() will try to write the result in small chunks and will therefore trade lower memory usage for higher serialization time. If you have enough memory and care about performance, consider using dumps() and write the result yourself with a single write() call".
msg137704 - (view)	Author: Terry J. Reedy (terry.reedy) * (Python committer)	Date: 2011-06-05 17:30
With 'will try to ' and the next 'will ' omitted, I agree that Antoine's version is better than mine.
msg137707 - (view)	Author: (poq)	Date: 2011-06-05 18:38
dump() is not slower because it's incremental though. It's slower because it's pure Python. I don't think there is necessarily a memory/speed trade-off; it should be possible to write an incremental encoder in C as well.
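The distinction drawn here, between which encoder runs and whether output is incremental, can be explored with the names mentioned in this thread. This is a rough sketch under stated assumptions: `json.encoder.c_make_encoder` and the `_one_shot` argument of `JSONEncoder.iterencode` are private CPython implementation details that may change, and the function names `encode_incremental` and `encode_one_shot` are hypothetical. Timings vary by machine and Python version, so none are shown.

```python
import json
import json.encoder
import timeit

obj = [[1, 2, 3] * 10] * 10
encoder = json.JSONEncoder()

# c_make_encoder is None when the C accelerator module is unavailable.
print("C encoder available:", json.encoder.c_make_encoder is not None)


def encode_incremental():
    # Default iterencode() call, as used by json.dump().
    return "".join(encoder.iterencode(obj))


def encode_one_shot():
    # iterencode() with the private _one_shot flag, as used by json.dumps().
    return "".join(encoder.iterencode(obj, _one_shot=True))


for fn in (encode_incremental, encode_one_shot):
    elapsed = timeit.timeit(fn, number=1000)
    print("%s: %.3f" % (fn.__name__, elapsed))
```

Both functions return the same string; only the encoder chosen internally differs.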
msg139182 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2011-06-26 14:41
As Antoine and Eric stated, the module is working as intended and we don't document implementation details and generally stay away from talking about performance in the docs.

History
Date	User	Action	Args
2022-04-11 14:57:17	admin	set	github: 56343
2011-06-26 14:41:50	rhettinger	set	status: open -> closed resolution: rejected messages: + msg139182
2011-06-05 18:38:40	poq	set	messages: + msg137707
2011-06-05 17:30:34	terry.reedy	set	messages: + msg137704
2011-06-05 11:29:20	pitrou	set	messages: + msg137691
2011-05-29 07:29:51	ezio.melotti	set	messages: + msg137170
2011-05-29 07:20:05	rhettinger	set	assignee: docs@python -> rhettinger messages: + msg137169
2011-05-29 02:07:33	ezio.melotti	set	keywords: + easy, - patch stage: needs patch messages: + msg137166 versions: + Python 2.7, Python 3.2
2011-05-28 19:42:09	terry.reedy	set	keywords: + patch nosy: + terry.reedy messages: + msg137146
2011-05-23 19:51:07	poq	set	messages: + msg136691
2011-05-23 14:15:08	eric.araujo	set	nosy: + eric.araujo messages: + msg136641
2011-05-21 13:57:26	pitrou	set	nosy: + rhettinger, pitrou, docs@python messages: + msg136440 assignee: docs@python components: + Documentation
2011-05-21 13:39:19	ezio.melotti	set	nosy: + ezio.melotti components: + Library (Lib), - Extension Modules versions: + Python 3.3, - Python 2.7, Python 3.2
2011-05-21 13:38:06	poq	create
