
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Marshal: performance regression with versions 3 and 4
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: kristjan.jonsson, larry, loewis, pitrou, python-dev, serhiy.storchaka, vajrasky, vstinner
Priority: normal Keywords: patch

Created on 2014-01-28 10:10 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
marshal3_numbers.patch vstinner, 2014-01-28 10:10 review
bench.py vstinner, 2014-01-28 10:14
marshal_small_ints_refs.patch serhiy.storchaka, 2014-02-13 21:35 review
marshal_refs_by_value.patch serhiy.storchaka, 2015-01-28 16:45 review
bench_issue20416.py serhiy.storchaka, 2015-01-28 16:46
marshal_refs_by_value_3.patch serhiy.storchaka, 2015-02-04 11:16 review
marshal_hashtable.patch serhiy.storchaka, 2015-02-04 11:30 review
marshal_hashtable_2.patch serhiy.storchaka, 2015-02-10 22:00 review
Messages (25)
msg209524 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:10
The attached patch disables references for int and float types.
msg209525 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:14
Use the attached bench.py to compare performance.
Without the patch:
---
dumps v0: 389.6 ms
data size v0: 45582.9 kB
loads v0: 573.3 ms
dumps v1: 391.4 ms
data size v1: 45582.9 kB
loads v1: 558.0 ms
dumps v2: 166.9 ms
data size v2: 41395.4 kB
loads v2: 482.2 ms
dumps v3: 431.2 ms
data size v3: 41395.4 kB
loads v3: 543.8 ms
dumps v4: 361.8 ms
data size v4: 37000.9 kB
loads v4: 560.4 ms
---
With the patch:
---
dumps v0: 391.4 ms
data size v0: 45582.9 kB
loads v0: 578.2 ms
dumps v1: 392.3 ms
data size v1: 45582.9 kB
loads v1: 556.8 ms
dumps v2: 167.7 ms
data size v2: 41395.4 kB
loads v2: 484.6 ms
dumps v3: 170.3 ms
data size v3: 41395.4 kB
loads v3: 467.0 ms
dumps v4: 122.8 ms
data size v4: 37000.9 kB
loads v4: 468.9 ms
---
dumps v3 is 60% faster, loads v3 is also 14% *faster*.
dumps v4 is 66% faster, loads v4 is 16% faster.
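For reference, a minimal sketch of the kind of benchmark bench.py runs; the data generator and the exact mix of values below are illustrative assumptions, not the attached script:

import marshal
import time

def gen_data():
    # Illustrative mix of ints, floats, strings and containers.
    return [(i, float(i), "s%d" % i, (i, i + 1), [i] * 3) for i in range(100000)]

data = gen_data()
for version in range(5):          # marshal format versions 0..4
    t0 = time.perf_counter()
    dumped = marshal.dumps(data, version)
    t1 = time.perf_counter()
    marshal.loads(dumped)
    t2 = time.perf_counter()
    print("dumps v%d: %.1f ms" % (version, (t1 - t0) * 1e3))
    print("data size v%d: %.1f kB" % (version, len(dumped) / 1024))
    print("loads v%d: %.1f ms" % (version, (t2 - t1) * 1e3))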
msg209526 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:19
Performance of Python 3.3.3+:
---
dumps v0: 374.8 ms
data size v0: 45582.9 kB
loads v0: 625.3 ms
dumps v1: 374.6 ms
data size v1: 45582.9 kB
loads v1: 605.1 ms
dumps v2: 152.9 ms
data size v2: 41395.4 kB
loads v2: 556.5 ms
---
So with the patch, Python 3.4's default version (4) is *faster* than Python 3.3's default version (2) (dumps 20% faster, loads 16% faster) and produces *smaller* files (10% smaller).
msg209528 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:20
Oh, by the way, on Python 3.4 the data size (for versions 3 and 4) is unchanged with my patch: writing a reference takes exactly the same number of bytes as writing an integer.
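A quick way to check the size observation (output not reproduced here; with repeated ints the sizes should come out the same across versions, as noted above):

import marshal

data = [1000] * 1000   # repeated int: written as back-references with version >= 3
for version in (2, 3, 4):
    print("version %d: %d bytes" % (version, len(marshal.dumps(data, version))))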
msg209530 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-01-28 10:25
I am doing the Argument Clinic conversion for the marshal module, so I am adding myself to the nosy list to make sure both tickets stay synchronized.
http://bugs.python.org/issue20185 
msg209531 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:36
> I am doing the Argument Clinic conversion for the marshal module, so I am adding myself to the nosy list to make sure both tickets stay synchronized.
The Derby is suspended until the release of Python 3.4 final.
I consider this issue an important performance regression that should be fixed before Python 3.4 final.
msg209532 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-28 10:54
Did you test with numerous shared ints and floats, e.g. [1000] * 1000000 and [1000.0] * 1000000? AFAIK these were important use cases for adding versions 3 and 4.
msg209533 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 11:34
> Did you test with numerous shared ints and floats, e.g. [1000] * 1000000 and [1000.0] * 1000000? AFAIK these were important use cases for adding versions 3 and 4.
Here are new benchmarks on Python 3.4 with:
 Integers: [1000] * 1000000
 Floats: [1000.0] * 1000000
Integers, without the patch:
 dumps v3: 62.8 ms
 data size v3: 4882.8 kB
 loads v3: 10.7 ms
Integers, with the patch:
 dumps v3: 18.6 ms (-70%)
 data size v3: 4882.8 kB (same size)
 loads v3: 27.7 ms (+158%)
Floats, without the patch:
 dumps v3: 62.5 ms
 data size v3: 4882.8 kB
 loads v3: 11.0 ms
Floats, with the patch:
 dumps v3: 29.3 ms (-53%)
 data size v3: 8789.1 kB (+80%)
 loads v3: 25.5 ms (+132%)
Version 3 was added by:
---
changeset: 82816:01372117a5b4
user: Kristján Valur Jónsson <sweskman@gmail.com>
date: Tue Mar 19 18:02:10 2013 -0700
files: Doc/library/marshal.rst Include/marshal.h Lib/test/test_marshal.py Misc/NEWS Python/marshal.c
description:
Issue #16475: Support object instancing, recursion and interned strings in marshal
---
That issue talks about "sharing string constants, common tuples, even common code objects", not about sharing numbers.
For real data, here are interesting numbers:
http://bugs.python.org/issue16475#msg176013
Integers only represent 4.8% of the serialized data, and only 8.2% of these integers can be shared. (Floats represent 0.29%.) Strings, on the other hand, represent 58%, and 57% of them can be shared.
msg209534 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-01-28 11:35
For the record, format 3 was added through issue16475, format 4 was added through issue19219.
In msg175962, Kristjan argued that there is no reason _not_ to share int objects, e.g. across multiple code objects. Now it seems that this argument is flawed: there is a reason, namely the performance impact.
OTOH, I consider both use cases (marshalling a large number of integers, and desiring to share ints across code objects) equally obscure: you shouldn't worry about marshal performance too much if you have loads of tiny int objects, and you shouldn't worry whether these ints get shared or not.
As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway. This would allow marshal to preserve/copy the object graph, while not impacting the use case that the original poster on python-dev presented.
msg209535 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-01-28 11:39
> Integers, without the patch:
> 
> dumps v3: 62.8 ms
> data size v3: 4882.8 kB
> loads v3: 10.7 ms
> 
> Integers, with the patch:
> 
> dumps v3: 18.6 ms (-70%)
> data size v3: 4882.8 kB (same size)
> loads v3: 27.7 ms (+158%)
As I wrote on python-dev, dumps performance isn't important for the pyc
use case, but loads performance is. Therefore it appears this patch goes
in the wrong direction.
You are also ignoring the *runtime* benefit of sharing objects: smaller
memory footprint of the actual Python process.
msg211164 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-13 19:29
> As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway.
This is an implementation detail.
But we can use a more efficient way of memoizing small int objects.
I also suppose that an efficient C implementation of IdentityDict would significantly decrease the overhead (maybe by a factor of 2). Since this is probably the first case for which IdentityDict gives a significant benefit, I'll provide an implementation for 3.5.
msg211173 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-13 21:35
Here is a patch which special-cases small integers. It decreases the v3 and v4 dump time of bench.py by 10% without affecting load time. Of course, in real cases the benefit will be much smaller.
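The patch itself changes Python/marshal.c; as a rough pure-Python model of one way such special-casing could look (the names and structure below are illustrative and not taken from the patch), small ints get a flat, value-indexed table instead of going through the general id()-keyed dict:

# Illustrative model of special-casing small ints in the writer's memo;
# the actual patch modifies Python/marshal.c and may differ in detail.
SMALL_INT_MIN, SMALL_INT_MAX = -5, 256   # CPython's cached small-int range

class WriterMemo:
    def __init__(self):
        self.by_id = {}                                  # id(obj) -> ref index
        self.small_ints = [None] * (SMALL_INT_MAX - SMALL_INT_MIN + 1)
        self.count = 0

    def lookup_or_add(self, obj):
        """Return an existing ref index, or None after registering obj."""
        if type(obj) is int and SMALL_INT_MIN <= obj <= SMALL_INT_MAX:
            # Small ints are interpreter-wide singletons: index a flat table
            # by value instead of hashing id(obj) in a dict, which is the
            # expensive part of dumps().
            slot = obj - SMALL_INT_MIN
            ref = self.small_ints[slot]
            if ref is not None:
                return ref
            self.small_ints[slot] = self.count
        else:
            ref = self.by_id.get(id(obj))
            if ref is not None:
                return ref
            self.by_id[id(obj)] = self.count
        self.count += 1
        return None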
msg211691 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-02-20 06:05
This is not a release blocker. You guys can play with this for 3.5. FWIW I prefer Serhiy's approach.
msg234903 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-28 16:45
Here is a more general solution. For simple values (ints, floats, complex numbers, short strings) it is faster to use the value itself as the key than to create a new integer object (the id).
Without the patch:
data ver. dumps(ms) loads(ms) size(KiB)
genData 2 103.9 186.4 4090.7
genData 3 250.3 196.8 4090.7
genData 4 223.5 182.5 3651.3
[1000]*10**6 2 98.6 134.8 4882.8
[1000]*10**6 3 491.1 62.2 4882.8
[1000]*10**6 4 494.9 62.1 4882.8
[1000.0]*10**6 2 173.5 158.4 8789.1
[1000.0]*10**6 3 494.8 62.2 4882.8
[1000.0]*10**6 4 491.4 62.8 4882.8
[1000.0j]*10**6 2 288.8 221.4 16601.6
[1000.0j]*10**6 3 493.6 62.4 4882.8
[1000.0j]*10**6 4 489.2 62.0 4882.8
20 pydecimals 2 85.0 114.7 3936.5
20 pydecimals 3 97.2 44.3 3373.4
20 pydecimals 4 86.2 40.0 3297.5
With marshal3_numbers.patch:
data ver. dumps(ms) loads(ms) size(KiB)
genData 3 108.4 187.5 4090.7
genData 4 83.0 179.3 3651.3
[1000]*10**6 3 104.2 145.8 4882.8
[1000]*10**6 4 103.9 147.0 4882.8
[1000.0]*10**6 3 177.4 154.5 8789.1
[1000.0]*10**6 4 177.1 164.2 8789.1
[1000.0j]*10**6 3 501.6 61.1 4882.8
[1000.0j]*10**6 4 501.6 62.3 4882.8
20 pydecimals 3 95.2 41.9 3373.4
20 pydecimals 4 83.5 38.5 3297.5
With marshal_refs_by_value.patch:
data ver. dumps(ms) loads(ms) size(KiB)
genData 3 150.4 197.0 4090.7
genData 4 122.1 184.8 3651.3
[1000]*10**6 3 169.1 62.3 4882.8
[1000]*10**6 4 169.2 62.2 4882.8
[1000.0]*10**6 3 313.5 62.2 4882.8
[1000.0]*10**6 4 312.8 62.3 4882.8
[1000.0j]*10**6 3 410.6 62.5 4882.8
[1000.0j]*10**6 4 410.5 62.3 4882.8
20 pydecimals 3 68.5 41.1 3373.4
20 pydecimals 4 57.5 39.8 3297.5
marshal_refs_by_value.patch produces data as compact as the unpatched code does, but dumps faster. It usually dumps slower than marshal3_numbers.patch, but can produce smaller data and load faster. Surprisingly, it dumps the code of the _pydecimal module faster.
As a side effect, the patch can turn some equal simple values into identical objects.
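A rough pure-Python model of the refs-by-value idea (the actual patch is C code in Python/marshal.c; the cut-off and names below are illustrative): the memo is keyed by the value itself for simple immutable types and falls back to id() for everything else. This is also where the noted side effect comes from: equal simple values collapse into one entry and therefore load back as identical objects.

SIMPLE_TYPES = (int, float, complex, str, bytes)
MAX_SHORT = 32   # illustrative cut-off for "short" strings and bytes

def memo_key(obj):
    # For simple immutable values, hash the value itself: no temporary id()
    # integer has to be created, and equal values share one back-reference.
    t = type(obj)
    if t in SIMPLE_TYPES:
        if not isinstance(obj, (str, bytes)) or len(obj) <= MAX_SHORT:
            return (t, obj)
    return id(obj)   # containers, code objects, long strings, ...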
msg235050 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-30 19:43
Here are the results of a benchmark which measures dump and load times for all .pyc files in the stdlib (including tests).
https://bitbucket.org/storchaka/cpython-stuff/src/default/marshal/marshalbench.py
$ find * -name __pycache__ -exec rm -rf '{}' +
$ ./python -m compileall -qq -r 99 Lib
$ find Lib -name '*.pyc' | xargs ./python marshalbench.py
Without the patch:
ver. dumps loads size
 770.3 19941.2
0 695.7 1178.4 19941.2
1 680.4 1180.9 19941.2
2 635.9 1115.9 19941.2
3 896.3 757.3 19941.2
4 748.0 700.6 19941.2
With marshal3_numbers.patch:
ver. dumps loads size
 750.5 19946.1
0 690.2 1173.7 19946.1
1 678.2 1166.6 19946.1
2 630.9 1105.2 19946.1
3 873.2 751.5 19946.1
4 724.6 687.6 19946.1
With marshal_refs_by_value.patch:
ver. dumps loads size
 780.2 19935.8
0 712.7 1202.6 19935.8
1 713.8 1195.1 19935.8
2 676.5 1126.0 19935.8
3 686.1 762.7 19935.8
4 515.1 697.6 19935.8
msg235384 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-04 11:16
Here is a patch which adds a separate dict for interned strings (otherwise they could be uninterned) and one for bytes. It also slightly simplifies the code.
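A rough pure-Python model of why separate tables help (illustrative only; the real patch lives in Python/marshal.c, where the writer can check PyUnicode_CHECK_INTERNED() directly): if interned and plain strings shared one value-keyed table, an interned string could be written as a back-reference to an equal plain one and come back un-interned on load.

class Memo:
    def __init__(self):
        self.interned_strs = {}   # interned str -> ref index
        self.plain_strs = {}      # other str    -> ref index
        self.bytes_objs = {}      # bytes        -> ref index
        self.by_id = {}           # anything else, keyed by id()
        self.count = 0

    def lookup_or_add(self, obj, is_interned=False):
        """Return an existing ref index, or None after registering obj."""
        if isinstance(obj, str):
            # Separate tables keep an interned str from being collapsed with
            # an equal non-interned one, which would change its interned
            # status when the data is loaded back.
            table, key = (self.interned_strs if is_interned else self.plain_strs), obj
        elif isinstance(obj, bytes):
            table, key = self.bytes_objs, obj
        else:
            table, key = self.by_id, id(obj)
        ref = table.get(key)
        if ref is not None:
            return ref
        table[key] = self.count
        self.count += 1
        return None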
msg235385 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-04 11:30
And here is an alternative patch which uses a hashtable.
Both patches have about the same performance for *.pyc files, but marshal_hashtable.patch is much faster for duplicated values. Marshalling [1000]*10**6, [1000.0]*10**6 and [1000.0j]*10**6 with versions 3 and 4 is as fast as marshalling [1000]*10**6 with version 2 (i.e. 5 times faster than the current implementation).
data ver. dumps(ms) loads(ms) size(KiB)
genData 2 99.9 188.9 4090.7
genData 3 148.2 189.1 4090.7
genData 4 121.4 177.4 3651.3
[1000]*10**6 2 97.7 131.6 4882.8
[1000]*10**6 3 95.1 63.1 4882.8
[1000]*10**6 4 95.1 64.4 4882.8
[1000.0]*10**6 2 172.9 153.5 8789.1
[1000.0]*10**6 3 97.4 61.9 4882.8
[1000.0]*10**6 4 95.7 61.6 4882.8
[1000.0j]*10**6 2 288.6 228.2 16601.6
[1000.0j]*10**6 3 94.9 61.6 4882.8
[1000.0j]*10**6 4 95.1 62.2 4882.8
20 pydecimals 2 88.0 111.4 3929.6
20 pydecimals 3 57.0 51.4 3368.5
20 pydecimals 4 46.6 39.9 3292.8
msg235469 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-06 07:11
What are your thoughts, Victor?
msg235479 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-02-06 12:39
Can you post the results of your hashtable patch with marshalbench.py?
PS: we don't care about versions older than 4.
msg235485 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-06 14:11
Here are new results with a corrected marshalbench.py (more precise, and with the total size recalculated):
Without the patch:
ver. dumps loads size
 746.5 19971.2
0 669.1 1149.2 26202.9
1 682.0 1130.1 26202.9
2 652.0 1134.8 26202.4
3 920.8 757.9 21316.7
4 771.2 691.4 19971.2
With marshal3_numbers.patch:
3 894.6 778.4 21321.6
4 733.1 704.2 19976.2
With marshal_refs_by_value_3.patch:
3 687.6 777.7 21310.3
4 535.5 701.5 19966.3
With marshal_hashtable.patch:
3 656.0 764.2 21316.7
4 512.9 696.6 19971.2
msg235713 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-10 22:00
The updated patch addresses Victor's comments.
msg235721 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-11 00:09
Apart from the minor suggestion that I added on the review, the patch looks good to me. It's quite simple and makes dumps() 34% faster (for version 4).
msg235722 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-11 00:09
(I didn't reproduce the benchmark; I just used Serhiy's numbers. I guess that using memcpy() doesn't make the code slower.)
msg235746 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-02-11 13:55
New changeset 012364df2712 by Serhiy Storchaka in branch 'default':
Issue #20416: marshal.dumps() with protocols 3 and 4 is now 40-50% faster on
https://hg.python.org/cpython/rev/012364df2712 
msg235753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-11 15:53
Thanks for your review, Antoine and Victor.
History
Date User Action Args
2022-04-11 14:57:57  admin  set  github: 64615
2015-02-11 15:53:15  serhiy.storchaka  set  status: open -> closed
    messages: + msg235753
    assignee: serhiy.storchaka
    resolution: fixed
    stage: patch review -> resolved
2015-02-11 13:55:58  python-dev  set  nosy: + python-dev
    messages: + msg235746
2015-02-11 00:09:58  vstinner  set  messages: + msg235722
2015-02-11 00:09:12  vstinner  set  messages: + msg235721
2015-02-10 22:00:54  serhiy.storchaka  set  files: + marshal_hashtable_2.patch
    messages: + msg235713
2015-02-06 14:11:15  serhiy.storchaka  set  messages: + msg235485
2015-02-06 12:39:25  pitrou  set  messages: + msg235479
2015-02-06 07:11:58  serhiy.storchaka  set  messages: + msg235469
    title: Marshal: special case int and float, don't use references -> Marshal: performance regression with versions 3 and 4
2015-02-04 11:30:45  serhiy.storchaka  set  files: + marshal_hashtable.patch
    messages: + msg235385
2015-02-04 11:16:49  serhiy.storchaka  set  files: + marshal_refs_by_value_3.patch
    messages: + msg235384
2015-01-30 19:43:35  serhiy.storchaka  set  messages: + msg235050
2015-01-28 16:46:03  serhiy.storchaka  set  files: + bench_issue20416.py
2015-01-28 16:45:22  serhiy.storchaka  set  files: + marshal_refs_by_value.patch
    messages: + msg234903
    components: + Interpreter Core
    stage: patch review
2014-02-20 06:05:12  larry  set  priority: release blocker -> normal
    messages: + msg211691
    versions: + Python 3.5, - Python 3.4
2014-02-13 21:35:45  serhiy.storchaka  set  files: + marshal_small_ints_refs.patch
    messages: + msg211173
2014-02-13 19:29:29  serhiy.storchaka  set  messages: + msg211164
2014-01-28 11:39:26  pitrou  set  messages: + msg209535
2014-01-28 11:35:30  loewis  set  messages: + msg209534
2014-01-28 11:34:43  vstinner  set  messages: + msg209533
2014-01-28 10:54:50  serhiy.storchaka  set  messages: + msg209532
2014-01-28 10:50:55  serhiy.storchaka  set  nosy: + pitrou, kristjan.jonsson, serhiy.storchaka
2014-01-28 10:36:08  vstinner  set  priority: normal -> release blocker
    nosy: + larry
    messages: + msg209531
2014-01-28 10:25:57  vajrasky  set  nosy: + vajrasky
    messages: + msg209530
2014-01-28 10:20:21  vstinner  set  messages: + msg209528
2014-01-28 10:19:31  vstinner  set  messages: + msg209526
2014-01-28 10:14:45  vstinner  set  nosy: + loewis
2014-01-28 10:14:40  vstinner  set  files: + bench.py
    messages: + msg209525
2014-01-28 10:10:48  vstinner  create
