
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Marshal: performance regression with versions 3 and 4
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: kristjan.jonsson, larry, loewis, pitrou, python-dev, serhiy.storchaka, vajrasky, vstinner
Priority: normal Keywords: patch

Created on 2014-01-28 10:10 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
marshal3_numbers.patch vstinner, 2014-01-28 10:10 review
bench.py vstinner, 2014-01-28 10:14
marshal_small_ints_refs.patch serhiy.storchaka, 2014-02-13 21:35 review
marshal_refs_by_value.patch serhiy.storchaka, 2015-01-28 16:45 review
bench_issue20416.py serhiy.storchaka, 2015-01-28 16:46
marshal_refs_by_value_3.patch serhiy.storchaka, 2015-02-04 11:16 review
marshal_hashtable.patch serhiy.storchaka, 2015-02-04 11:30 review
marshal_hashtable_2.patch serhiy.storchaka, 2015-02-10 22:00 review
Messages (25)
msg209524 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:10
The attached patch disables references for int and float types.
msg209525 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:14
Use the attached bench.py to compare performance.
Without the patch:
---
dumps v0: 389.6 ms
data size v0: 45582.9 kB
loads v0: 573.3 ms
dumps v1: 391.4 ms
data size v1: 45582.9 kB
loads v1: 558.0 ms
dumps v2: 166.9 ms
data size v2: 41395.4 kB
loads v2: 482.2 ms
dumps v3: 431.2 ms
data size v3: 41395.4 kB
loads v3: 543.8 ms
dumps v4: 361.8 ms
data size v4: 37000.9 kB
loads v4: 560.4 ms
---
With the patch:
---
dumps v0: 391.4 ms
data size v0: 45582.9 kB
loads v0: 578.2 ms
dumps v1: 392.3 ms
data size v1: 45582.9 kB
loads v1: 556.8 ms
dumps v2: 167.7 ms
data size v2: 41395.4 kB
loads v2: 484.6 ms
dumps v3: 170.3 ms
data size v3: 41395.4 kB
loads v3: 467.0 ms
dumps v4: 122.8 ms
data size v4: 37000.9 kB
loads v4: 468.9 ms
---
dumps v3 is 60% faster, loads v3 is also 14% *faster*.
dumps v4 is 66% faster, loads v4 is 16% faster.
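For reference, a minimal sketch of the kind of benchmark bench.py runs; the data generator and the exact mix of values below are illustrative assumptions, not the attached script:

import marshal
import time

def gen_data():
    # Illustrative mix of ints, floats, strings and containers.
    return [(i, float(i), "s%d" % i, (i, i + 1), [i] * 3) for i in range(100000)]

data = gen_data()
for version in range(5):          # marshal format versions 0..4
    t0 = time.perf_counter()
    dumped = marshal.dumps(data, version)
    t1 = time.perf_counter()
    marshal.loads(dumped)
    t2 = time.perf_counter()
    print("dumps v%d: %.1f ms" % (version, (t1 - t0) * 1e3))
    print("data size v%d: %.1f kB" % (version, len(dumped) / 1024))
    print("loads v%d: %.1f ms" % (version, (t2 - t1) * 1e3))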
msg209526 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:19
Performance of Python 3.3.3+:
---
dumps v0: 374.8 ms
data size v0: 45582.9 kB
loads v0: 625.3 ms
dumps v1: 374.6 ms
data size v1: 45582.9 kB
loads v1: 605.1 ms
dumps v2: 152.9 ms
data size v2: 41395.4 kB
loads v2: 556.5 ms
---
So with the patch, Python 3.4's default version (4) is *faster* than Python 3.3's default version (2) (dumps 20% faster, loads 16% faster) and produces *smaller* files (10% smaller).
msg209528 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:20
Oh, by the way, on Python 3.4 the data size (for versions 3 and 4) is unchanged with my patch: writing a reference takes exactly the same number of bytes as writing an integer.
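A quick way to check the size observation (output not reproduced here; with repeated ints the sizes should come out the same across versions, as noted above):

import marshal

data = [1000] * 1000   # repeated int: written as back-references with version >= 3
for version in (2, 3, 4):
    print("version %d: %d bytes" % (version, len(marshal.dumps(data, version))))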
msg209530 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-01-28 10:25
I am doing the Argument Clinic conversion for the marshal module, so I am adding myself to the nosy list to make sure both tickets stay synchronized.
http://bugs.python.org/issue20185 
msg209531 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:36
> I am doing the Argument Clinic conversion for the marshal module, so I am adding myself to the nosy list to make sure both tickets stay synchronized.
The Derby is suspended until the release of Python 3.4 final.
I consider this issue an important performance regression that should be fixed before Python 3.4 final.
msg209532 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-28 10:54
Did you test with numerous shared ints and floats, e.g. [1000] * 1000000 and [1000.0] * 1000000? AFAIK these were important use cases for adding versions 3 and 4.
msg209533 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 11:34
> Did you test with numerous shared ints and floats, e.g. [1000] * 1000000 and [1000.0] * 1000000? AFAIK these were important use cases for adding versions 3 and 4.
Here are new benchmarks on Python 3.4 with:
 Integers: [1000] * 1000000
 Floats: [1000.0] * 1000000
Integers, without the patch:
 dumps v3: 62.8 ms
 data size v3: 4882.8 kB
 loads v3: 10.7 ms
Integers, with the patch:
 dumps v3: 18.6 ms (-70%)
 data size v3: 4882.8 kB (same size)
 loads v3: 27.7 ms (+158%)
Floats, without the patch:
 dumps v3: 62.5 ms
 data size v3: 4882.8 kB
 loads v3: 11.0 ms
Floats, with the patch:
 dumps v3: 29.3 ms (-53%)
 data size v3: 8789.1 kB (+80%)
 loads v3: 25.5 ms (+132%)
Version 3 was added by:
---
changeset: 82816:01372117a5b4
user: Kristján Valur Jónsson <sweskman@gmail.com>
date: Tue Mar 19 18:02:10 2013 -0700
files: Doc/library/marshal.rst Include/marshal.h Lib/test/test_marshal.py Misc/NEWS Python/marshal.c
description:
Issue #16475: Support object instancing, recursion and interned strings in marshal
---
That issue talks about "sharing string constants, common tuples, even common code objects", not about sharing numbers.
For real data, here are interesting numbers:
http://bugs.python.org/issue16475#msg176013
Integers only represent 4.8% of the serialized data, and only 8.2% of these integers can be shared. (Floats represent 0.29%.) Strings, on the other hand, represent 58%, and 57% of them can be shared.
msg209534 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-01-28 11:35
For the record, format 3 was added through issue16475, format 4 was added through issue19219.
In msg175962, Kristjan argued that there is no reason _not_ to share int objects, e.g. across multiple code objects. Now it seems that this argument is flawed: there is a reason, namely the performance impact.
OTOH, I consider both use cases (marshalling a large number of integers, and desiring to share ints across code objects) equally obscure: you shouldn't worry about marshal performance too much if you have loads of tiny int objects, and you shouldn't worry whether these ints get shared or not.
As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway. This would allow marshal to preserve/copy the object graph, while not impacting the use case that the original poster on python-dev presented.
msg209535 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-01-28 11:39
> Integers, without the patch:
> 
> dumps v3: 62.8 ms
> data size v3: 4882.8 kB
> loads v3: 10.7 ms
> 
> Integers, with the patch:
> 
> dumps v3: 18.6 ms (-70%)
> data size v3: 4882.8 kB (same size)
> loads v3: 27.7 ms (+158%)
As I wrote on python-dev, dumps performance isn't important for the pyc
use case, but loads performance is. Therefore it appears this patch goes
in the wrong direction.
You are also ignoring the *runtime* benefit of sharing objects: smaller
memory footprint of the actual Python process.
msg211164 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-13 19:29
> As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway.
This is an implementation detail.
But we can use a more efficient way of memoizing small int objects.
I also suppose that an efficient C implementation of IdentityDict would significantly decrease the overhead (maybe by a factor of 2). Since this is probably the first case for which IdentityDict gives a significant benefit, I'll provide an implementation for 3.5.
msg211173 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-13 21:35
Here is a patch which special-cases small integers. It decreases the v3 and v4 dump time of bench.py by 10% without affecting load time. Of course, in real cases the benefit will be much smaller.
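The patch itself changes Python/marshal.c; as a rough pure-Python model of one way such special-casing could look (the names and structure below are illustrative and not taken from the patch), small ints get a flat, value-indexed table instead of going through the general id()-keyed dict:

# Illustrative model of special-casing small ints in the writer's memo;
# the actual patch modifies Python/marshal.c and may differ in detail.
SMALL_INT_MIN, SMALL_INT_MAX = -5, 256   # CPython's cached small-int range

class WriterMemo:
    def __init__(self):
        self.by_id = {}                                  # id(obj) -> ref index
        self.small_ints = [None] * (SMALL_INT_MAX - SMALL_INT_MIN + 1)
        self.count = 0

    def lookup_or_add(self, obj):
        """Return an existing ref index, or None after registering obj."""
        if type(obj) is int and SMALL_INT_MIN <= obj <= SMALL_INT_MAX:
            # Small ints are interpreter-wide singletons: index a flat table
            # by value instead of hashing id(obj) in a dict, which is the
            # expensive part of dumps().
            slot = obj - SMALL_INT_MIN
            ref = self.small_ints[slot]
            if ref is not None:
                return ref
            self.small_ints[slot] = self.count
        else:
            ref = self.by_id.get(id(obj))
            if ref is not None:
                return ref
            self.by_id[id(obj)] = self.count
        self.count += 1
        return None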
msg211691 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-02-20 06:05
This is not a release blocker. You guys can play with this for 3.5. FWIW I prefer Serhiy's approach.
msg234903 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-28 16:45
Here is a more general solution. For simple values (ints, floats, complex numbers, short strings) it is faster to use the value itself as the key than to create a new integer object (the id).
Without the patch:
data ver. dumps(ms) loads(ms) size(KiB)
genData 2 103.9 186.4 4090.7
genData 3 250.3 196.8 4090.7
genData 4 223.5 182.5 3651.3
[1000]*10**6 2 98.6 134.8 4882.8
[1000]*10**6 3 491.1 62.2 4882.8
[1000]*10**6 4 494.9 62.1 4882.8
[1000.0]*10**6 2 173.5 158.4 8789.1
[1000.0]*10**6 3 494.8 62.2 4882.8
[1000.0]*10**6 4 491.4 62.8 4882.8
[1000.0j]*10**6 2 288.8 221.4 16601.6
[1000.0j]*10**6 3 493.6 62.4 4882.8
[1000.0j]*10**6 4 489.2 62.0 4882.8
20 pydecimals 2 85.0 114.7 3936.5
20 pydecimals 3 97.2 44.3 3373.4
20 pydecimals 4 86.2 40.0 3297.5
With marshal3_numbers.patch:
data ver. dumps(ms) loads(ms) size(KiB)
genData 3 108.4 187.5 4090.7
genData 4 83.0 179.3 3651.3
[1000]*10**6 3 104.2 145.8 4882.8
[1000]*10**6 4 103.9 147.0 4882.8
[1000.0]*10**6 3 177.4 154.5 8789.1
[1000.0]*10**6 4 177.1 164.2 8789.1
[1000.0j]*10**6 3 501.6 61.1 4882.8
[1000.0j]*10**6 4 501.6 62.3 4882.8
20 pydecimals 3 95.2 41.9 3373.4
20 pydecimals 4 83.5 38.5 3297.5
With marshal_refs_by_value.patch:
data ver. dumps(ms) loads(ms) size(KiB)
genData 3 150.4 197.0 4090.7
genData 4 122.1 184.8 3651.3
[1000]*10**6 3 169.1 62.3 4882.8
[1000]*10**6 4 169.2 62.2 4882.8
[1000.0]*10**6 3 313.5 62.2 4882.8
[1000.0]*10**6 4 312.8 62.3 4882.8
[1000.0j]*10**6 3 410.6 62.5 4882.8
[1000.0j]*10**6 4 410.5 62.3 4882.8
20 pydecimals 3 68.5 41.1 3373.4
20 pydecimals 4 57.5 39.8 3297.5
marshal_refs_by_value.patch produces data as compact as the unpatched code does, but dumps faster. It usually dumps slower than marshal3_numbers.patch, but can produce smaller data and load faster. Surprisingly, it dumps the code of the _pydecimal module faster.
As a side effect, the patch can turn some equal simple values into identical objects.
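A rough pure-Python model of the refs-by-value idea (the actual patch is C code in Python/marshal.c; the cut-off and names below are illustrative): the memo is keyed by the value itself for simple immutable types and falls back to id() for everything else. This is also where the noted side effect comes from: equal simple values collapse into one entry and therefore load back as identical objects.

SIMPLE_TYPES = (int, float, complex, str, bytes)
MAX_SHORT = 32   # illustrative cut-off for "short" strings and bytes

def memo_key(obj):
    # For simple immutable values, hash the value itself: no temporary id()
    # integer has to be created, and equal values share one back-reference.
    t = type(obj)
    if t in SIMPLE_TYPES:
        if not isinstance(obj, (str, bytes)) or len(obj) <= MAX_SHORT:
            return (t, obj)
    return id(obj)   # containers, code objects, long strings, ...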
msg235050 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-30 19:43
Here are the results of a benchmark which measures dump and load times for all .pyc files in the stdlib (including tests).
https://bitbucket.org/storchaka/cpython-stuff/src/default/marshal/marshalbench.py
$ find * -name __pycache__ -exec rm -rf '{}' +
$ ./python -m compileall -qq -r 99 Lib
$ find Lib -name '*.pyc' | xargs ./python marshalbench.py
Without the patch:
ver. dumps loads size
 770.3 19941.2
0 695.7 1178.4 19941.2
1 680.4 1180.9 19941.2
2 635.9 1115.9 19941.2
3 896.3 757.3 19941.2
4 748.0 700.6 19941.2
With marshal3_numbers.patch:
ver. dumps loads size
 750.5 19946.1
0 690.2 1173.7 19946.1
1 678.2 1166.6 19946.1
2 630.9 1105.2 19946.1
3 873.2 751.5 19946.1
4 724.6 687.6 19946.1
With marshal_refs_by_value.patch:
ver. dumps loads size
 780.2 19935.8
0 712.7 1202.6 19935.8
1 713.8 1195.1 19935.8
2 676.5 1126.0 19935.8
3 686.1 762.7 19935.8
4 515.1 697.6 19935.8
msg235384 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-04 11:16
Here is a patch which adds a separate dict for interned strings (otherwise they could be uninterned) and one for bytes. It also slightly simplifies the code.
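A rough pure-Python model of why separate tables help (illustrative only; the real patch lives in Python/marshal.c, where the writer can check PyUnicode_CHECK_INTERNED() directly): if interned and plain strings shared one value-keyed table, an interned string could be written as a back-reference to an equal plain one and come back un-interned on load.

class Memo:
    def __init__(self):
        self.interned_strs = {}   # interned str -> ref index
        self.plain_strs = {}      # other str    -> ref index
        self.bytes_objs = {}      # bytes        -> ref index
        self.by_id = {}           # anything else, keyed by id()
        self.count = 0

    def lookup_or_add(self, obj, is_interned=False):
        """Return an existing ref index, or None after registering obj."""
        if isinstance(obj, str):
            # Separate tables keep an interned str from being collapsed with
            # an equal non-interned one, which would change its interned
            # status when the data is loaded back.
            table, key = (self.interned_strs if is_interned else self.plain_strs), obj
        elif isinstance(obj, bytes):
            table, key = self.bytes_objs, obj
        else:
            table, key = self.by_id, id(obj)
        ref = table.get(key)
        if ref is not None:
            return ref
        table[key] = self.count
        self.count += 1
        return None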
msg235385 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-04 11:30
And here is an alternative patch which uses a hashtable.
Both patches have about the same performance for *.pyc files, but marshal_hashtable.patch is much faster for duplicated values. Marshalling [1000]*10**6, [1000.0]*10**6 and [1000.0j]*10**6 with versions 3 and 4 is as fast as marshalling [1000]*10**6 with version 2 (i.e. 5 times faster than the current implementation).
data ver. dumps(ms) loads(ms) size(KiB)
genData 2 99.9 188.9 4090.7
genData 3 148.2 189.1 4090.7
genData 4 121.4 177.4 3651.3
[1000]*10**6 2 97.7 131.6 4882.8
[1000]*10**6 3 95.1 63.1 4882.8
[1000]*10**6 4 95.1 64.4 4882.8
[1000.0]*10**6 2 172.9 153.5 8789.1
[1000.0]*10**6 3 97.4 61.9 4882.8
[1000.0]*10**6 4 95.7 61.6 4882.8
[1000.0j]*10**6 2 288.6 228.2 16601.6
[1000.0j]*10**6 3 94.9 61.6 4882.8
[1000.0j]*10**6 4 95.1 62.2 4882.8
20 pydecimals 2 88.0 111.4 3929.6
20 pydecimals 3 57.0 51.4 3368.5
20 pydecimals 4 46.6 39.9 3292.8
msg235469 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-06 07:11
What are your thoughts, Victor?
msg235479 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-02-06 12:39
Can you post the results of your hashtable patch with marshalbench.py?
PS: we don't care about versions older than 4.
msg235485 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-06 14:11
Here are new results with a corrected marshalbench.py (more precise, and with the total size recalculated):
Without the patch:
ver. dumps loads size
 746.5 19971.2
0 669.1 1149.2 26202.9
1 682.0 1130.1 26202.9
2 652.0 1134.8 26202.4
3 920.8 757.9 21316.7
4 771.2 691.4 19971.2
With marshal3_numbers.patch:
3 894.6 778.4 21321.6
4 733.1 704.2 19976.2
With marshal_refs_by_value_3.patch:
3 687.6 777.7 21310.3
4 535.5 701.5 19966.3
With marshal_hashtable.patch:
3 656.0 764.2 21316.7
4 512.9 696.6 19971.2
msg235713 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-10 22:00
The updated patch addresses Victor's comments.
msg235721 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-11 00:09
Apart from the minor suggestion that I added on the review, the patch looks good to me. It's quite simple and makes dumps() 34% faster (for version 4).
msg235722 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-11 00:09
(I didn't reproduce the benchmark; I just used Serhiy's numbers. I guess that using memcpy() doesn't make the code slower.)
msg235746 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-02-11 13:55
New changeset 012364df2712 by Serhiy Storchaka in branch 'default':
Issue #20416: marshal.dumps() with protocols 3 and 4 is now 40-50% faster on
https://hg.python.org/cpython/rev/012364df2712 
msg235753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-11 15:53
Thanks for your review, Antoine and Victor.
History
Date User Action Args
2022-04-11 14:57:57  admin  set  github: 64615
2015-02-11 15:53:15  serhiy.storchaka  set  status: open -> closed
    messages: + msg235753
    assignee: serhiy.storchaka
    resolution: fixed
    stage: patch review -> resolved
2015-02-11 13:55:58  python-dev  set  nosy: + python-dev
    messages: + msg235746
2015-02-11 00:09:58  vstinner  set  messages: + msg235722
2015-02-11 00:09:12  vstinner  set  messages: + msg235721
2015-02-10 22:00:54  serhiy.storchaka  set  files: + marshal_hashtable_2.patch
    messages: + msg235713
2015-02-06 14:11:15  serhiy.storchaka  set  messages: + msg235485
2015-02-06 12:39:25  pitrou  set  messages: + msg235479
2015-02-06 07:11:58  serhiy.storchaka  set  messages: + msg235469
    title: Marshal: special case int and float, don't use references -> Marshal: performance regression with versions 3 and 4
2015-02-04 11:30:45  serhiy.storchaka  set  files: + marshal_hashtable.patch
    messages: + msg235385
2015-02-04 11:16:49  serhiy.storchaka  set  files: + marshal_refs_by_value_3.patch
    messages: + msg235384
2015-01-30 19:43:35  serhiy.storchaka  set  messages: + msg235050
2015-01-28 16:46:03  serhiy.storchaka  set  files: + bench_issue20416.py
2015-01-28 16:45:22  serhiy.storchaka  set  files: + marshal_refs_by_value.patch
    messages: + msg234903
    components: + Interpreter Core
    stage: patch review
2014-02-20 06:05:12  larry  set  priority: release blocker -> normal
    messages: + msg211691
    versions: + Python 3.5, - Python 3.4
2014-02-13 21:35:45  serhiy.storchaka  set  files: + marshal_small_ints_refs.patch
    messages: + msg211173
2014-02-13 19:29:29  serhiy.storchaka  set  messages: + msg211164
2014-01-28 11:39:26  pitrou  set  messages: + msg209535
2014-01-28 11:35:30  loewis  set  messages: + msg209534
2014-01-28 11:34:43  vstinner  set  messages: + msg209533
2014-01-28 10:54:50  serhiy.storchaka  set  messages: + msg209532
2014-01-28 10:50:55  serhiy.storchaka  set  nosy: + pitrou, kristjan.jonsson, serhiy.storchaka
2014-01-28 10:36:08  vstinner  set  priority: normal -> release blocker
    nosy: + larry
    messages: + msg209531
2014-01-28 10:25:57  vajrasky  set  nosy: + vajrasky
    messages: + msg209530
2014-01-28 10:20:21  vstinner  set  messages: + msg209528
2014-01-28 10:19:31  vstinner  set  messages: + msg209526
2014-01-28 10:14:45  vstinner  set  nosy: + loewis
2014-01-28 10:14:40  vstinner  set  files: + bench.py
    messages: + msg209525
2014-01-28 10:10:48  vstinner  create
