This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: large memory overhead when pyc is recompiled
Type: resource usage
Stage: resolved
Components: Interpreter Core
Versions: Python 3.4, Python 3.5
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: asottile, benjamin.peterson, brett.cannon, bukzor, geoffreyspear, georg.brandl, jonathan.underwood, methane, ncoghlan, pitrou, r.david.murray, serhiy.storchaka
Priority: normal
Keywords:

Created on 2015-04-30 18:59 by bukzor, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name               Uploaded                      Description
repro.py                bukzor, 2015-04-30 18:59      repro.py, from the demo in description
repro2.py               asottile, 2015-05-01 15:59
anon_city_hoods.tar.gz  asottile, 2015-05-01 16:03
Messages (24)
msg242281 - Author: Buck Evan (bukzor) * Date: 2015-04-30 18:59
In the attached example I show that there's a significant memory overhead present whenever a pre-compiled pyc is not present.
This only occurs with more than 5225 objects (dictionaries in this case)
allocated. At 13756 objects, the mysterious pyc overhead is 50% of memory
usage.
I've reproduced this issue in python 2.6, 2.7, 3.4. I imagine it's present in all cpythons.
$ python -c 'import repro'
16736
$ python -c 'import repro'
8964
$ python -c 'import repro'
8964
$ rm *.pyc; python -c 'import repro'
16740
$ rm *.pyc; python -c 'import repro'
16736
$ rm *.pyc; python -c 'import repro'
16740
msg242282 - Author: Buck Evan (bukzor) * Date: 2015-04-30 19:01
Also, we've reproduced this in both linux and osx.
msg242284 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-30 19:34
This is transitory memory consumption. Once the source is compiled to bytecode, memory consumption falls back to its previous level. Do you care that much about it?
msg242296 - Author: Anthony Sottile (asottile) * Date: 2015-05-01 00:47
Adding `import gc; gc.collect()` doesn't change the outcome afaict
msg242301 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-01 10:40
> Adding `import gc; gc.collect()` doesn't change the outcome afaict
Of course it doesn't. The memory has already been released.
"ru_maxrss" is the maximum memory consumption during the whole process lifetime. Add the following at the end of your script (Linux):
import os, re, resource
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
with open("/proc/%d/status" % os.getpid(), "r") as f:
    for line in f:
        if line.split(':')[0] in ('VmHWM', 'VmRSS'):
            print(line.strip())
And you'll see that VmRSS has already fallen back to the same level as when the pyc is not recompiled (it's a little bit more, perhaps due to fragmentation):
$ rm -r __pycache__/; ./python -c "import repro"
19244
VmHWM:	 19244 kB
VmRSS:	 12444 kB
$ ./python -c "import repro"
12152
VmHWM:	 12152 kB
VmRSS:	 12152 kB
("VmHWM" - the HighWater Mark - is the same as ru_maxrss)
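The distinction above between peak and current resident memory can be captured in a small helper. This is only a sketch under stated assumptions: it is Linux-only, and the names `parse_status` and `memory_snapshot` are hypothetical, not part of any library. It reads the same two `/proc` fields as the snippet above and returns them as integers so they can be compared directly.

```python
import os


def parse_status(text):
    """Return {'VmHWM': kB, 'VmRSS': kB} parsed from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if name in ("VmHWM", "VmRSS"):
            # Values look like "   19244 kB"; keep only the integer part.
            fields[name] = int(value.split()[0])
    return fields


def memory_snapshot(pid=None):
    """Read peak (VmHWM) and current (VmRSS) resident memory. Linux only."""
    pid = os.getpid() if pid is None else pid
    with open("/proc/%d/status" % pid) as f:
        return parse_status(f.read())
```

A large gap between `VmHWM` and `VmRSS` at the end of a run suggests that the peak was transitory, which is the situation described for the recompilation case.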
msg242324 - Author: Anthony Sottile (asottile) * Date: 2015-05-01 14:37
I'm still seeing a very large difference:
asottile@work:/tmp$ python repro.py 
ready
<module 'city_hoods' from '/tmp/city_hoods.pyc'>
72604
VmHWM:	 72604 kB
VmRSS:	 60900 kB
asottile@work:/tmp$ rm *.pyc; python repro.py 
ready
<module 'city_hoods' from '/tmp/city_hoods.py'>
1077232
VmHWM:	 1077232 kB
VmRSS:	 218040 kB
This file is significantly larger than the one attached, not sure if it makes much of a difference.
msg242327 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-01 15:32
Which Python version is that? Can you try with 3.4 or 3.5?
(is it under GNU/Linux?)
> This file is significantly larger than the one attached, not sure
> if it makes much of a difference.
Python doesn't make a difference internally, but perhaps it has some impact on your OS' memory management.
msg242328 - Author: Anthony Sottile (asottile) * Date: 2015-05-01 15:39
3.4 seems happier:
asottile@work:/tmp$ rm *.pyc; python3.4 repro.py
ready
<module 'city_hoods' from '/tmp/city_hoods.py'>
77472
VmHWM:	 77472 kB
VmRSS:	 65228 kB
asottile@work:/tmp$ python3.4 repro.py
ready
<module 'city_hoods' from '/tmp/city_hoods.py'>
77472
VmHWM:	 77472 kB
VmRSS:	 65232 kB
The nasty result above is from 2.7:
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
3.3 also seems to have the same exaggerated problem:
$ rm *.pyc -f; python3.3 repro.py
ready
<module 'city_hoods' from '/tmp/city_hoods.py'>
1112996
VmHWM:	 1112996 kB
VmRSS:	 133468 kB
asottile@work:/tmp$ python3.3 repro.py
ready
<module 'city_hoods' from '/tmp/city_hoods.py'>
81392
VmHWM:	 81392 kB
VmRSS:	 69304 kB
$ python3.3
Python 3.3.6 (default, Jan 28 2015, 17:27:09) 
[GCC 4.8.2] on linux
So it seems the leaky behaviour was fixed at some point. Any idea which change fixed it, and is there a possibility of backporting it to 2.7?
msg242329 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-01 15:40
Note under 3.x, you need to "rm -r __pycache__", not "rm *.pyc", since the pyc files are now stored in the __pycache__ subdirectory.
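To avoid guessing where the cached bytecode lives, the standard library can compute the path. The sketch below uses `importlib.util.cache_from_source`; the helper name `remove_cached_bytecode` is hypothetical, and the sketch assumes the default cache layout (no custom pycache prefix).

```python
import importlib.util
import os


def remove_cached_bytecode(source_path):
    """Delete the cached .pyc for source_path if it exists; return its path."""
    # Under Python 3 this resolves to something like
    # __pycache__/<name>.<cache tag>.pyc next to the source file.
    pyc_path = importlib.util.cache_from_source(source_path)
    if os.path.exists(pyc_path):
        os.remove(pyc_path)
    return pyc_path
```

Calling `remove_cached_bytecode("repro.py")` before a run should force recompilation on the next import, in the same way as `rm -r __pycache__` but for a single module.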
msg242330 - Author: Anthony Sottile (asottile) * Date: 2015-05-01 15:42
Ah, then 3.4 still has the problem:
$ rm -rf __pycache__/ *.pyc; python3.4 repro.py
ready
<module 'city_hoods' from '/tmp/city_hoods.py'>
1112892
VmHWM:	 1112892 kB
VmRSS:	 127196 kB
asottile@work:/tmp$ python3.4 repro.py 
ready
<module 'city_hoods' from '/tmp/city_hoods.py'>
77468
VmHWM:	 77468 kB
VmRSS:	 65228 kB
msg242331 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-01 15:47
Is there any chance you can upload a script that's large enough to exhibit the problem?
(perhaps with anonymized data if there's something sensitive in there)
msg242332 - Author: Anthony Sottile (asottile) * Date: 2015-05-01 15:59
Attached is repro2.py (slightly different so my editor doesn't hate itself when editing the file)
I'll attach the other file in another comment since it seems I can only do one at a time
msg242339 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-01 17:31
Ok, I can reproduce:
$ rm -r __pycache__/; ./python repro2.py 
ready
<module 'anon_city_hoods' from '/home/antoine/cpython/opt/anon_city_hoods.py'>
1047656
VmHWM:	 1047656 kB
VmRSS:	 50660 kB
$ ./python repro2.py 
ready
<module 'anon_city_hoods' from '/home/antoine/cpython/opt/anon_city_hoods.py'>
77480
VmHWM:	 77480 kB
VmRSS:	 15664 kB
My guess is that memory fragmentation prevents the RSS from dropping any further, though one cannot rule out the possibility of an actual memory leak.
msg242340 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-01 17:32
(by the way, my numbers are with Python 3.5 - the in-development version - on 64-bit Linux)
msg242351 - Author: Buck Evan (bukzor) * Date: 2015-05-01 20:32
New data: The memory consumption seems to be in the compiler rather than the marshaller:
```
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
16032
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
16032
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
16032
$ python -c 'import repro'
16032
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
8984
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
8984
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
8984
```
We were trying to use PYTHONDONTWRITEBYTECODE as a workaround to this issue, but it didn't help us because of this.
msg242379 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-02 05:29
Using PYTHONDONTWRITEBYTECODE is not a workaround, because it makes you incur the memory overhead unconditionally. The compiler needs more memory than the compiled data itself requires. If this is an issue, I suggest using a different representation for the data: JSON, pickle, or just marshal. It may also be faster. Also try CSV or a simple custom format if appropriate.
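As a rough illustration of the suggestion above, the data could be stored in a JSON file and loaded at import time instead of being written as a Python literal that the compiler has to process. This is a minimal sketch, not the reporters' actual code: the file name `city_hoods.json`, the sample data, and the helpers `dump_data` and `load_data` are all hypothetical.

```python
import json
import os
import tempfile


def dump_data(data, path):
    """Write the data once, e.g. at build or deploy time."""
    with open(path, "w") as f:
        json.dump(data, f)


def load_data(path):
    """Load the data at runtime; no Python compilation is involved."""
    with open(path) as f:
        return json.load(f)


# Hypothetical sample data standing in for the large literal.
sample = {"hoods": [{"name": "A", "ids": [1, 2, 3]}, {"name": "B", "ids": []}]}
path = os.path.join(tempfile.mkdtemp(), "city_hoods.json")
dump_data(sample, path)
loaded = load_data(path)
print(loaded == sample)
```

Note that JSON round-trips only JSON types: tuples come back as lists and non-string dictionary keys come back as strings, so pickle or marshal may be a closer fit if the data relies on those.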
msg242583 - Author: Buck Evan (bukzor) * Date: 2015-05-04 21:31
@serhiy.storchaka This is a very stable piece of a legacy code base, so we're not keen to refactor it so dramatically, although we could. 
We've worked around this issue by compiling pyc files ahead of time and taking extra care that they're preserved through deployment. This isn't blocking our 2.7 transition anymore.
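The ahead-of-time compilation described above can be sketched with the standard `py_compile` module. The thread does not say which tool was actually used, so this is an assumption for illustration; the module name `data_module.py` and the helper `precompile` are hypothetical.

```python
import os
import py_compile
import tempfile


def precompile(source_path):
    """Compile source_path to its default cache location; return the .pyc path."""
    # doraise=True turns compile errors into exceptions instead of printing them.
    return py_compile.compile(source_path, doraise=True)


# Demo on a throwaway module.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "data_module.py")
with open(source, "w") as f:
    f.write("DATA = {'a': 1, 'b': 2}\n")

pyc_path = precompile(source)
print(pyc_path)
```

For a whole tree, `python -m compileall <dir>` does the same job. By default a pyc is validated against the source file's modification time and size, so a deployment step that changes the source mtime will trigger recompilation; that is presumably why the pyc files need care during deployment.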
msg318505 - Author: Jonathan G. Underwood (jonathan.underwood) Date: 2018-06-02 16:37
Seeing a very similar problem - very high memory usage during byte compilation.
Consider the very simple code in a file:
```
def test_huge():
    try:
        huge = b'\0' * 0x100000000  # this allocates 4GB of memory!
    except MemoryError:
        print('OOM')
```
Running this sequence of commands shows that during byte compilation, 4 GB memory is used. Presumably this is because of the `huge` object - note of course the function isn't actually executed.
```
valgrind --tool=massif python memdemo.py
ms_print massif.out.7591 | less
```
You'll need to replace 7591 with whatever process number valgrind reports.
Is there any hope of fixing this? It's currently a problem for me when running tests on Travis, where the memory limit is 3GB. I had hoped to use a conditional like the above to skip tests that would require more memory than is available. However, the testing is killed before that simply because the byte compilation is causing an OOM.
msg318507 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-02 18:29
That's presumably due to the compile-time constant-expression optimization. Have you tried bytes(0x1000000)? I don't think that gets treated as a constant by the optimizer (but I could be wrong since a bunch of things have been added to it lately).
msg318508 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-02 18:31
Jonathan, this is a different problem, and it is fixed in 3.6+ (see issue21074).
msg318509 - Author: Jonathan G. Underwood (jonathan.underwood) Date: 2018-06-02 18:45
Thanks to both Serhiy Storchaka and David Murray - indeed you're both correct, and that is the issue in 21074, and the workaround from there of declaring a variable for that size fixes the problem.
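The workaround mentioned above (putting the size in a variable so the multiplication is not a constant expression) can be checked by inspecting the compiled constants. The sketch below deliberately uses a tiny size of 16 bytes so it is safe to run; `contains_bytes_constant` is a hypothetical helper, and whether the literal form is folded depends on the interpreter version.

```python
def contains_bytes_constant(code, size):
    """True if code, or any nested code object, holds a bytes constant of that size."""
    for const in code.co_consts:
        if isinstance(const, bytes) and len(const) == size:
            return True
        if hasattr(const, "co_consts") and contains_bytes_constant(const, size):
            return True
    return False


# Literal form: the compiler may evaluate b'\0' * 16 at compile time.
LITERAL = "def f():\n    return b'\\0' * 16\n"
# Variable form: the multiplication is left to run time.
VARIABLE = "def f():\n    size = 16\n    return b'\\0' * size\n"

literal_code = compile(LITERAL, "<literal>", "exec")
variable_code = compile(VARIABLE, "<variable>", "exec")

print("literal folded: ", contains_bytes_constant(literal_code, 16))
print("variable folded:", contains_bytes_constant(variable_code, 16))
```

With the variable form the 16-byte result never appears among the constants, so nothing of that size is allocated until the function is actually called.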
msg320980 - Author: Inada Naoki (methane) * (Python committer) Date: 2018-07-03 13:12
In the repro2 case, the unreturned memory is in glibc malloc.
jemalloc mitigates this issue.
There is some fragmentation in pymalloc, but I think it's at an acceptable level.
$ python3 -B repro2.py
ready
<module 'anon_city_hoods' from '/home/inada-n/anon_city_hoods.py'>
1079124
VmHWM: 1079124 kB
VmRSS: 83588 kB
$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 python3 -B repro2.py
ready
<module 'anon_city_hoods' from '/home/inada-n/anon_city_hoods.py'>
1108424
VmHWM: 1108424 kB
VmRSS: 28140 kB
msg320981 - Author: Inada Naoki (methane) * (Python committer) Date: 2018-07-03 13:26
Since anon_city_hoods has massive constants, compiler_add_const makes the dict larger and larger. It creates many large tuples too.
I suspect it makes glibc malloc unhappy.
Maybe we can improve pymalloc for medium and large objects by porting the strategy from jemalloc. It could be a good GSoC project.
But I suggest closing this issue as "won't fix" for now.
msg320984 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-03 14:41
VmRSS for different versions:
        malloc       jemalloc
2.7:    237316 kB    90524 kB
3.4:     53888 kB    14768 kB
3.5:     51396 kB    14908 kB
3.6:     90692 kB    31776 kB
3.7:    130952 kB    28296 kB
3.8:    130284 kB    27644 kB
History
Date                 User                Action  Args
2022-04-11 14:58:16  admin               set     github: 68273
2018-07-27 09:47:19  methane             set     status: open -> closed; resolution: wont fix; stage: resolved
2018-07-03 14:41:22  serhiy.storchaka    set     messages: + msg320984
2018-07-03 13:26:01  methane             set     messages: + msg320981
2018-07-03 13:12:33  methane             set     nosy: + methane; messages: + msg320980
2018-06-02 18:45:38  jonathan.underwood  set     messages: + msg318509
2018-06-02 18:31:30  serhiy.storchaka    set     messages: + msg318508
2018-06-02 18:29:02  r.david.murray      set     nosy: + r.david.murray; messages: + msg318507
2018-06-02 16:37:12  jonathan.underwood  set     nosy: + jonathan.underwood; messages: + msg318505
2015-05-04 21:31:13  bukzor              set     messages: + msg242583
2015-05-02 05:29:37  serhiy.storchaka    set     nosy: + serhiy.storchaka; messages: + msg242379
2015-05-01 23:49:42  pitrou              set     nosy: + brett.cannon, georg.brandl, ncoghlan, benjamin.peterson
2015-05-01 20:32:11  bukzor              set     messages: + msg242351
2015-05-01 17:51:30  geoffreyspear       set     nosy: + geoffreyspear; type: resource usage; components: + Interpreter Core; versions: + Python 3.5
2015-05-01 17:32:41  pitrou              set     messages: + msg242340
2015-05-01 17:31:44  pitrou              set     messages: + msg242339
2015-05-01 16:03:09  asottile            set     files: + anon_city_hoods.tar.gz
2015-05-01 15:59:20  asottile            set     files: + repro2.py; messages: + msg242332
2015-05-01 15:47:39  pitrou              set     messages: + msg242331
2015-05-01 15:42:21  asottile            set     messages: + msg242330
2015-05-01 15:40:46  pitrou              set     messages: + msg242329
2015-05-01 15:39:09  asottile            set     messages: + msg242328
2015-05-01 15:32:39  pitrou              set     messages: + msg242327
2015-05-01 14:37:44  asottile            set     messages: + msg242324
2015-05-01 10:40:39  pitrou              set     messages: + msg242301
2015-05-01 00:47:14  asottile            set     nosy: + asottile; messages: + msg242296
2015-04-30 19:34:20  pitrou              set     nosy: + pitrou; messages: + msg242284
2015-04-30 19:01:31  bukzor              set     messages: + msg242282
2015-04-30 18:59:04  bukzor              create
