[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

Jul 9, 2019, 01:36:39 -0700

On Tue, Jul 9, 2019 at 9:46 AM Tim Peters <[email protected]> wrote:
>
> > At last, all size classes have 1~3 used/cached memory blocks.
>
> No doubt part of it, but hard to believe it's most of it. If the loop
> count above really is 10240, then there's only about 80K worth of
> pointers in the final `buf`.
You are right. List.append is not the major memory consumer in the
"large" class (8KiB+1 ~ 512KiB). There are several causes of large-size
allocations:
* bm_logging uses StringIO.seek(0); StringIO.truncate() to reset the buffer.
 So from the 2nd loop on, the internal buffer of StringIO becomes a Py_UCS4
 array instead of a list of strings. This buffer uses the same overallocation
 policy as list for growing its capacity:
 `size + (size >> 3) + (size < 9 ? 3 : 6)` (see the sketch after this list).
 Actually, when I use the `-n 1` option, memory usage is only 9MiB.
* The intern dict.
* Many modules are loaded, and FileIO.readall() is used to read pyc files.
 This creates and deletes bytes objects of various sizes.
* The logging module uses several regular expressions. `b'\0' * 0xff00` is
 used in sre_compile.
 https://github.com/python/cpython/blob/master/Lib/sre_compile.py#L320
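Here is a minimal sketch of the StringIO pattern from the first bullet (my
simplified reproduction, not the actual bm_logging code), plus a helper that
models the growth policy:

    import io

    def overallocated(size):
        # Rough model of the list-like growth policy quoted above.
        return size + (size >> 3) + (3 if size < 9 else 6)

    stream = io.StringIO()
    for loop in range(3):
        for i in range(1000):
            stream.write("logging line %d\n" % i)
        stream.seek(0)
        stream.truncate()  # after this reset, writes go through the Py_UCS4
                           # buffer path instead of the list-of-strings path

    # e.g. a buffer holding 64Ki characters is overallocated to roughly:
    print(overallocated(64 * 1024))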
>
> But does it really matter? ;-) mimalloc "should have" done MADV_FREE
> on the pages holding the older `buf` instances, so it's not like the
> app is demanding to hold on to the RAM (albeit that it may well show
> up in the app's RSS unless/until the OS takes the RAM away).
>
mimalloc doesn't call madvise() for each free(). Each size class
keeps a 64KiB "page",
and several OS pages (4KiB each) within that "page" are committed but not used.
I dumped the stats of all "mimalloc pages":
https://paper.dropbox.com/doc/mimalloc-on-CPython--Agg3g6XhoX77KLLmN43V48cfAg-fFyIm8P9aJpymKQN0scpp#:uid=671467140288877659659079&h2=memory-usage-of-logging_format
For example:
bin  block_size  used  capacity  reserved
 29        2560     1        22        25  (14 pages are committed, 2560 bytes are in use)
 29        2560    14        25        25  (16 pages are committed, 2560*14 bytes are in use)
 29        2560    11        25        25
 31        3584     1         5        18  (5 pages are committed, 3584 bytes are in use)
 33        5120     1         4        12
 33        5120     2        12        12
 33        5120     2        12        12
 37       10240     3        11       409
 41       20480     1         6       204
 57      327680     1         2        12

* Committed pages can be calculated roughly by `ceil(block_size * capacity / 4096)`.
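A quick check of that formula against a few rows from the dump (just my
verification snippet, not part of the benchmark):

    import math

    def committed_pages(block_size, capacity, os_page=4096):
        # Rough estimate, as noted above.
        return math.ceil(block_size * capacity / os_page)

    print(committed_pages(2560, 22))  # 14 -> matches the first bin-29 row
    print(committed_pages(2560, 25))  # 16 -> matches the second bin-29 row
    print(committed_pages(3584, 5))   #  5 -> matches the bin-31 row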
There are dozens of unused memory blocks and committed pages in each size class.
This causes 10MiB+ of memory usage overhead on the logging_format and logging_simple
benchmarks.
> I was more intrigued by your first (speed) comparison:
>
> > - spectral_norm: 202 ms +- 5 ms -> 176 ms +- 3 ms: 1.15x faster (-13%)
>
> Now _that's_ interesting ;-) Looks like spectral_norm recycles many
> short-lived Python floats at a swift pace. So memory management
> should account for a large part of its runtime (the arithmetic it does
> is cheap in comparison), and obmalloc and mimalloc should both excel
> at recycling mountains of small objects. Why is mimalloc
> significantly faster?
[snip]
> obmalloc's `address_in_range()` is definitely a major overhead in its
> fastest `free()` path, but then mimalloc has to figure out which
> thread is doing the freeing (looks cheaper than address_in_range, but
> not free). Perhaps the layers of indirection that have been wrapped
> around obmalloc over the years are to blame? Perhaps mimalloc's
> larger (16x) pools and arenas let it stay in its fastest paths more
> often? I don't know why, but it would be interesting to find out :-)
Totally agree. I'll investigate this next.
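For reference, the hot path of spectral_norm looks roughly like this (my
paraphrase, not the actual pyperformance source); almost every operation
creates and immediately discards a Python float, so the allocator sits on
the critical path:

    def eval_A(i, j):
        # Each call allocates several short-lived float objects.
        return 1.0 / ((i + j) * (i + j + 1) / 2 + i + 1)

    def A_times_u(u):
        return [sum(eval_A(i, j) * u_j for j, u_j in enumerate(u))
                for i in range(len(u))]

    u = [1.0] * 100
    v = A_times_u(u)  # the real benchmark does this at a larger size, in a loop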
Regards,
-- 
Inada Naoki <[email protected]>
_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/MXEE2NOEDAP72RFVTC7H4GJSE2CHP3SX/
