Created on 2018-07-11 10:36 by methane, last changed 2022-04-11 14:59 by admin.
Files

| File name | Uploaded | Description |
|---|---|---|
| bm_marshal.py | methane, 2018-07-11 13:07 | |

Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 8226 | open | methane, 2018-07-11 10:37 |
| PR 8293 | open | methane, 2018-07-16 07:28 |
| PR 28379 | open | eric.snow, 2021-09-16 22:06 |

Messages (14)

| msg321435 | Author: Inada Naoki (methane) * (Python committer) | Date: 2018-07-11 10:40 |

PR-8226 makes marshal two-pass, which may add a small overhead. When compiling a module, marshal performance is negligible, but what about other cases? Should this change be optional? And should we backport this to Python 3.7, or should distributors cherry-pick it?

| msg321448 | Author: Inada Naoki (methane) * (Python committer) | Date: 2018-07-11 13:07 |

marshal: Mean +- std dev: [master] 123 us +- 7 us -> [patched] 173 us +- 2 us: 1.41x slower (+41%)

compile+marshal: Mean +- std dev: [master] 5.28 ms +- 0.02 ms -> [patched] 5.47 ms +- 0.34 ms: 1.04x slower (+4%)

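The attached bm_marshal.py is not reproduced in this thread. As a rough stand-in, the two measurements above could be taken with something like the sketch below. The synthetic `SOURCE` module, the `bench` helper, and the repeat counts are assumptions made for illustration, so absolute numbers will not match the figures quoted above.

```python
import marshal
import statistics
import timeit

# Hypothetical workload: a synthetic module with many small functions.
SOURCE = "\n".join(
    f"def f{i}(x, y=None):\n    return (x, y, 'name{i}')" for i in range(200)
)


def bench(func, repeat=5, number=20):
    """Return (mean, stdev) of the per-call time in seconds."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


code = compile(SOURCE, "<bench>", "exec")

# "marshal": serialize an already-compiled code object.
marshal_only = bench(lambda: marshal.dumps(code))
# "compile+marshal": what writing a .pyc roughly costs, minus file I/O.
compile_and_marshal = bench(
    lambda: marshal.dumps(compile(SOURCE, "<bench>", "exec"))
)

for label, (mean, dev) in [
    ("marshal", marshal_only),
    ("compile+marshal", compile_and_marshal),
]:
    print(f"{label}: Mean +- std dev: {mean * 1e6:.0f} us +- {dev * 1e6:.0f} us")
```

Running the same script against a master build and a patched build gives the two sides of the comparison.
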
| msg321521 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2018-07-12 06:17 |

See also the alternative patches for issue20416. Some of them can solve this problem for simple types. If they perform better, using them for simple types could save some time. But this would complicate the code, and I'm not sure it is worth it.

| msg321523 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2018-07-12 08:00 |

According to Serhiy Storchaka, marshal.dumps() currently writes frozenset elements in arbitrary order, so frozenset serialization is not reproducible: https://mail.python.org/pipermail/python-dev/2018-July/154604.html

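One way to observe this is to serialize the same frozenset in fresh interpreters started with different hash seeds. The sketch below is only an illustration: the helper `dumps_with_seed`, the sample strings, and the seed range are assumptions. String hashes depend on PYTHONHASHSEED, so set iteration order can change between processes; on interpreters where marshal orders set elements deterministically, every seed may yield identical bytes.

```python
import os
import subprocess
import sys

# Child program: print the marshalled frozenset as hex.
SNIPPET = (
    "import marshal, sys; "
    "sys.stdout.write("
    "marshal.dumps(frozenset({'alpha', 'beta', 'gamma', 'delta'})).hex())"
)


def dumps_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Same seed twice: the two runs should agree.
same_a = dumps_with_seed(1)
same_b = dumps_with_seed(1)

# Several different seeds: count how many distinct serializations appear.
distinct = {dumps_with_seed(seed) for seed in range(1, 9)}

print("same seed reproducible:", same_a == same_b)
print("distinct outputs across 8 seeds:", len(distinct))
```
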
| msg321524 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2018-07-12 08:02 |

How much time is spent in marshal.dumps() at Python startup when Python has to create all .pyc files? For example, "./python -c pass" on the master branch with no external dependencies. My question is whether the PR makes Python startup 5% slower or less than 1% slower.

| msg321527 | Author: Inada Naoki (methane) * (Python committer) | Date: 2018-07-12 08:31 |

> STINNER Victor <vstinner@redhat.com> added the comment:
>
> According to Serhiy Storchaka, currently marshal.dumps() writes frozenset in arbitrary order, and so frozenset serialization is not reproducible:
> https://mail.python.org/pipermail/python-dev/2018-July/154604.html

PYTHONHASHSEED can be used to stabilize frozenset order. On the other hand, the refcnt-based approach is more unstable: even when x is y, dumps(x) == dumps(y) is not guaranteed.

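The refcount dependence can be probed with a small experiment. This is a sketch under assumptions: it relies on CPython's writer skipping FLAG_REF (the high bit of the type byte) when an object's reference count is 1, and whether the two outputs actually differ depends on the interpreter version and on how arguments are passed internally. The variable names are illustrative only.

```python
import marshal

FLAG_REF = 0x80  # high bit of the type byte in CPython's marshal format

# The float is a temporary: nothing else refers to it during the call.
from_temporary = marshal.dumps(float("1.5"))

# The same value, but now held by two names while it is serialized.
named = float("1.5")
alias = named
from_named = marshal.dumps(named)

print("temporary:", from_temporary.hex())
print("named:    ", from_named.hex())
print(
    "FLAG_REF set (temporary, named):",
    bool(from_temporary[0] & FLAG_REF),
    bool(from_named[0] & FLAG_REF),
)
```

Both byte strings load back to the same value; if they differ at all, they differ only in the flag bit of the first byte.
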
| msg321528 | Author: Inada Naoki (methane) * (Python committer) | Date: 2018-07-12 08:34 |

> STINNER Victor <vstinner@redhat.com> added the comment:
>
> What is the time spent in marshal.dumps() at Python startup when Python has to create all .pyc files? For example "./python -c pass" in the master branch with no external dependency? My question is if the PR makes Python startup 5% slower or less than 1% slower.

At startup, Python does more than compile()+marshal.dumps(). And as I wrote above, the patch makes compile()+marshal.dumps() only 4% slower. So startup must not be slower than 4%. Additionally, it happens only once if the pyc file is writable. (I don't know whether marshal.dumps() is called when open(cache_path, 'wb') fails.)

| msg321529 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2018-07-12 08:41 |

> So startup must not be slower than 4%.

I know. But Python does more than compile()+dumps() at the first run. I'm curious whether it is feasible to measure this cost. It may be hard to get reliable benchmarks, though: I expect the difference to be very small, and I know very well that measuring Python startup is hard because it depends a lot on the filesystem, which is itself hard to measure.

| msg321611 | Author: Christian Tismer (Christian.Tismer) * (Python committer) | Date: 2018-07-13 13:52 |

Why must this become slower? To my knowledge, many projects prefer marshal over pickle for suitably simple objects because it is so very fast. I would not throw that away. Would it not be easy to add a named optional keyword argument, like "stable=True"?

| msg321622 | Author: Inada Naoki (methane) * (Python committer) | Date: 2018-07-13 15:52 |

> Would it not be easy to add a named optional keyword
> argument, like "stable=True"?

My pull request did that. But since then I got a hint on the ML and overwrote my PR with another approach: use FLAG_REF for all interned strings.

| msg347970 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019-07-15 15:05 |

> According to Serhiy Storchaka, currently marshal.dumps() writes frozenset in arbitrary order, and so frozenset serialization is not reproducible: https://mail.python.org/pipermail/python-dev/2018-July/154604.html

I created bpo-37596 "Reproducible pyc: frozenset is not serialized in a deterministic order" to track this issue.

| msg401979 | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2021-09-16 18:32 |

FYI, I unknowingly created a duplicate of this issue a few days ago, bpo-45186, and created a PR for it: https://github.com/python/cpython/pull/28379. Interestingly, while I did that PR independently, it has a lot in common with Inada-san's second PR.

My interest here is in how frozen modules can be affected by this problem, particularly between debug and non-debug builds. See bpo-45020, where I'm working on freezing all the stdlib modules imported during startup.

| msg402236 | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2021-09-20 14:42 |

It turns out that I don't need this after all (once I merged gh-28392 and bpo-45188 was resolved). That impacts how much time I have to spend on this, so I might not be able to pursue this further. That said, I think it is worth doing and the PR I have up mostly does everything we need here. So I'll see if I can follow this through. :)

| msg402244 | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2021-09-20 15:21 |

FWIW, I found a faster solution than calling `w_object()` twice. Currently the logic for w_ref() (used for each "complex" object) looks like this:

* if ob_refcnt == 1
  * do not apply FLAG_REF
  * marshal normally
* else if seen for the first time
  * apply FLAG_REF
  * marshal normally
* otherwise
  * emit TYPE_REF
  * emit the ref index of the first instance

The faster solution looks like this:

* if seen for the first time
  * do not apply FLAG_REF
  * marshal normally
  * record the index of the type byte in the output stream
* else if seen for a second time
  * apply FLAG_REF to the byte at the earlier-recorded position
  * emit TYPE_REF
  * emit the ref index of the first instance
* otherwise
  * emit TYPE_REF
  * emit the ref index of the first instance

While this is faster, there are two downsides: extra memory usage, and it isn't practical when writing to a file. However, I don't think either is a significant problem. For the former, it can be mostly mitigated by using the negative values in WFILE.hashtable to store the type byte position. For the latter, "marshal.dump()" is already a light wrapper around "marshal.dumps()", and for PyMarshal_WriteObjectToFile() we simply stick with the current unstable approach (or change it to do what "marshal.dump()" does).

FYI, I mostly have that implemented in a branch, but am not sure when I'll get back to it.

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:59:03 | admin | set | github: 78274 |
| 2021-09-20 15:21:12 | eric.snow | set | messages: + msg402244 |
| 2021-09-20 14:42:51 | eric.snow | set | messages: + msg402236 |
| 2021-09-16 22:06:30 | eric.snow | set | pull_requests: + pull_request26811 |
| 2021-09-16 18:33:16 | eric.snow | link | issue45186 superseder |
| 2021-09-16 18:32:28 | eric.snow | set | versions: + Python 3.11, - Python 3.8; nosy: + eric.snow; messages: + msg401979; components: + Interpreter Core, - Extension Modules; type: behavior |
| 2021-09-05 02:15:29 | yan12125 | set | nosy: + yan12125 |
| 2021-02-03 18:09:23 | steve.dower | unlink | issue29708 dependencies |
| 2020-12-31 09:06:54 | methane | link | issue29708 dependencies |
| 2019-07-15 15:05:42 | vstinner | set | messages: + msg347970 |
| 2018-07-16 07:28:09 | methane | set | stage: patch review; pull_requests: + pull_request7827 |
| 2018-07-13 15:52:08 | methane | set | messages: + msg321622 |
| 2018-07-13 13:52:50 | Christian.Tismer | set | nosy: + Christian.Tismer; messages: + msg321611 |
| 2018-07-12 08:41:22 | vstinner | set | messages: + msg321529 |
| 2018-07-12 08:34:42 | methane | set | messages: + msg321528 |
| 2018-07-12 08:31:29 | methane | set | messages: + msg321527 |
| 2018-07-12 08:02:49 | vstinner | set | messages: + msg321524 |
| 2018-07-12 08:00:40 | vstinner | set | nosy: + vstinner; messages: + msg321523 |
| 2018-07-12 06:17:08 | serhiy.storchaka | set | messages: + msg321521 |
| 2018-07-12 05:29:02 | methane | set | nosy: + benjamin.peterson, serhiy.storchaka |
| 2018-07-11 13:07:26 | methane | set | files: + bm_marshal.py; messages: + msg321448 |
| 2018-07-11 10:40:08 | methane | set | messages: + msg321435; stage: patch review -> (no value) |
| 2018-07-11 10:37:58 | methane | set | keywords: + patch; stage: patch review; pull_requests: + pull_request7779 |
| 2018-07-11 10:37:20 | methane | link | issue34033 dependencies |
| 2018-07-11 10:36:33 | methane | create | |