Created on 2018-07-03 15:46 by vstinner, last changed 2022-04-11 14:59 by admin.
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 8057 | closed | vstinner, 2018-07-03 15:47 |
| PR 8226 | open | methane, 2018-07-10 12:23 |
| Messages (9) | | | |
|---|---|---|---|
| msg320988 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2018-07-03 15:46 | |
Follow-up to bpo-29708: openSUSE uses a downstream patch for distutils, distutils-reproducible-compile.patch, to fix https://bugzilla.opensuse.org/show_bug.cgi?id=1049186. I converted the patch into a PR: PR 8057.

Naoki INADA wrote: """ Currently, marshal uses refcnt to determine whether to use w_ref or not. Some immutable objects (especially long and str) can be cached and reused. This may affect refcnt when byte compiling. I think we should use a more deterministic way instead of refcnt. Maybe count all constants in the module before marshal, like we did when compiling functions for co_consts and co_names. As a bonus, it may reduce resource usage too by merging constants across functions (e.g. ('self',) co_varnames and (None,) co_consts). """ https://github.com/python/cpython/pull/8057#issuecomment-402065657

Serhiy Storchaka added: """ I think we need to understand the issue better before committing changes. Once we have found the source of the instability of file names, we can find other similar sources and make them stable too. For example, if the source is listdir() or glob(), we can consider sorting the results of all listdir() or glob() calls in distutils and related methods. On the other hand, if the problem is with reference counters in marshal, we can change the marshal module instead. """ https://github.com/python/cpython/pull/8057#issuecomment-402198390 |
|||
| msg320990 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2018-07-03 15:47 | |
Copy of the first message of https://bugzilla.opensuse.org/show_bug.cgi?id=1049186:

"""
e.g. python-simplejson has one-bit diffs in .pyc files. See http://rb.zq1.de/compare.factory-20170713/python-simplejson-compare.out

In python3-simplejson.rpm we get

    -00004e50 68 6f 72 5f 5f da 07 64 65 63 69 6d 61 6c 72 0c |hor__..decimalr.|
    +00004e50 68 6f 72 5f 5f 5a 07 64 65 63 69 6d 61 6c 72 0c |hor__Z.decimalr.|

In python3-simplejson-test.rpm we get the opposite change

    -00000580 72 13 00 00 00 5a 07 64 65 63 69 6d 61 6c 72 03 |r....Z.decimalr.|
    +00000580 72 13 00 00 00 da 07 64 65 63 69 6d 61 6c 72 03 |r......decimalr.|

and it seems to be related to filesystem ordering, since it built reproducibly when using a filesystem with sorted readdir using disorderfs via reproducible-faketools-filesys from https://build.opensuse.org/package/show/home:bmwiedemann:reproducible/reproducible-faketools
"""

https://bugzilla.opensuse.org/show_bug.cgi?id=1049186#c0 |
|||
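The one-bit difference in the dump above can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes that the differing bit is marshal's `FLAG_REF` (`0x80` in `Python/marshal.c`), ORed into the type code `Z` (`0x5a`). The helper name `flag_ref_diffs` is hypothetical.

```python
# FLAG_REF is the high bit that marshal ORs into a type code when it
# expects the object to be referenced again (value from Python/marshal.c).
FLAG_REF = 0x80


def flag_ref_diffs(old_hex: str, new_hex: str) -> list[int]:
    """Return the offsets where two hex rows differ by exactly the FLAG_REF bit."""
    old = bytes.fromhex(old_hex)
    new = bytes.fromhex(new_hex)
    if len(old) != len(new):
        raise ValueError("rows must have the same length")
    return [i for i, (a, b) in enumerate(zip(old, new)) if a ^ b == FLAG_REF]


# The two rows at offset 00004e50 from the python3-simplejson.rpm diff.
OLD_ROW = "68 6f 72 5f 5f da 07 64 65 63 69 6d 61 6c 72 0c"
NEW_ROW = "68 6f 72 5f 5f 5a 07 64 65 63 69 6d 61 6c 72 0c"

offsets = flag_ref_diffs(OLD_ROW, NEW_ROW)
print(offsets)  # → [5]
```

Byte 5 is `0xda` in one build and `0x5a` in the other, so the two rows differ only in whether the reference flag is set on the type code that precedes the string "decimal".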
| msg320991 | Author: Benjamin Peterson (benjamin.peterson) (Python committer) | Date: 2018-07-03 15:50 | |
I agree that we should fix the underlying issue (marshal) rather than papering over it by sorting. In fact, we should have a test that compiles a bunch of pycs in random orders and checks whether they're the same or not. |
|||
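A test of the kind suggested above could look roughly like the sketch below. It is an illustration under stated assumptions, not an existing CPython test: the helper names `compile_in_order` and `unstable_pycs` are hypothetical, and it compiles everything inside one interpreter process, which may not trigger the refcount effect. A more faithful version would probably start a fresh interpreter for each ordering.

```python
import pathlib
import py_compile
import random
import tempfile


def compile_in_order(sources, out_dir):
    """Byte-compile sources in the given order; return {source name: pyc bytes}."""
    compiled = {}
    for src in sources:
        cfile = pathlib.Path(out_dir) / (src.stem + ".pyc")
        py_compile.compile(
            str(src),
            cfile=str(cfile),
            doraise=True,
            # Hash-based pycs keep the source mtime out of the comparison.
            invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
        )
        compiled[src.name] = cfile.read_bytes()
    return compiled


def unstable_pycs(sources, rounds=5, seed=0):
    """Return the names of sources whose pyc bytes changed with compile order.

    Assumes every source file has a distinct file name.
    """
    sources = [pathlib.Path(s) for s in sources]
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as out_dir:
        baseline = compile_in_order(sources, out_dir)
    unstable = set()
    for _ in range(rounds):
        shuffled = sources[:]
        rng.shuffle(shuffled)
        with tempfile.TemporaryDirectory() as out_dir:
            current = compile_in_order(shuffled, out_dir)
        unstable.update(n for n in baseline if baseline[n] != current[n])
    return unstable
```

An empty result only means that no difference was observed for the orderings tried; it does not prove that compilation is reproducible.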
| msg321383 | Author: Inada Naoki (methane) (Python committer) | Date: 2018-07-10 12:14 | |
Is this issue only for the known marshal issue? Or is it for all reproducibility issues in distutils, including unknown ones? |
|||
| msg321408 | Author: Benjamin Peterson (benjamin.peterson) (Python committer) | Date: 2018-07-11 04:39 | |
We should probably discuss the marshal issue in the preëxisting #31377. I'm not sure whether "distutils is not reproducible" is a larger issue than "pyc compilation is not reproducible". This issue could be a meta issue for either. |
|||
| msg321432 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2018-07-11 10:33 | |
> Is this issue only for the known marshal issue?

IMHO the order in which .pyc files are created on disk also matters. It changes the result of "os.listdir()": some applications may rely on the unsorted output of os.listdir(). Calling sorted() seems simple and harmless compared to the benefit. |
|||
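The sorting approach mentioned above can be sketched as follows. This is only an illustration of wrapping `os.listdir()` in `sorted()`; it is not the content of the openSUSE patch or of PR 8057, and the helper name `sorted_python_sources` is hypothetical.

```python
import os
import tempfile


def sorted_python_sources(directory):
    """Return the .py file names in directory in a stable, sorted order."""
    # os.listdir() returns entries in arbitrary, filesystem-dependent order.
    return sorted(name for name in os.listdir(directory) if name.endswith(".py"))


# Demo: create the files in a non-alphabetical order.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("zeta.py", "alpha.py", "notes.txt", "mid.py"):
        open(os.path.join(demo_dir, name), "w").close()
    names = sorted_python_sources(demo_dir)

print(names)  # → ['alpha.py', 'mid.py', 'zeta.py']
```

Sorting makes the order in which files are byte-compiled independent of the filesystem, but it does not remove the underlying dependence of marshal on interpreter state.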
| msg321434 | Author: Inada Naoki (methane) (Python committer) | Date: 2018-07-11 10:37 | |
OK, I created a sub-issue for pyc. |
|||
| msg337975 | Author: Bernhard M. Wiedemann (bmwiedemann) | Date: 2019-03-15 08:58 | |
Unreproducible .pyc files are still one of the major headaches for my work on openSUSE reproducible builds. There is also one aspect where i586 builds end up with different .pyc files than x86_64 builds. And then we randomly choose one of them for our "noarch" python module packages and hope they work everywhere (including on arm and s390 architectures). So is someone working towards a concept that makes it possible to create the same .pyc files anywhere? Can I help with something there? Is there an ETA? |
|||
| msg359595 | Author: Petr Viktorin (petr.viktorin) (Python committer) | Date: 2020-01-08 14:05 | |
> There is also one aspect where i586 builds end up with different .pyc files than x86_64 builds. And then we randomly choose one of them for our "noarch" python module packages and hope they work everywhere (including on arm and s390 architectures).

They are functionally identical, despite not being bit-by-bit identical. If they do not work everywhere, it's a very serious bug.

> So is someone working towards a concept that makes it possible to create the same .pyc files anywhere?

No, it's a known issue no one is working on.

> Can I help with something there?

Maybe? The two main culprits are in the marshal serialization algorithm: https://github.com/python/cpython/blob/master/Python/marshal.c

Specifically:
- a heuristic depends on refcount (i.e. the state of objects in the entire interpreter, rather than just the relationships between serialized objects): https://github.com/python/cpython/blob/33b671e72450bf4b5a946ce0dde6b7fe21150108/Python/marshal.c#L304
- (frozen)sets are serialized in iteration order, which is unpredictable (and determining a predictable order is not trivial): https://github.com/python/cpython/blob/33b671e72450bf4b5a946ce0dde6b7fe21150108/Python/marshal.c#L498

A solution will probably come with an unacceptable performance hit -- it's good to keep generating the .pyc files fast. Two options to overcome that come to mind:
- make reproducibility optional (which would make the testing more cumbersome)
- make an add-on tool to re-serialize an existing .pyc. |
|||
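To illustrate the (frozen)set point above: one conceivable way to get a predictable order is to sort the elements by a key that does not depend on hashing. The sketch below is an assumption made for illustration, not what marshal does at the linked commit. The helper name `stable_order` and the choice of `(type name, repr)` as the key are hypothetical, and the key is only adequate for simple constants such as str, int, bytes and None (nested sets would need more care, which is part of why the problem is not trivial).

```python
def stable_order(items):
    """Order set elements independently of hash-based iteration order."""
    # Mixed-type elements cannot be compared directly, so sort on a
    # (type name, repr) pair, which is the same in every process.
    return sorted(items, key=lambda item: (type(item).__name__, repr(item)))


constants = frozenset({"decimal", "self", 7, None})
print(stable_order(constants))  # → [None, 7, 'decimal', 'self']
```

An add-on tool that re-serializes an existing .pyc, as proposed above, would need an ordering rule of this kind applied wherever a set constant is written out.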
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:02 | admin | set | github: 78214 |
| 2020-04-10 13:23:09 | yan12125 | set | nosy: + yan12125 |
| 2020-04-08 12:50:37 | jefferyto | set | nosy: + jefferyto |
| 2020-02-24 16:35:26 | mcepl | set | nosy: - mcepl |
| 2020-01-08 14:05:06 | petr.viktorin | set | nosy: + petr.viktorin; messages: + msg359595 |
| 2019-03-15 08:58:26 | bmwiedemann | set | nosy: + bmwiedemann; messages: + msg337975 |
| 2019-03-06 15:46:44 | zbysz | set | nosy: + zbysz |
| 2018-11-13 13:29:54 | sascha_silbe | set | nosy: + sascha_silbe |
| 2018-07-11 10:37:20 | methane | set | dependencies: + remove *_INTERNED opcodes from marshal, Reproducible pyc: FLAG_REF is not stable.; messages: + msg321434 |
| 2018-07-11 10:33:20 | vstinner | set | messages: + msg321432 |
| 2018-07-11 04:39:09 | benjamin.peterson | set | messages: + msg321408 |
| 2018-07-10 12:23:27 | methane | set | pull_requests: + pull_request7764 |
| 2018-07-10 12:14:36 | methane | set | nosy: + methane; messages: + msg321383 |
| 2018-07-04 23:27:10 | mcepl | set | nosy: + mcepl |
| 2018-07-03 15:50:22 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg320991 |
| 2018-07-03 15:47:56 | vstinner | set | messages: + msg320990 |
| 2018-07-03 15:47:04 | vstinner | set | keywords: + patch; stage: patch review; pull_requests: + pull_request7677 |
| 2018-07-03 15:46:25 | vstinner | create | |