Created on 2015-06-18 12:14 by samthursfield, last changed 2022-04-11 14:58 by admin.
Files

| File name | Uploaded | Description |
|---|---|---|
| tar-reproducible-testcase.py | samthursfield, 2015-06-18 12:14 | Testcase for stable tar ordering patch |
| tarfile-stable-ordering.patch | samthursfield, 2015-06-18 12:21 | Patch to fix issue |
| make_archive-stable-ordering.patch | samthursfield, 2015-06-22 13:19 | Patch to make shutil.make_archive(format='tar') deterministic, but not tar.add(recursive=True) |

Messages (9)

| msg245464 | Author: Sam Thursfield (samthursfield) | Date: 2015-06-18 12:14 |

I want shutil.make_archive() to produce deterministic output when given identical data as inputs. Right now there are two holes in this. One is that mtimes might not match. This can be fixed by the caller. The second is that the order in which files in a subdirectory are added to the tarfile is not deterministic. This can't be fixed by the caller.

Attached is a trivial patch to sort the results of os.listdir() to ensure the output tarfile is stable. This only applies to the 'tar' format.

I've attached my testcase for this, which creates 3 tarfiles in /tmp. When this patch is applied, the 3 tarfiles it creates are identical according to `sha1sum`. Without this patch, they are all different.

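For illustration, the kind of check described in the report above can be sketched as follows. This is not the attached tar-reproducible-testcase.py; the helper names (`build_tree`, `normalize_mtimes`, `sha1_of`, `archive_digests`) and the file names are hypothetical. The sketch builds the same tree three times, applies the caller-side mtime fix with `os.utime()`, archives each copy with `shutil.make_archive()` in 'tar' format, and returns the SHA-1 digests. Whether the digests agree depends on the order in which the running Python adds directory entries, which is the subject of this issue.

```python
import hashlib
import os
import shutil
import tempfile


def build_tree(root, names):
    """Create small files under root/sub in the given order (made-up test data)."""
    os.makedirs(os.path.join(root, "sub"))
    for name in names:
        with open(os.path.join(root, "sub", name), "w") as f:
            f.write("contents of " + name + "\n")


def normalize_mtimes(root, timestamp=0):
    """Caller-side fix for the first hole: give every entry the same mtime."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            os.utime(os.path.join(dirpath, name), (timestamp, timestamp))
        os.utime(dirpath, (timestamp, timestamp))


def sha1_of(path):
    """Return the hex SHA-1 digest of a file, like `sha1sum` would print."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def archive_digests(count=3):
    """Archive `count` identical trees and return one digest per archive."""
    names = ["a.txt", "b.txt", "c.txt"]
    digests = []
    with tempfile.TemporaryDirectory() as work:
        for i in range(count):
            tree = os.path.join(work, "tree%d" % i)
            # Vary the creation order so nothing depends on it.
            build_tree(tree, names[i:] + names[:i])
            normalize_mtimes(tree)
            # The archive is written next to the tree, not inside it.
            archive = shutil.make_archive(
                os.path.join(work, "out%d" % i), "tar", root_dir=tree)
            digests.append(sha1_of(archive))
    return digests


if __name__ == "__main__":
    for digest in archive_digests():
        print(digest)
```

If all printed digests are equal, the archives are byte-for-byte identical; if they differ, entry order (or some other metadata) still varies between runs.
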
| msg245465 | Author: R. David Murray (r.david.murray) (Python committer) | Date: 2015-06-18 12:35 |

This would go beyond what the tar command itself does. I'm not sure we want to do that, as we are pretty much modeling our behavior on tar. However, that doesn't automatically mean we can't do it. We'll see what other people think. Personally I'm -0.

I've changed the issue title since your proposed patch is to tarfile, not shutil.

| msg245466 | Author: Lars Gustäbel (lars.gustaebel) (Python committer) | Date: 2015-06-18 13:04 |

You don't need to patch the tarfile module. You could use os.walk() in shutil._make_tarball() and add each file with TarFile.add(recursive=False).

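The approach suggested here can be sketched roughly as below. This is an illustration only, not the code from either attached patch; `make_sorted_tarball` is a hypothetical helper rather than the real `shutil._make_tarball()`. It walks the tree with `os.walk()`, sorts the names at each level, and adds every entry with `TarFile.add(recursive=False)` so that tarfile itself never decides the order.

```python
import os
import tarfile


def make_sorted_tarball(archive_path, root_dir):
    """Write root_dir into an uncompressed tar, adding entries in sorted order."""
    with tarfile.open(archive_path, "w") as tar:
        for dirpath, dirnames, filenames in os.walk(root_dir):
            # Sorting in place also fixes the order in which os.walk() descends.
            dirnames.sort()
            # Add every entry of this directory, files and subdirectories
            # alike, so that a directory always precedes its own contents.
            for name in sorted(dirnames + filenames):
                full_path = os.path.join(dirpath, name)
                arcname = os.path.relpath(full_path, root_dir)
                tar.add(full_path, arcname=arcname, recursive=False)
    return archive_path
```

Entries are added with tarfile's default of not dereferencing links, so a symlink to a directory is stored as a symlink rather than followed, and `os.walk()` does not descend into it either.
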
| msg245467 | Author: Sam Thursfield (samthursfield) | Date: 2015-06-18 14:25 |

Thanks for the comments! Would you be happy for the patch to be merged if it was implemented by modifying shutil.make_archive() instead? I will rework it if so.

| msg245469 | Author: Raymond Hettinger (rhettinger) (Python committer) | Date: 2015-06-18 15:13 |

I don't see any downside for this simple patch and think there is some merit for wanting a reproducible archive.

| msg245493 | Author: Lars Gustäbel (lars.gustaebel) (Python committer) | Date: 2015-06-19 07:15 |

The patch would change behaviour for all tarfile users by the back door; that's why I am a little reluctant. And if the same can be achieved by a reasonably simple change to shutil, I think it's just as well.

| msg245497 | Author: Sam Thursfield (samthursfield) | Date: 2015-06-19 10:21 |

I've discovered that this patch introduces a nasty failure case! If you have a relative symlink pointing to a directory that sorts alphabetically after the symlink, and files inside the symlink, 'tar -x' won't be able to create those files because the symlink target won't exist yet.

I'll rework this to only affect shutil.make_archive(), and to avoid hitting this bug.

| msg245498 | Author: Sam Thursfield (samthursfield) | Date: 2015-06-19 10:24 |

Having tested, the problem I described above doesn't happen with this patch. It's a mistake in some other code I wrote, which follows symlinks when it should not.

| msg245628 | Author: Sam Thursfield (samthursfield) | Date: 2015-06-22 13:19 |

Here's a patch which does the same thing but only for shutil.make_archive().

Note that the final output will still be non-deterministic if you use format='gztar', because time.time() and the base_name argument get added to the gzip header. It might be nice to add an option to make that deterministic too, as a separate thing. This patch is useful to me as-is, though.

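One possible shape for the deterministic gzip output mentioned above, offered only as a sketch and not as anything `shutil` provides: open the gzip stream by hand with `gzip.GzipFile`, passing `mtime=0` and an empty `filename` so that neither the current time nor the archive name reaches the header, and let `tarfile` write into it. `make_plain_gztar` is a hypothetical name. Entry order and file mtimes inside the tar stream are a separate matter that this sketch does not address.

```python
import gzip
import os
import tarfile


def make_plain_gztar(archive_path, root_dir):
    """Write root_dir as a .tar.gz whose gzip header carries no time or name."""
    with open(archive_path, "wb") as raw:
        # mtime=0 pins the header timestamp; filename="" omits the name field.
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
            # Plain "w" mode: tarfile writes an uncompressed stream into gz.
            with tarfile.open(fileobj=gz, mode="w") as tar:
                tar.add(root_dir, arcname=os.curdir)
    return archive_path
```

Writing the same tree to two differently named archives at different times should then produce the same bytes, provided the tar stream itself is identical.
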
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:18 | admin | set | github: 68653 |
| 2017-06-18 22:09:51 | martin.panter | set | dependencies: + tarfile add uses random order; title: Make tarfile have deterministic sorting -> Make shutil.make_archive have deterministic sorting |
| 2015-06-22 13:19:37 | samthursfield | set | files: + make_archive-stable-ordering.patch; messages: + msg245628 |
| 2015-06-19 10:24:50 | samthursfield | set | messages: + msg245498 |
| 2015-06-19 10:21:52 | samthursfield | set | messages: + msg245497 |
| 2015-06-19 07:15:23 | lars.gustaebel | set | messages: + msg245493 |
| 2015-06-18 15:13:15 | rhettinger | set | nosy: + rhettinger; messages: + msg245469 |
| 2015-06-18 14:25:14 | samthursfield | set | messages: + msg245467 |
| 2015-06-18 13:04:11 | lars.gustaebel | set | nosy: + lars.gustaebel; messages: + msg245466 |
| 2015-06-18 12:35:05 | r.david.murray | set | nosy: + r.david.murray; title: Make tar files created by shutil.make_archive() have deterministic sorting -> Make tarfile have deterministic sorting; messages: + msg245465; versions: + Python 3.6 |
| 2015-06-18 12:21:45 | samthursfield | set | files: + tarfile-stable-ordering.patch; keywords: + patch |
| 2015-06-18 12:14:12 | samthursfield | create | |