
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: tarfile add uses random order
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: bmwiedemann, christian.heimes, lars.gustaebel, ned.deily, python-dev, r.david.murray, rhettinger, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2017-06-18 02:03 by bmwiedemann, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 2263   merged  bmwiedemann, 2017-06-18 02:37
PR 5557   merged  bmwiedemann, 2018-02-05 19:55
PR 5567   merged  miss-islington, 2018-02-06 18:10
PR 31713  open    python-dev, 2022-03-06 22:48
Messages (23)
msg296251 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-06-18 02:03
Filesystems give no guarantee about the order of entries returned in directory listings, so tarfile.add, which recurses using os.listdir, adds files in arbitrary order.
See also https://reproducible-builds.org/docs/stable-inputs/ on that topic.
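A minimal sketch of the behavior under discussion, assuming Python 3.7 or later (where, per this thread, the sorting fix landed). The helper name `archive_names` and the arcname `root` are illustrative, not part of the patch. It creates files in non-alphabetical order, adds the directory recursively, and reads back the member order.

```python
import io
import os
import tarfile
import tempfile


def archive_names(directory):
    """Add `directory` recursively to an in-memory tar and return member names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(directory, arcname="root")
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    # Create the files in an order that is deliberately not alphabetical.
    for name in ("c.txt", "a.txt", "b.txt"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write(name)
    names = archive_names(tmp)

# On Python 3.7+ the listing is sorted before recursion:
# ['root', 'root/a.txt', 'root/b.txt', 'root/c.txt']
print(names)
```

On older versions the three file entries may appear in whatever order the filesystem returns them.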
msg296280 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-18 17:21
The patch for a similar issue with the glob module was rejected recently since it is easy to sort the result of glob.glob() (see issue30461). This issue looks similar, but there are differences. On one side, the command line tar utility doesn't have an option for sorting file names and does not seem to sort them by default (I didn't check). It is possible to use external sorting with the tarfile module as with the tar utility (generate the list of all files and directories, sort it, and pass every item to TarFile.add with the option recursive=False). On the other side, this is not as easy as for glob.glob(), and the overhead of the sorting is expected to be smaller than for glob.glob(). These may be considered additional arguments for approving the patch.
If this approach is approved, it should also be applied to ZIP archives.
FYI the order of archived files can affect the compression ratio of the compressed tar archive. For example, the 7-Zip archiver sorts files by extension, which increases the chance that files of the same type (text, multimedia, spreadsheet, executables, etc.) are grouped together and share a common dictionary for global compression. This isn't directly related to this issue, just material for a possible future enhancement.
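The external-sorting workaround described above can be sketched roughly as follows. The function names `sorted_paths` and `add_sorted` and the `project` directory are hypothetical; only `os.walk`, `sorted`, and `TarFile.add(..., recursive=False)` come from the standard library. Because a parent path is a prefix of its children, plain string sorting keeps each directory entry ahead of its contents.

```python
import io
import os
import tarfile
import tempfile


def sorted_paths(top):
    """Return `top` and everything below it as one sorted list of paths."""
    paths = [top]
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def add_sorted(tar, top):
    """Add each path individually so the archive order follows the sort."""
    base = os.path.dirname(top)
    for path in sorted_paths(top):
        # recursive=False: a directory contributes only its own entry.
        tar.add(path, arcname=os.path.relpath(path, base), recursive=False)


with tempfile.TemporaryDirectory() as tmp:
    top = os.path.join(tmp, "project")
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("b.txt", "a.txt", os.path.join("sub", "z.txt")):
        with open(os.path.join(top, rel), "w") as f:
            f.write(rel)

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        add_sorted(tar, top)

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        sorted_names = tar.getnames()

print(sorted_names)
```

Passing a `key` function to `sorted` would be one way to experiment with other orderings, such as grouping by extension.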
msg296308 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-06-19 09:20
note: recent GNU tar versions (1.28?) added an option --sort=name
and the overhead of sorting (e.g. I measured 4ms for 10000 files) is negligible compared to the other processing done on the files here.
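A rough way to reproduce this kind of measurement, as a sketch: the file-name pattern and repetition count are arbitrary assumptions, and the result depends on the machine, so no figure is claimed here.

```python
import random
import timeit

# 10,000 synthetic names, shuffled with a fixed seed to imitate an unsorted listing.
names = [f"file{i:05d}.txt" for i in range(10_000)]
random.Random(0).shuffle(names)

# Average the cost of one sorted() call over 100 runs.
seconds = timeit.timeit(lambda: sorted(names), number=100) / 100
print(f"sorted() on {len(names)} names: {seconds * 1000:.2f} ms per call")
```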
msg310164 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-17 13:55
Given the reproducible builds angle, I'd say this was worth doing.
msg310166 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-01-17 14:07
+1 from me
In my opinion it's a good idea both to leave the result of glob.glob() unsorted and to give the contents produced by the tar and zip modules a defined order. The glob module is low level and it makes sense to expose the file system sort order.
On the other hand, the tar and zip modules are on a higher level. Without sorting it's impossible to create reproducible archives. The performance impact is irrelevant: I/O and compression dominate performance.
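For ZIP archives, ZipFile.write does not recurse into directories, so the caller already controls member order. A minimal sketch of a sorted walk follows; `write_tree_sorted` is a hypothetical helper, not an API from the patch. Note that full reproducibility would also depend on metadata such as timestamps, which this sketch does not address.

```python
import io
import os
import tempfile
import zipfile


def write_tree_sorted(zf, top):
    """Write every file under `top` into `zf` in a stable, sorted order."""
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # sorting in place steers os.walk's descent order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            zf.write(path, arcname=os.path.relpath(path, top))


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("b.txt", "a.txt", os.path.join("sub", "z.txt")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write(rel)

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w") as zf:
        write_tree_sorted(zf, tmp)

    with zipfile.ZipFile(buf) as zf:
        zip_names = zf.namelist()

print(zip_names)
```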
msg310167 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-01-17 14:08
PS: I'm -0 on backporting the change to 3.6 and 2.7. 3.5 is in security fix mode and therefore completely out of scope.
msg310169 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-17 14:11
Since we currently don't guarantee *anything* about ordering, I like the idea of *fixing* Python 2.7 and 3.6 as well. +1 for fixing it in 2.7, 3.6 and master.
msg310170 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-17 14:14
Ah, I was just going to ask about that. I guess I'm -0 on the backport as well. The other reproducible build stuff is only going to land in 3.7. However, this is in a more general category than the pyc stuff, so I can see the argument for backporting it.
msg310171 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-17 14:14
The only guarantee is that TarFile.getmembers(), TarFile.getnames() and ZipFile.infolist() return members/names "in the same order as the members in the archive".
Currently, there is no guarantee when packing, only when unpacking.
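The read-side guarantee can be illustrated with a small in-memory archive. This is a sketch with made-up member names: members are written in a deliberately unsorted order, and getnames() reports them in that same stored order.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    # Write members explicitly, in an order that is not alphabetical.
    for name in ("b.txt", "a.txt", "c.txt"):
        data = name.encode()
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    stored_order = tar.getnames()

print(stored_order)  # ['b.txt', 'a.txt', 'c.txt']
```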
msg310173 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-01-17 14:20
The patch changes behavior. It's fine for 3.7 but not for 3.6/2.7. Somebody may depend on filesystem order.
msg310205 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-01-17 20:29
This doesn't seem appropriate to me for backporting to existing releases (3.6 and 2.7). AFAIK, the current file-system-order behavior has never been identified as a bug. Unless there is a stronger case for changing the existing 3.6.x behavior, I am -1 on backporting.
msg310337 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-20 11:32
If you make this change, you need to make similar changes in other places that recursively add files to archives: shutil, zipapp, distutils, and maybe more.
msg310342 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-20 13:17
I now agree to leave Python 2.7 and 3.6 unchanged.
msg311318 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-01-31 09:09
We missed the beta freeze deadline. :/
Ned,
can we get this change into beta 2? It's a low-risk change that gives tarballs and other archives a stable sort order. We even considered backporting the change to 3.6 and 2.7.
msg311320 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-31 10:17
New changeset 84521047e413d7d1150aaa1c333580b683b3f4b1 by Victor Stinner (Bernhard M. Wiedemann) in branch 'master':
bpo-30693: zip+tarfile: sort directory listing (#2263)
https://github.com/python/cpython/commit/84521047e413d7d1150aaa1c333580b683b3f4b1
msg311321 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-31 10:22
> We missed beta freeze deadline. :/
I merged the PR. We will have to create a cherry-pick request once the 3.7 branch is created. If Ned rejects it, we will have to change the version number in the documentation.
https://mail.python.org/pipermail/python-dev/2018-January/152012.html
IMHO the change is very safe to be merged into 3.7b2.
msg311324 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-31 11:51
I requested additional changes in msg310337.
msg311325 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2018-01-31 12:12
@Serhiy IMHO, just because we fix one problem, we do not have to fix all other problems at the same time. You can still open a pull-request for the others, but I know too little about those to test them.
And having commits pending for 7 months is not exactly energizing either.
For my use-case I just needed a trivial 1 line fix in tarfile.py and already ended up with a diffstat of
 7 files changed, 39 insertions(+), 6 deletions(-)
msg311387 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-01-31 23:33
New changeset 57750be4ad3fa2cfd3473b5be1f1e1a5d0fa9f50 by Ned Deily (Bernhard M. Wiedemann) in branch '3.7':
bpo-30693: zip+tarfile: sort directory listing (#2263)
https://github.com/python/cpython/commit/57750be4ad3fa2cfd3473b5be1f1e1a5d0fa9f50
msg311605 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-04 16:06
Tests are failing on Windows.
======================================================================
ERROR: test_ordered_recursion (test.test_tarfile.Bz2WriteTest)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "C:\py\cpython3.7\lib\unittest\mock.py", line 1191, in patched
 return func(*args, **keywargs)
 File "C:\py\cpython3.7\lib\test\test_tarfile.py", line 1152, in test_ordered_recursion
 support.unlink(os.path.join(path, "1"))
 File "C:\py\cpython3.7\lib\test\support\__init__.py", line 394, in unlink
 _unlink(filename)
 File "C:\py\cpython3.7\lib\test\support\__init__.py", line 344, in _unlink
 _waitfor(os.unlink, filename)
 File "C:\py\cpython3.7\lib\test\support\__init__.py", line 341, in _waitfor
 RuntimeWarning, stacklevel=4)
RuntimeWarning: tests may fail, delete still pending for C:\py\cpython3.7\build\test_python_8504\@test_8504_tmp-tardir\directory\1
======================================================================
ERROR: test_directory_size (test.test_tarfile.GzipWriteTest)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "C:\py\cpython3.7\lib\test\test_tarfile.py", line 1121, in test_directory_size
 os.mkdir(path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\py\\cpython3.7\\build\\test_python_8504\\@test_8504_tmp-tardir\\directory'
======================================================================
ERROR: test_ordered_recursion (test.test_tarfile.GzipWriteTest)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "C:\py\cpython3.7\lib\unittest\mock.py", line 1191, in patched
 return func(*args, **keywargs)
 File "C:\py\cpython3.7\lib\test\test_tarfile.py", line 1137, in test_ordered_recursion
 os.mkdir(path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\py\\cpython3.7\\build\\test_python_8504\\@test_8504_tmp-tardir\\directory'
======================================================================
ERROR: test_directory_size (test.test_tarfile.LzmaWriteTest)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "C:\py\cpython3.7\lib\test\test_tarfile.py", line 1121, in test_directory_size
 os.mkdir(path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\py\\cpython3.7\\build\\test_python_8504\\@test_8504_tmp-tardir\\directory'
======================================================================
ERROR: test_ordered_recursion (test.test_tarfile.LzmaWriteTest)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "C:\py\cpython3.7\lib\unittest\mock.py", line 1191, in patched
 return func(*args, **keywargs)
 File "C:\py\cpython3.7\lib\test\test_tarfile.py", line 1137, in test_ordered_recursion
 os.mkdir(path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\py\\cpython3.7\\build\\test_python_8504\\@test_8504_tmp-tardir\\directory'
======================================================================
ERROR: test_directory_size (test.test_tarfile.WriteTest)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "C:\py\cpython3.7\lib\test\test_tarfile.py", line 1121, in test_directory_size
 os.mkdir(path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\py\\cpython3.7\\build\\test_python_8504\\@test_8504_tmp-tardir\\directory'
======================================================================
ERROR: test_ordered_recursion (test.test_tarfile.WriteTest)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "C:\py\cpython3.7\lib\unittest\mock.py", line 1191, in patched
 return func(*args, **keywargs)
 File "C:\py\cpython3.7\lib\test\test_tarfile.py", line 1137, in test_ordered_recursion
 os.mkdir(path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\py\\cpython3.7\\build\\test_python_8504\\@test_8504_tmp-tardir\\directory'
----------------------------------------------------------------------
msg311685 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2018-02-05 19:57
Serhiy, can you test https://github.com/python/cpython/pull/5557 ?
msg311736 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-06 18:08
New changeset 4ad703b7ca463d1183539277dde90ffb1c808487 by Serhiy Storchaka (Bernhard M. Wiedemann) in branch 'master':
bpo-30693: Fix tarfile test cleanup on MSWindows (#5557)
https://github.com/python/cpython/commit/4ad703b7ca463d1183539277dde90ffb1c808487
msg311738 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-06 18:33
New changeset 2c6f6682768f401c297c584ef106d48c78697f67 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.7':
bpo-30693: Fix tarfile test cleanup on MSWindows (GH-5557) (GH-5567)
https://github.com/python/cpython/commit/2c6f6682768f401c297c584ef106d48c78697f67
History
Date  User  Action  Args
2022-04-11 14:58:47  admin  set  github: 74878
2022-03-06 22:48:55  python-dev  set  nosy: + python-dev
    pull_requests: + pull_request29831
2018-02-06 18:38:13  serhiy.storchaka  set  status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2018-02-06 18:33:30  serhiy.storchaka  set  messages: + msg311738
2018-02-06 18:10:04  miss-islington  set  pull_requests: + pull_request5388
2018-02-06 18:08:58  serhiy.storchaka  set  messages: + msg311736
2018-02-05 19:57:16  bmwiedemann  set  messages: + msg311685
2018-02-05 19:55:35  bmwiedemann  set  keywords: + patch
    stage: needs patch -> patch review
    pull_requests: + pull_request5379
2018-02-04 16:06:25  serhiy.storchaka  set  messages: + msg311605
    stage: patch review -> needs patch
2018-01-31 23:33:09  ned.deily  set  messages: + msg311387
2018-01-31 12:12:54  bmwiedemann  set  messages: + msg311325
2018-01-31 11:51:34  serhiy.storchaka  set  messages: + msg311324
2018-01-31 10:22:07  vstinner  set  messages: + msg311321
2018-01-31 10:17:16  vstinner  set  messages: + msg311320
2018-01-31 09:09:18  christian.heimes  set  messages: + msg311318
2018-01-20 13:17:40  vstinner  set  messages: + msg310342
2018-01-20 11:32:41  serhiy.storchaka  set  messages: + msg310337
    versions: - Python 2.7, Python 3.6
2018-01-17 20:29:51  ned.deily  set  nosy: + ned.deily
    messages: + msg310205
2018-01-17 14:20:23  christian.heimes  set  messages: + msg310173
    versions: - Python 3.5
2018-01-17 14:14:14  vstinner  set  nosy: + vstinner
    messages: + msg310171
2018-01-17 14:14:01  r.david.murray  set  nosy: - vstinner
    messages: + msg310170
    versions: + Python 3.5
2018-01-17 14:11:55  vstinner  set  nosy: + vstinner
    messages: + msg310169
2018-01-17 14:08:30  christian.heimes  set  messages: + msg310167
    versions: - Python 3.5
2018-01-17 14:07:20  christian.heimes  set  nosy: + christian.heimes
    messages: + msg310166
2018-01-17 13:55:20  r.david.murray  set  nosy: + r.david.murray
    messages: + msg310164
2017-06-19 09:20:33  bmwiedemann  set  messages: + msg296308
2017-06-18 22:09:51  martin.panter  link  issue24465 dependencies
2017-06-18 17:21:59  serhiy.storchaka  set  versions: - Python 3.3, Python 3.4
    nosy: + rhettinger, lars.gustaebel, serhiy.storchaka
    messages: + msg296280
    stage: patch review
2017-06-18 02:37:18  bmwiedemann  set  pull_requests: + pull_request2313
2017-06-18 02:03:30  bmwiedemann  create
