
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: tarfile module should have a command line
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Ankur.Ankan, berker.peksag, brandon-rhodes, eric.araujo, ezio.melotti, kyle, larry, lars.gustaebel, pitrou, python-dev, rhettinger, serhiy.storchaka, vstinner
Priority: low
Keywords: needs review, patch

Created on 2011-11-25 03:11 by brandon-rhodes, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded
issue_13477 Ankur.Ankan, 2013-03-03 04:25
issue_13477_v2 Ankur.Ankan, 2013-03-07 13:17
issue13477_v3.diff berker.peksag, 2013-03-08 03:20
issue13477_v4.diff berker.peksag, 2013-04-06 14:36
tarcli.patch pitrou, 2013-08-15 20:53
issue13477_v5.diff berker.peksag, 2013-09-26 18:41
issue13477_v6.diff berker.peksag, 2013-11-20 17:10
tarfile_cli.patch serhiy.storchaka, 2013-11-23 12:54
Messages (36)
msg148300 - Author: Brandon Rhodes (brandon-rhodes) * Date: 2011-11-25 03:11
The tarfile module should have a simple command line that allows it to be executed with "-m" — even if its only ability was to take a filename and extract it to the current directory, it could be a lifesaver on Windows machines where Python has been installed but nothing else. Would such a patch be welcome if I could write one up?
msg148301 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-11-25 03:18
The feature request seems reasonable to me, but this can only go in 3.3.
If you want to propose a patch, you might want to check the devguide and what other modules like zipfile do.
msg148389 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-11-26 10:37
This is not a bad idea. I recommend keeping it as simple as possible. I would definitely not be supportive of a full tar clone. List, extract, create: that should be enough. There are two possible command line choices: do what the zipfile module does or emulate tar. I am in favor of the latter.
msg183354 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2013-03-03 02:21
Patch looks good! Some minor comments on Rietveld.
Could you add tests?
msg183356 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-03-03 02:28
+1 for adding a CLI and +1 for keeping it minimal.
msg183363 - Author: Ankur Ankan (Ankur.Ankan) * Date: 2013-03-03 04:25
I was also working on this issue, so I thought I should submit my patch as well.
It has a few extra features compared to berker.peksag's patch:
1) the name of the files to be extracted can be specified
2) output directory can be specified for extracting files.
msg183507 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2013-03-05 00:50
> Patch looks good! Some minor comments on Rietveld.
Thanks for the review, Éric.
> Could you add tests?
Done.
Here's the new patch with Éric's comments addressed.
msg183656 - Author: Ankur Ankan (Ankur.Ankan) * Date: 2013-03-07 13:17
Thanks for your comments Serhiy.
I have improved the patch according to your comments. Please have a look.
And I am writing tests.
msg183677 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-07 16:29
It would be good if Berker and Ankur merged their patches. Ankur's patch has some very useful features, but Berker's patch looks more mature.
I prefer to emulate a subset of the tar utility interface too.
msg183682 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2013-03-07 16:40
I am more in favor of having something simple and similar to zipfile, like Lars, rather than following tar.
msg183684 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-07 16:48
This can confuse users. Note that even jar (which works with zip-like files) honors the tar interface.
msg183688 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2013-03-07 17:12
Yeah, that’s always the discussion when writing a Python utility that has a unix equivalent: do you want to be familiar to Python users or to the unix tool users?
I don’t have a strong opinion. I think unix users would have no reason to use python -m tarfile, and windows users won’t have the expectation that the interface is the same as tar—unless they are unix people who are using a windows machine for whatever reason. If it were me, I’d just start with python -m tarfile --help, so I’d have no expectations :)
msg183722 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-03-08 02:26
+ parser.add_argument('--gz', '--gunzip', '--gzip', '--tgz', '-z', 
+ '--ungzip', action = 'store_true', 
+ help = 'gz compression')
+ parser.add_argument('--bz2', '--bzip2', '--tbz2', '--tbz', '--tb2',
+ action = 'store_true', help = 'bz2 compression')
+ parser.add_argument('--xz', '--lzma', action = 'store_true',
+ help = 'xz compression')
Do we really need so many names for the same option? Where do these names come from?
--
main() should exit after extract and create, so that it performs only one operation and does not always display the usage.
It would be better to not duplicate the list of options and use parser.print_help() instead of sys.stdout.write(__doc__).
Some consistency tests on exclusive options (bzip/gzip/lzma and list/create/extract) would be nice.
--
tar options on Linux:
 -c, --create
 -t, --list
 -x, --extract, --get
 -z, --gzip, --ungzip
 -j, -I, --bzip
 -C, --directory DIRECTORY
For tarfile, I propose to have a shorter list, and try to stay somehow compatible with tar:
 -c, --create
 -t, --list
 -x, --extract
 -z, --gzip
 -j, --bzip
 -C, --directory DIRECTORY
Users of the TAR format usually come from UNIX, so using the same command line options should not be so surprising.
I don't like the idea of an optional argument for --extract: "--extract file1 file2" is usually understood/read as "--extract=filename archive.tar". If you really think that we need to support "only extract some files", it should be a different option. Linux tar command has no such option. I propose to drop this feature (always extract all files).
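For illustration, the shortened tar-like option list proposed above could be expressed with argparse roughly as follows. This is only a minimal sketch, not code from any of the attached patches: `build_parser` and the positional arguments `archive` and `files` are hypothetical names, and the two mutually exclusive groups are one possible way to get the consistency checks on exclusive options for free.

```python
import argparse


def build_parser():
    """Hypothetical parser for the shortened, tar-like option list."""
    parser = argparse.ArgumentParser(prog="python -m tarfile")

    # Exactly one operation per invocation (create/list/extract).
    operation = parser.add_mutually_exclusive_group(required=True)
    operation.add_argument("-c", "--create", action="store_true",
                           help="create an archive")
    operation.add_argument("-t", "--list", action="store_true",
                           help="list the archive contents")
    operation.add_argument("-x", "--extract", action="store_true",
                           help="extract all files")

    # At most one compression method.
    compression = parser.add_mutually_exclusive_group()
    compression.add_argument("-z", "--gzip", action="store_true",
                             help="gzip compression")
    compression.add_argument("-j", "--bzip", action="store_true",
                             help="bzip2 compression")

    parser.add_argument("-C", "--directory", metavar="DIRECTORY", default=".",
                        help="directory to work in")
    parser.add_argument("archive", help="path of the tar archive")
    parser.add_argument("files", nargs="*", help="files to add when creating")
    return parser


# Example: extract a gzip-compressed archive into the directory "out".
args = build_parser().parse_args(["-x", "-z", "-C", "out", "archive.tar.gz"])
print(args.extract, args.gzip, args.directory, args.archive)
```

With this layout, combining `-c` with `-x` (or `-z` with `-j`) makes argparse print a usage error and exit, and `parser.print_help()` replaces a hand-maintained list of options.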
msg183724 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2013-03-08 03:19
New patch (issue13477_v3.diff) attached.
Changes:
* Addressed comments from Serhiy
* Added "output" parameter to --extract option (from Ankur's patch)
* Updated tests and documentation
The current docstring of the tarfile module does not give much
information (it just prints "Read from and write to tar format
archives."), so I skipped the -d option.
msg183725 - Author: Ankur Ankan (Ankur.Ankan) * Date: 2013-03-08 04:12
> + parser.add_argument('--gz', '--gunzip', '--gzip', '--tgz', '-z',
> + '--ungzip', action = 'store_true',
> + help = 'gz compression')
> + parser.add_argument('--bz2', '--bzip2', '--tbz2', '--tbz', '--tb2',
> + action = 'store_true', help = 'bz2 compression')
> + parser.add_argument('--xz', '--lzma', action = 'store_true',
> + help = 'xz compression')
> Do we really need so many names for the same option? Where do these names come from?
I was trying to implement all the formats mentioned in Serhiy's review (and also the different names for the same format).
msg183739 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2013-03-08 15:04
> Users of the TAR format usually come from UNIX,
> so using the same command line options should not be so surprising.
Not sure about that: they could be Python users wanting to unpack a tarball sdist. That said, there is no harm in being compatible, and I like your small list of options.
FTR Lars said that he preferred compat with the zipfile CLI, which is:
Usage:
 zipfile.py -l zipfile.zip # Show listing of a zipfile
 zipfile.py -t zipfile.zip # Test if a zipfile is valid
 zipfile.py -e zipfile.zip target # Extract zipfile into target dir
 zipfile.py -c zipfile.zip src ... # Create zipfile from sources
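As a point of comparison, the four zipfile operations listed above correspond roughly to the following library calls. This is a sketch under assumptions, not the source of the zipfile command line itself: the function names `create`, `listing`, `check` and `extract` are hypothetical, and `create` handles plain files only.

```python
import os
import tempfile
import zipfile


def create(archive, sources):
    """Rough equivalent of -c: create a zipfile from source files."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in sources:
            zf.write(src, arcname=os.path.basename(src))


def listing(archive):
    """Rough equivalent of -l: return the member names."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def check(archive):
    """Rough equivalent of -t: None if all members pass their CRC check,
    otherwise the name of the first bad member."""
    with zipfile.ZipFile(archive) as zf:
        return zf.testzip()


def extract(archive, target):
    """Rough equivalent of -e: extract everything into the target directory."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)


# Small round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "hello.txt")
    with open(source, "w") as f:
        f.write("hello")
    archive = os.path.join(tmp, "demo.zip")
    create(archive, [source])
    print(listing(archive), check(archive))
    extract(archive, os.path.join(tmp, "out"))
```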
msg183749 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2013-03-08 16:45
Did you get all the review comments? Some of them were made on older versions of the patch, and don’t seem to be addressed in the latest version. Thanks.
Ankur, could you submit a contributor agreement? http://www.python.org/psf/contrib/contrib-form/ 
msg183752 - Author: Ankur Ankan (Ankur.Ankan) * Date: 2013-03-08 17:11
I am still unclear about the outcome of the discussion. I am confused about which features need to be kept and which are to be removed.
> Ankur, could you submit a contributor agreement? 
I will submit it today.
msg184626 - Author: Larry Hastings (larry) * (Python committer) Date: 2013-03-19 10:06
Modern tar programs don't need to be told the compression method--they infer it. If they can do it in C, we can do it in Python. So we should simply omit the "-bz2" stuff.
As for what the interface should look like, I'm definitely in favor of it looking like tar. unzip has the same interface on different platforms; so does 7zip, so does unrar. I think it's reasonable to expect that tar would take the same interface on different platforms. We don't need to coddle Windows users here. We're already expecting them to be sophisticated enough to handle the EOL conversion we're not doing for them.
msg184628 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-19 10:28
Note that the --create command should support the --directory option too.
> Modern tar programs don't need to be told the compression method--they infer it. If they can do it in C, we can do it in Python. So we should simply omit the "-bz2" stuff.
An archive may have no extension or have a nonstandard extension. And stdin/stdout does not have a name.
msg184644 - Author: Larry Hastings (larry) * (Python committer) Date: 2013-03-19 17:10
Huh. tar *can* infer it from the data itself. On the other hand, it chooses explicitly not to.
% cat ~/Downloads/Python-3.3.0.tar.bz2| tar xvf -
tar: Archive is compressed. Use -j option
tar: Error is not recoverable: exiting now
% cat ~/Downloads/Python-3.3.0.tgz| tar xvf -
tar: Archive is compressed. Use -z option
tar: Error is not recoverable: exiting now
I guess "tar" knows explicit is better than implicit too ;-)
msg184729 - Author: Brandon Rhodes (brandon-rhodes) * Date: 2013-03-20 03:04
Larry Hastings <report@bugs.python.org> writes:
> Huh. tar *can* infer it from the data itself. On the other hand, it
> chooses explicitly not to. I guess "tar" knows explicit is better
> than implicit too ;-)
I am told that the refusal of "tar" to introspect the data is because:
(a) Tar runs "gunzip -c" (for example) as an external program; it does
not actually compile against libz.
(b) Streams in UNIX cannot be rewound. Tar cannot look at the first
block of an input pipe and then "put the block back" so that the same
input can be fed directly to "gunzip" as its input.
(c) Given (a) and (b), tar could only support data introspection of
input from a pipe if it were willing to be a pass-through that, after
reading and introspecting the first block, then fired up "gunzip" and
sent ALL of the blocks through. Which would require multiprocessing,
threading, or async I/O so that tar could both read and write, which
would make tar more complicated.
(d) Therefore, tar refuses to even look.
Since Python does bundle compression in its standard library, it can
quite trivially step forward and actually do the data introspection that
tar insists on not doing; the first few bytes of a tar archive are quite
demonstrably different from the first bytes of a gzip stream, if I
recall.
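The data introspection described above can be sketched in a few lines. This is an illustration only, not what tarfile does internally: `sniff_compression` and the `MAGIC` table are hypothetical names, the signatures are the well-known leading bytes of gzip, bzip2 and xz streams, and the `ustar` check at offset 257 only recognizes POSIX and GNU tar headers (very old tar headers carry no magic and are reported as unknown).

```python
import bz2
import gzip
import io
import lzma
import tarfile

# Leading bytes that identify each compressed stream format.
MAGIC = (
    (b"\x1f\x8b", "gz"),
    (b"BZh", "bz2"),
    (b"\xfd7zXZ\x00", "xz"),
)


def sniff_compression(head):
    """Guess the format from the first 512 bytes of a file or stream.

    Returns "gz", "bz2", "xz", "tar" (uncompressed) or None (unknown).
    """
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    # POSIX and GNU tar headers store "ustar" at offset 257.
    if head[257:262] == b"ustar":
        return "tar"
    return None


# Build a tiny uncompressed archive in memory to try the function on.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tf:
    payload = b"hello"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
raw = buffer.getvalue()

for blob in (raw, gzip.compress(raw), bz2.compress(raw), lzma.compress(raw)):
    print(sniff_compression(blob[:512]))
```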
msg184753 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-03-20 11:27
I don't think that we need to support compressing/decompressing using
the standard input/output.
msg184758 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2013-03-20 12:50
I'd like to re-emphasize that it is best to keep the whole thing as simple and straightforward as possible. Offer some basic operations and that's it.
Although I am pretty accustomed to the original tar command line, I think we should copy zipfile's interface. It makes more sense to offer some kind of unified "Python" command line approach for archive access than keeping to old traditions.
I agree with Victor that we don't really need support for stdin/stdout. It only complicates matters. 
If everybody still votes for stdin/stdout, I'd like to point out that tarfile supports compression detection for streams. It would be best to use mode="r|*" throughout because it works for both normal files and stdin. Use mode="w|(compression)" for writing to files and stdout accordingly.
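A minimal sketch of the stream modes mentioned here, using in-memory buffers in place of stdin and stdout. The helper names `write_stream` and `list_stream` are hypothetical; for real pipes one would pass `sys.stdin.buffer` or `sys.stdout.buffer` as `fileobj`.

```python
import io
import tarfile


def write_stream(fileobj, files, compression="gz"):
    """Write an archive to a non-seekable stream using mode "w|<compression>".

    `files` maps member names to bytes; pass compression="" for plain tar.
    """
    with tarfile.open(fileobj=fileobj, mode="w|" + compression) as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))


def list_stream(fileobj):
    """List member names from a stream; "r|*" detects the compression."""
    with tarfile.open(fileobj=fileobj, mode="r|*") as tf:
        return [member.name for member in tf]


# Round trip: write a bzip2-compressed archive, then read it back
# without telling the reader which compression was used.
out = io.BytesIO()
write_stream(out, {"a.txt": b"first", "b.txt": b"second"}, compression="bz2")
print(list_stream(io.BytesIO(out.getvalue())))
```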
If we do not support stdin/stdout we no longer need all these compression options because for reading we do autodetection and for writing we could deduce the compression from the file extension (which is just some kind of autodetection too).
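Deducing the compression from the file extension when writing could look roughly like this. The mapping and the name `write_mode` are assumptions made for illustration (the extensions are taken from the option names discussed earlier in this issue), not a list taken from any of the patches.

```python
import os

# Hypothetical extension table; unknown extensions mean "no compression".
EXTENSION_TO_COMPRESSION = {
    ".gz": "gz",
    ".tgz": "gz",
    ".bz2": "bz2",
    ".tbz": "bz2",
    ".tbz2": "bz2",
    ".tb2": "bz2",
    ".xz": "xz",
}


def write_mode(path):
    """Return a tarfile.open() mode such as "w:gz" based on the extension."""
    extension = os.path.splitext(path)[1].lower()
    return "w:" + EXTENSION_TO_COMPRESSION.get(extension, "")


for name in ("dist.tar", "dist.tar.gz", "dist.tbz2", "dist.tar.xz", "dist"):
    print(name, write_mode(name))
```

For reading, no such table is needed: `tarfile.open(path)` uses transparent compression detection by default.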
Another side note: We should be aware of the effects discussed in issue17102 and issue1044. In my opinion tarfile as a library is obligated to behave like that, but maybe that's not acceptable for a command line tool.
msg186212 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-07 15:05
Then I propose to add an alternative tarfile command-line interface as Tools/scripts/tar.py for those who prefer a well-known and well-tested traditional interface.
msg195287 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-15 20:53
Regenerated patch against latest default (fixing conflicts).
msg198450 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2013-09-26 18:41
Thanks for the rebase, Antoine.
Here is an updated patch:
- Addressed Serhiy's comments. I didn't add a directory parameter to the
 create command to keep the CLI simple.
- Added a test for dotless files
- Returned proper exit codes
msg199496 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-11 18:32
From a quick glance, the patch looks ok. Serhiy, do you want to review it any further?
msg199501 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-11 20:01
Yes, this is in my plans.
msg203062 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-16 16:37
I have added comments on Rietveld.
msg203510 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2013-11-20 17:10
Attached an updated patch that addresses Serhiy's comments. Thanks!
msg203993 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-23 12:54
I think Berker has misunderstood me. Here is a patch based on issue13477_v5.diff with some cherry-picked changes from issue13477_v6.diff and several other changes:
* The --create, --extract, --list, and --test options are now mutually exclusive.
* --test now tests a tarfile for integrity (as in the zipfile module).
* File names in output are now printed with repr().
* The tarfile CLI is now silent by default. Added option -v (--verbose) to print more verbose output as in issue13477_v5.diff.
* Added help text for arguments.
* Fixed and enhanced tests.
I'm going to commit this patch shortly.
Known bugs:
* Help for --extract shows "--extract <tarfile> [<output_dir> ...]" instead of "--extract <tarfile> [<output_dir>]". --extract accepts only 1 to 2 arguments.
* --list fails with a tarfile containing unencodable file names. In particular it fails with test tarfiles in the test suite.
* Possible problems with unusual locales and file system encodings.
* Corrupted tarfiles produce tracebacks.
* Tests for --create should check that created tarfile contains correct files.
* Tests for --extract should check that correct files are extracted.
* Tests for non-ASCII file names are needed.
Besides all this I think the patch can be committed.
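The behaviour summarized in this message (mutually exclusive operations, an integrity test, silent by default with -v for more output) can be sketched as below. This is not the committed patch: `main` and `is_intact` are hypothetical names, only the long option names mentioned above are used, and catching `tarfile.TarError` is one possible way to report a corrupted archive without a traceback.

```python
import argparse
import sys
import tarfile


def is_intact(path):
    """Return True if the archive opens and every member header can be read."""
    try:
        with tarfile.open(path) as tf:
            tf.getmembers()
    except (tarfile.TarError, OSError) as exc:
        # Print a short message instead of letting a traceback escape.
        print("{!r}: {}".format(path, exc), file=sys.stderr)
        return False
    return True


def main(argv=None):
    parser = argparse.ArgumentParser(prog="python -m tarfile")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print more output")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--list", metavar="<tarfile>")
    group.add_argument("--extract", nargs="+", metavar="<arg>",
                       help="<tarfile> [<output_dir>]")
    group.add_argument("--create", nargs="+", metavar="<arg>",
                       help="<name> <file> ...")
    group.add_argument("--test", metavar="<tarfile>")
    args = parser.parse_args(argv)

    if args.test is not None:
        return 0 if is_intact(args.test) else 1

    if args.list is not None:
        if not is_intact(args.list):
            return 1
        with tarfile.open(args.list) as tf:
            tf.list(verbose=args.verbose)
        return 0

    if args.extract is not None:
        if len(args.extract) > 2:
            parser.error("--extract takes <tarfile> [<output_dir>]")
        source = args.extract[0]
        target = args.extract[1] if len(args.extract) == 2 else "."
        if not is_intact(source):
            return 1
        with tarfile.open(source) as tf:
            tf.extractall(path=target)
        return 0

    # Remaining case: --create <name> <file> ...
    name, *sources = args.create
    with tarfile.open(name, "w") as tf:
        for source in sources:
            tf.add(source)
    if args.verbose:
        print("{!r} created".format(name))
    return 0
```

Returning the exit status from `main` keeps the function easy to test; a script entry point would call `sys.exit(main())`.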
msg204134 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-23 23:54
New changeset a5b6c8cbc473 by Serhiy Storchaka in branch 'default':
Issue #13477: Added command line interface to the tarfile module.
http://hg.python.org/cpython/rev/a5b6c8cbc473 
msg204136 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-24 00:32
New changeset 70b9d22b900a by Serhiy Storchaka in branch 'default':
Build a list of supported test tarfiles dynamically for CLI "test" command
http://hg.python.org/cpython/rev/70b9d22b900a 
msg204233 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-24 16:32
> changeset: 87476:a539c85aec51
> user: Antoine Pitrou <solipsis@pitrou.net>
> date: Sun Nov 24 01:55:05 2013 +0100
> summary:
> Try to fix test_tarfile under Windows
Thank you Antoine.
msg213007 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-03-10 01:35
New changeset 5b52db6fc7dc by R David Murray in branch 'default':
whatsnew: tarfile cli (#13477).
http://hg.python.org/cpython/rev/5b52db6fc7dc 
History
Date User Action Args
2022-04-11 14:57:24 admin set github: 57686
2014-03-10 23:54:20 pitrou set status: open -> closed; assignee: lars.gustaebel ->; resolution: fixed; stage: commit review -> resolved
2014-03-10 01:35:03 python-dev set messages: + msg213007
2013-11-24 16:32:12 serhiy.storchaka set messages: + msg204233
2013-11-24 00:32:46 python-dev set messages: + msg204136
2013-11-23 23:54:31 python-dev set nosy: + python-dev; messages: + msg204134
2013-11-23 12:54:40 serhiy.storchaka set files: + tarfile_cli.patch; messages: + msg203993; stage: patch review -> commit review
2013-11-20 17:10:09 berker.peksag set files: + issue13477_v6.diff; messages: + msg203510
2013-11-16 16:37:07 serhiy.storchaka set messages: + msg203062
2013-10-11 20:01:11 serhiy.storchaka set messages: + msg199501
2013-10-11 18:32:31 pitrou set messages: + msg199496
2013-10-11 14:27:26 berker.peksag set keywords: + needs review
2013-09-26 18:41:52 berker.peksag set files: + issue13477_v5.diff; messages: + msg198450; components: + Library (Lib)
2013-08-15 20:53:29 pitrou set files: + tarcli.patch; nosy: + pitrou; messages: + msg195287; stage: needs patch -> patch review
2013-04-07 15:05:00 serhiy.storchaka set messages: + msg186212
2013-04-06 14:36:50 berker.peksag set files: - issue13477_v2.diff
2013-04-06 14:36:35 berker.peksag set files: + issue13477_v4.diff
2013-03-20 12:50:43 lars.gustaebel set messages: + msg184758
2013-03-20 11:27:10 vstinner set messages: + msg184753
2013-03-20 03:04:10 brandon-rhodes set messages: + msg184729
2013-03-19 17:10:14 larry set messages: + msg184644
2013-03-19 10:28:37 serhiy.storchaka set messages: + msg184628
2013-03-19 10:06:54 larry set nosy: + larry; messages: + msg184626
2013-03-08 17:11:24 Ankur.Ankan set messages: + msg183752
2013-03-08 16:45:55 eric.araujo set messages: + msg183749
2013-03-08 15:04:04 eric.araujo set messages: + msg183739
2013-03-08 04:12:48 Ankur.Ankan set messages: + msg183725
2013-03-08 03:20:55 berker.peksag set files: + issue13477_v3.diff
2013-03-08 03:20:30 berker.peksag set files: - issue13477_v3.diff
2013-03-08 03:19:52 berker.peksag set files: - issue13477.diff
2013-03-08 03:19:40 berker.peksag set files: + issue13477_v3.diff; messages: + msg183724
2013-03-08 02:26:56 vstinner set nosy: + vstinner; messages: + msg183722
2013-03-07 17:12:13 eric.araujo set messages: + msg183688
2013-03-07 16:48:26 serhiy.storchaka set messages: + msg183684
2013-03-07 16:40:53 eric.araujo set messages: + msg183682
2013-03-07 16:29:21 serhiy.storchaka set messages: + msg183677
2013-03-07 13:17:27 Ankur.Ankan set files: + issue_13477_v2; messages: + msg183656
2013-03-05 00:50:38 berker.peksag set files: + issue13477_v2.diff; messages: + msg183507
2013-03-04 16:36:20 serhiy.storchaka set nosy: + serhiy.storchaka
2013-03-03 04:25:19 Ankur.Ankan set files: + issue_13477; messages: + msg183363
2013-03-03 02:28:51 rhettinger set nosy: + rhettinger; messages: + msg183356
2013-03-03 02:21:01 eric.araujo set messages: + msg183354
2013-03-02 22:17:04 berker.peksag set files: + issue13477.diff; nosy: + berker.peksag; keywords: + patch
2013-02-27 16:48:45 Ankur.Ankan set nosy: + Ankur.Ankan
2012-10-08 22:28:59 berker.peksag set versions: + Python 3.4, - Python 3.3
2012-10-04 16:39:29 kyle set nosy: + kyle
2011-11-26 13:16:27 eric.araujo set nosy: + eric.araujo
2011-11-26 10:37:48 lars.gustaebel set priority: normal -> low; assignee: lars.gustaebel; messages: + msg148389; stage: test needed -> needs patch
2011-11-25 03:18:28 ezio.melotti set versions: + Python 3.3; nosy: + ezio.melotti, lars.gustaebel; messages: + msg148301; type: enhancement; stage: test needed
2011-11-25 03:11:05 brandon-rhodes create
