Message 360268 - Python tracker

In-reply-to
Author	wchargin
Recipients	wchargin
Date	2020-01-19.20:23:57
Message-id	<1579465438.04.0.91237843457.issue39389@roundup.psfhosted.org>

Content

The `gzip` module properly uses the user-specified compression level to
control the underlying zlib stream compression level, but always writes
metadata that indicates that the maximum compression level was used.
Repro:
```
import gzip
blob = b"The quick brown fox jumps over the lazy dog." * 32
with gzip.GzipFile("fast.gz", mode="wb", compresslevel=1) as outfile:
    outfile.write(blob)
with gzip.GzipFile("best.gz", mode="wb", compresslevel=9) as outfile:
    outfile.write(blob)
```
Run this script, then run `wc -c *.gz` and `file *.gz`:
```
$ wc -c *.gz
 82 best.gz
 84 fast.gz
166 total
$ file *.gz
best.gz: gzip compressed data, was "best", last modified: Sun Jan 19 20:15:23 2020, max compression
fast.gz: gzip compressed data, was "fast", last modified: Sun Jan 19 20:15:23 2020, max compression
```
The file sizes correctly reflect the difference, but `file` thinks that
both archives are written at max compression.
The error is that the ninth byte of the header in the output stream is
hard-coded to `\002` at Lib/gzip.py:260 (as of 558f07891170), which
indicates maximum compression. The correct value to indicate maximum
speed is `\004`. See RFC 1952, section 2.3.1:
<https://tools.ietf.org/html/rfc1952>
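To check the header byte directly rather than relying on `file`, something like the following sketch could be used. It compresses in memory and reads the byte at index 8 (the ninth byte, XFL). The helper names `read_xfl`, `compress_in_memory`, and `expected_xfl` are made up for illustration and are not part of the `gzip` module. Mapping levels other than 1 and 9 to `0` is an assumption, since the RFC only defines the values 2 and 4 for deflate.
```
import gzip
import io

# XFL values defined for the deflate method in RFC 1952, section 2.3.1.
XFL_MEANINGS = {2: "maximum compression", 4: "fastest algorithm"}


def read_xfl(data):
    """Return the XFL byte (ninth header byte, index 8) of a gzip stream."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return data[8]


def compress_in_memory(blob, level):
    """Compress blob with gzip.GzipFile at the given level, without touching disk."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level) as outfile:
        outfile.write(blob)
    return buf.getvalue()


def expected_xfl(level):
    """Hypothetical mapping: 9 -> 2, 1 -> 4, anything else -> 0 (assumption)."""
    if level == 9:
        return 2
    if level == 1:
        return 4
    return 0


blob = b"The quick brown fox jumps over the lazy dog." * 32
for level in (1, 9):
    xfl = read_xfl(compress_in_memory(blob, level))
    print(level, "actual:", xfl, XFL_MEANINGS.get(xfl, "unspecified"),
          "expected:", expected_xfl(level))
```
On an interpreter with the behavior described here, the actual value for both levels would be 2, so the level 1 line would show a mismatch.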
Using GNU `gzip(1)` with `--fast` creates the same output file as the
one emitted by the `gzip` module, except for two bytes: the metadata and
the OS (the ninth and tenth bytes).
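A comparison like this can be made by listing the offsets at which two equal-length byte strings differ. This is only a sketch: `differing_offsets` is a hypothetical helper, and the two headers below are hand-written 10-byte examples (the OS values 255 and 3 are illustrative), not output captured from either tool.
```
def differing_offsets(a, b):
    """Return the zero-based offsets at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hand-written 10-byte gzip headers: magic, CM, FLG, MTIME (4 bytes), XFL, OS.
header_a = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff"  # XFL=2, OS=255 (unknown)
header_b = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x03"  # XFL=4, OS=3 (Unix)

# Offsets 8 and 9 are the ninth and tenth bytes.
print(differing_offsets(header_a, header_b))
```
To apply it to real files, read each one with `open(path, "rb").read()` and pass the two results in.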

History
Date	User	Action	Args
2020-01-19 20:23:58	wchargin	set	recipients: + wchargin
2020-01-19 20:23:58	wchargin	set	messageid: <1579465438.04.0.91237843457.issue39389@roundup.psfhosted.org>
2020-01-19 20:23:57	wchargin	link	issue39389 messages
2020-01-19 20:23:57	wchargin	create