Issue 13881: Stream encoder for zlib_codec doesn't use the incremental encoder


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/58089

classification

Title:	Stream encoder for zlib_codec doesn't use the incremental encoder
Type:	behavior	Stage:	patch review
Components:	IO	Versions:	Python 3.3, Python 3.5

process

Status:	open	Resolution:
Dependencies:	23231	Superseder:
Assigned To:	Nosy List:	amcnabb, doerwalter, jcea, lemburg, loewis, martin.panter, vstinner
Priority:	normal	Keywords:	patch

Created on 2012-01-26 19:40 by amcnabb, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description
zlib-bz2-writer.patch	martin.panter, 2015-01-14 06:14	Depends on Issue 23231
zlib-bz2-writer.v2.patch	martin.panter, 2015-01-15 04:43	Depends on Issue 23231
zlib-bz2-writer.v3.patch	martin.panter, 2015-01-15 23:02	Depends on Issue 23231
zlib-bz2-writer.v4.patch	martin.panter, 2015-01-25 05:14	Depends on Issue 23231

Messages (13)
msg152029	Author: Andrew McNabb (amcnabb)	Date: 2012-01-26 19:40
The stream encoder for the zlib_codec doesn't use the incremental encoder, so it has limited usefulness in practice. This is easiest to show with an example. Here is the behavior with the stream encoder:

>>> filelike = io.BytesIO()
>>> wrapped = codecs.getwriter('zlib_codec')(filelike)
>>> wrapped.write(b'x')
>>> filelike.getvalue()
b'x\x9c\xab\x00\x00\x00y\x00y'
>>> wrapped.write(b'x')
>>> filelike.getvalue()
b'x\x9c\xab\x00\x00\x00y\x00yx\x9c\xab\x00\x00\x00y\x00y'
>>>

However, this is the behavior of the incremental encoder:

>>> ienc = codecs.getincrementalencoder('zlib_codec')()
>>> ienc.encode(b'x')
b'x\x9c'
>>> ienc.encode(b'x', final=True)
b'\xab\xa8\x00\x00\x01j\x00\xf1'
>>>

The stream encoder is apparently encoding each write as a complete, independent zlib stream, but the incremental encoder buffers until it gets a block that's large enough to be meaningfully compressed.

Fixing this may require addressing a separate issue with stream encoders. Unlike gzip.GzipFile, closing a stream encoder closes the underlying file. If this underlying file is a BytesIO, then closing makes it free its buffer, making it impossible to get at the completed file.
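The difference can be reproduced with the zlib module alone. This is a minimal sketch, not the codec's actual code: the helper names compress_per_write and compress_incrementally are hypothetical, and the first one assumes (as the transcript above suggests) that the stream writer effectively calls zlib.compress() once per write().

```python
import zlib


def compress_per_write(chunks):
    # Hypothetical model of the current stream writer: every write()
    # is compressed on its own into a complete, independent zlib stream.
    return b"".join(zlib.compress(chunk) for chunk in chunks)


def compress_incrementally(chunks):
    # One compressor for the whole stream, finished exactly once,
    # mirroring how an incremental encoder keeps state between calls.
    comp = zlib.compressobj()
    parts = [comp.compress(chunk) for chunk in chunks]
    parts.append(comp.flush())
    return b"".join(parts)


chunks = [b"x", b"x"]
per_write = compress_per_write(chunks)
incremental = compress_incrementally(chunks)

# A decompressor stops at the end of the first stream; the second,
# independent stream is left over in unused_data.
d = zlib.decompressobj()
print(d.decompress(per_write))               # b'x'
print(d.unused_data == zlib.compress(b"x"))  # True
print(zlib.decompress(incremental))          # b'xx'
```

Because each write() yields a finished stream, a consumer expecting a single zlib stream recovers only the first chunk, while the incremental path produces one stream that decompresses to all of the data.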
msg153142	Author: Jesús Cea Avión (jcea) (Python committer)	Date: 2012-02-11 23:34
Andrew, could you possibly write a patch and a test for 3.3?
msg153358	Author: Andrew McNabb (amcnabb)	Date: 2012-02-14 18:43
It looks like encodings/zlib_codec.py defines a custom IncrementalEncoder and IncrementalDecoder, but its StreamWriter and StreamReader rely on the standard implementation of codecs.StreamWriter and codecs.StreamReader. One solution might be to have zlib_codec.StreamWriter inherit from zlib_codec.IncrementalEncoder instead of from zlib_codec.Codec. I'm not familiar enough with the codecs library to know whether this is the best approach. Unfortunately, there are 120 codec files in the encodings directory, and it's unclear how many of them would need to be modified. Based on the number of them that implement StreamWriter as "class StreamWriter(Codec,codecs.StreamWriter)", it looks like it might be a lot of them. Was each of these 120 files hand-written?
msg153372	Author: STINNER Victor (vstinner) (Python committer)	Date: 2012-02-14 21:48
See also issue #7475.
msg207853	Author: Walter Dörwald (doerwalter) (Python committer)	Date: 2014-01-10 11:44
The stream part of the codecs isn't used that much in Python 3 any more, so I'm not sure if this is worth fixing.
msg226680	Author: Martin Panter (martin.panter) (Python committer)	Date: 2014-09-10 05:23
The corresponding stream reader has a related issue, which I mentioned in Issue 20132.
msg226681	Author: Martin Panter (martin.panter) (Python committer)	Date: 2014-09-10 05:29
Even if all these issues aren’t worth fixing, I think the documentation should at least say which codecs work fully (e.g. most text encodings), which ones work suboptimally (e.g. UTF-7), and which ones only work for single-shot encoding and decoding.
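Such documentation could be backed by a quick check. The helper below is a hypothetical probe, not part of the codecs module: it writes chunks one at a time through a codec's StreamWriter and compares the result with a single codecs.encode() call on the joined input. A True result for one input is only evidence, not proof, that a writer is safe for incremental use.

```python
import codecs
import io


def stream_writer_matches_one_shot(encoding, chunks):
    """Hypothetical probe: does writing `chunks` one at a time through the
    codec's StreamWriter give the same bytes as one codecs.encode() call?"""
    if not chunks:
        raise ValueError("need at least one chunk")
    buf = io.BytesIO()
    writer = codecs.getwriter(encoding)(buf)
    for chunk in chunks:
        writer.write(chunk)
    # chunks[0][:0] is '' for text chunks and b'' for bytes chunks.
    joined = chunks[0][:0].join(chunks)
    return buf.getvalue() == codecs.encode(joined, encoding)


print(stream_writer_matches_one_shot("utf-8", ["a", "é"]))        # True
print(stream_writer_matches_one_shot("zlib_codec", [b"x", b"x"]))  # False
```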
msg233983	Author: Martin Panter (martin.panter) (Python committer)	Date: 2015-01-14 01:06
See Issue 23231 for a proposal which should make the incremental codec API compatible with a generic StreamReader/Writer class. I discovered that many of the codec files are generated by gencodec.py, not hand-written. However, when I tried regenerating them, I found that a handful had diverged from the generated code, while the others were more or less true to it.
msg234008	Author: Martin Panter (martin.panter) (Python committer)	Date: 2015-01-14 06:14
Here is a patch to implement the zlib-codec and bz2-codec StreamWriter classes based on their IncrementalEncoder classes. It depends on my patch for Issue 23231, though I guess it could be tweaked to work around that if desired.
msg234048	Author: Martin Panter (martin.panter) (Python committer)	Date: 2015-01-15 04:43
New patch that also fixes StreamWriter.writelines() in general for the byte codecs.
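For context: in CPython's Lib/codecs.py the inherited writelines() joins its argument with an empty str, so passing bytes items to a bytes-to-bytes codec's writer raises TypeError. The function below is a hypothetical sketch of one way around that, not the code in the patch: it joins with an empty object of the items' own type, so the same logic serves text and bytes codecs.

```python
import codecs
import io


def writelines_any(writer, lines):
    """Hypothetical bytes-safe writelines(): join the items with an empty
    object of their own type instead of always using ''."""
    lines = list(lines)
    if lines:
        writer.write(lines[0][:0].join(lines))


hex_buf = io.BytesIO()
writelines_any(codecs.getwriter("hex_codec")(hex_buf), [b"a", b"b"])
print(hex_buf.getvalue())   # b'6162'

text_buf = io.BytesIO()
writelines_any(codecs.getwriter("utf-8")(text_buf), ["a", "b"])
print(text_buf.getvalue())  # b'ab'
```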
msg234067	Author: Marc-Andre Lemburg (lemburg) (Python committer)	Date: 2015-01-15 08:48
On 15.01.2015 05:43, Martin Panter wrote:
>
> New patch that also fixes StreamWriter.writelines() in general for the byte codecs

Could you explain this new undocumented class?

+class _IncrementalBasedWriter(StreamWriter):
+    """Generic StreamWriter implementation.
+
+    The _EncoderClass attribute must be set to an IncrementalEncoder
+    class to use.
+    """
+
+    def __init__(self, stream, errors='strict'):
+        super().__init__(stream, errors)
+        self._encoder = self._Encoder(errors)
+
+    def write(self, object):
+        self.stream.write(self._encoder.encode(object))
+
+    def reset(self):
+        self.stream.write(self._encoder.encode(final=True))
+

Note that the doc-string mentions a non-existing attribute and there are doc-strings missing for the other methods.

The purpose appears to be a StreamWriter which works with an IncrementalEncoder. A proper name would thus be IncrementalStreamWriter, which provides an .encode() method that adapts the signature of the incremental encoder to the one expected for StreamWriters and Codecs.
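A minimal sketch of the design suggested in this message, assuming the IncrementalEncoder API of current Python 3, where encode() still requires an input argument (hence the explicit b'' in reset()). The names IncrementalStreamWriter, incremental_encoder and ZlibStreamWriter are illustrative and do not correspond to the attached patches. The encode() override adapts the incremental encoder to the (output, consumed) signature that the inherited codecs.StreamWriter.write() expects, and reset() finishes the compressed stream without closing the underlying file.

```python
import codecs
import io
import zlib


class IncrementalStreamWriter(codecs.StreamWriter):
    """Illustrative StreamWriter driven by an IncrementalEncoder.

    Subclasses set `incremental_encoder` (a hypothetical attribute name)
    to an IncrementalEncoder class. Assumes a bytes-to-bytes codec.
    """

    incremental_encoder = None

    def __init__(self, stream, errors="strict"):
        super().__init__(stream, errors)
        self._encoder = self.incremental_encoder(errors)

    def encode(self, input, errors="strict"):
        # Adapt IncrementalEncoder.encode(input, final=False) -> bytes to the
        # Codec.encode(input, errors) -> (output, consumed) signature that the
        # inherited StreamWriter.write() calls. `errors` was already fixed
        # when the encoder was created, so it is ignored here.
        return self._encoder.encode(input), len(input)

    def reset(self):
        # Flush whatever the encoder still holds and finish the stream, then
        # start a fresh encoder. Unlike close(), the underlying stream stays open.
        self.stream.write(self._encoder.encode(b"", final=True))
        self._encoder.reset()


class ZlibStreamWriter(IncrementalStreamWriter):
    incremental_encoder = codecs.getincrementalencoder("zlib_codec")


buf = io.BytesIO()
writer = ZlibStreamWriter(buf)
writer.write(b"x")
writer.write(b"x")
writer.reset()
print(zlib.decompress(buf.getvalue()))  # b'xx'
```

Keeping write() inherited and overriding only encode() is what makes this a thin adapter: any existing StreamWriter logic that goes through encode() picks up the incremental behaviour for free.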
msg234101	Author: Martin Panter (martin.panter) (Python committer)	Date: 2015-01-15 23:02
Sorry, I changed the name of the attribute and forgot to update the doc string. Its new name is _Encoder. Your description was fairly accurate. I am adding patch v3 with an expanded doc string. Hopefully that explains it a bit better. Since it is just implementing the documented StreamWriter API, I only added brief descriptions of the methods pointing back to the StreamWriter methods.
msg234653	Author: Martin Panter (martin.panter) (Python committer)	Date: 2015-01-25 05:14
Here is patch v4. The stream writer is now automatically generated by default by the CodecInfo constructor if no custom stream writer parameter is supplied. The base64 and quopri codecs are adjusted to also use this default stream writer to help with Issue 20132.

History
Date	User	Action	Args
2022-04-11 14:57:26	admin	set	github: 58089
2015-07-23 01:54:38	martin.panter	link	issue20132 dependencies
2015-07-16 02:08:14	martin.panter	set	dependencies: + Fix codecs.iterencode/decode() by allowing data parameter to be omitted stage: patch review
2015-01-25 05:14:13	martin.panter	set	files: + zlib-bz2-writer.v4.patch messages: + msg234653
2015-01-15 23:02:55	martin.panter	set	files: + zlib-bz2-writer.v3.patch messages: + msg234101
2015-01-15 08:48:52	lemburg	set	messages: + msg234067
2015-01-15 04:43:12	martin.panter	set	files: + zlib-bz2-writer.v2.patch messages: + msg234048
2015-01-14 06:14:08	martin.panter	set	files: + zlib-bz2-writer.patch keywords: + patch messages: + msg234008 versions: + Python 3.5
2015-01-14 01:06:21	martin.panter	set	messages: + msg233983
2014-09-10 05:29:41	martin.panter	set	messages: + msg226681
2014-09-10 05:23:20	martin.panter	set	messages: + msg226680
2014-01-10 11:44:24	doerwalter	set	messages: + msg207853
2014-01-05 15:17:23	serhiy.storchaka	set	nosy: + lemburg, loewis, doerwalter
2014-01-05 13:31:14	martin.panter	set	nosy: + martin.panter
2012-02-14 21:48:22	vstinner	set	messages: + msg153372
2012-02-14 18:53:02	r.david.murray	set	nosy: + vstinner
2012-02-14 18:43:08	amcnabb	set	messages: + msg153358
2012-02-11 23:34:41	jcea	set	nosy: + jcea messages: + msg153142 versions: + Python 3.3, - Python 3.2
2012-01-26 19:40:48	amcnabb	create
