Issue 5804: Add an 'offset' argument to zlib.decompress

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/50054

classification

Title:	Add an 'offset' argument to zlib.decompress
Type:	enhancement	Stage:
Components:	Extension Modules	Versions:	Python 3.5

process

Dependencies:	Superseder:
Status:	open	Resolution:
Assigned To:	Nosy List:	amaury.forgeotdarc, gregory.p.smith, kristjan.jonsson, martin.panter, nadeem.vawda, pitrou, r.david.murray, rbcollins
Priority:	normal	Keywords:	patch

Created on 2009年04月21日 09:51 by kristjan.jonsson, last changed 2022年04月11日 14:56 by admin.

Files
File name	Uploaded	Description	Edit
zlibpatch.patch	kristjan.jonsson, 2009年04月21日 09:51
zlibpatch.patch	kristjan.jonsson, 2009年04月21日 16:13
zlibpatch2.patch	kristjan.jonsson, 2009年04月23日 19:18

Messages (16)
msg86227 - (view)	Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)	Date: 2009年04月21日 09:51
It can be useful to get whatever remaining data there exists in the decompression stream for further processing. This is currently possible by creating a zlib.decompressobj and accessing its unused_data member, but that is quite cumbersome. The attached patch adds a 'tail' keyword argument to zlib.decompress which causes the result to be a 2-tuple, with the first part containing the decompressed data, and the second containing any remaining data
msg86240 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2009年04月21日 15:43
As usual the patch should come with some unit tests. Also, it seems you don't test the return value of PyString_FromStringAndSize before building your tuple.
msg86241 - (view)	Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)	Date: 2009年04月21日 15:46
Unittests coming up. It is not necessary to test the return value, it is done by Py_BuildValue(). I think it is common to do it this way, but if You think explicit testing is better, then that's no problem.
msg86244 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2009年04月21日 15:57
> It is not necessary to test the return value, it is done by > Py_BuildValue(). Sorry, it's ok then.
msg86247 - (view)	Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)	Date: 2009年04月21日 16:13
New patch, with explicit error test and tests in test_zlib.py, both for the "tail" parameter and the hitherto untested "unused_data" attribute. I'm not completely happy with the name of the argument, "tail", any suggestions? Also, since I made compress a "keywords" method, should I apply that to the rest of the functions?
msg86284 - (view)	Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)	Date: 2009年04月22日 10:26
Note, I thought of the "tail" argument to reflect the "unused_data" member of the decompressobj(). But a more useful way is actually this: result, offset = zlib.decompress(buffer, offset=0) if offset<len(buffer): result2, offset = zlib.decompress(buffer, offset=offset) This avoids data copying as much as possible. The presence of a non- default offset argument, would trigger the "tuple" return value. We could add the same argument to decompressobj.decompress, and add "unused_offset" member there for symmetry. Any thoughts?
msg86373 - (view)	Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)	Date: 2009年04月23日 19:18
I've added a new patch, which instead uses an "offset" parameter. This is more natural, perhaps, and also avoids data copying. Complete with tests and documentation update.
msg87550 - (view)	Author: Gregory P. Smith (gregory.p.smith) * (Python committer)	Date: 2009年05月10日 21:05
Overall, this looks good. Some mostly minor comments in this review. http://codereview.appspot.com/63060/diff/1/2 File Doc/library/zlib.rst (right): http://codereview.appspot.com/63060/diff/1/2#newcode136 Line 136: When specified, it will cause the function's return value to be a (uncompressed, offset) how about this wording: ..."to be a tuple of (uncompressed, offset), with the"... http://codereview.appspot.com/63060/diff/1/2#newcode142 Line 142: Added the offset argument missing . at the end of the sentence. Also, mention that this function now accepts keyword arguments. http://codereview.appspot.com/63060/diff/1/3 File Lib/test/test_zlib.py (right): http://codereview.appspot.com/63060/diff/1/3#newcode110 Line 110: c = zlib.compress(a)+zlib.compress(b) please surround the + with spaces for readability and consistency with other code in the file.\ http://codereview.appspot.com/63060/diff/1/4 File Modules/zlibmodule.c (right): http://codereview.appspot.com/63060/diff/1/4#newcode193 Line 193: "the start position in the string to start decompresion and, if spceified,\n" typo: specified. http://codereview.appspot.com/63060/diff/1/4#newcode296 Line 296: offset = length-zst.avail_in+offset; leave spaces around your - and + operators and this becomes easier to read. http://codereview.appspot.com/63060
msg87551 - (view)	Author: Robert Collins (rbcollins) * (Python committer)	Date: 2009年05月10日 21:14
Maybe I'm missing something, but isn't the offset parameter just another way of spelling buffer(input, offset)? I like the avoiding of copying, just wondering if having a magic parameter to get a tuple is really better than (say) result, used = zlib.decompress2(source) if used<len(source): result2, used = zlib.decompress2(buffer(source, used)) This changes two things. Rather than a magic parameter it adds a new function which always returns tuples (meaning that you don't need magic to make sure users don't accidentally pass offset=None and get a single result when they wanted a tuple), and uses the built in buffer facility to avoid copying rather than relying on delayed slicing.
msg87567 - (view)	Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)	Date: 2009年05月11日 08:56
Well, yes and no. I agree that it is annoying to have to add another parameter to get a returning tuple. But you have to weigh that against adding another conveinence API. On the other hand, I think the offset argument is quite convenient. If we were to add a "decompress2" function, I would like to keep the offset because it removes the extra overhead of having to create a temporary buffer object, at no extra cost. Also, note that your example code becomes more complex, for a loop. You would need to: results = [] used =0 while used<len(source): r, u = zlib.decompress2(buffer(source, used)) used += u results.append(r) as opposed to: r, used = zlib.decompress2(source, used) results.append(r) You have to make sure you add the buffer's hidden offset to the output offset. There is a third option. The "offset" could be a list containg the offset in its 0th position, a sort of "byref" passing, but that is not in the general Python spirit, is it? This would give us the loop: o = [0] while len(source)>o[0] results.append(zlib.decompress(source, offset=offsetlist))
msg87568 - (view)	Author: Robert Collins (rbcollins) * (Python committer)	Date: 2009年05月11日 09:02
Well, I think its relatively uncommon to be doing such a loop with a static buffer anyway - often you'll instead be reading from disk or a network stream; if we could make those cases simpler and avoid copying that would be great. Anyhow, no strong opinions here, just saw it going by before added my 2c. -Rob
msg110993 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2010年07月21日 02:47
This has been reviewed see msg87550, can we get this into 3.2?
msg111017 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2010年07月21日 07:25
The function returns different kind of data depending on the value of the last parameter. I don't like it at all. Why is decompressobj not enough?
msg113415 - (view)	Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)	Date: 2010年08月09日 13:53
decompressobj is indeed "enough". But if you are doing a lot of this (decompressing chunks), then using the unused_data, as it is, involves a lot of copying. If there were a "unused_data_pos" or some equivalent, then it would be possible to continue decommpressing with buffer(olddata, unused_data_pos), without copying the source data. The point of the "offset" keyword argument was to have a way to get this end position also without having an explicit decompressobj. I agree that having the result type change to a tuple is not good. So, a suggestion: 1) Add the unused_data_pos to the decompressobj. 2) (opotional) Add the automatic retrieval of this with a decompress_ex method, for convenience, making it usable without creating a decompressobj 3) (optional) Add the "offset" kw argument to decompress and (decompress_ex) making the creation of a temporary buffer() object unnecessary.
msg229045 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2014年10月10日 23:24
Moving this from commit review back to no selection, since there doesn't yet seem to be an agreement on an API.
msg234040 - (view)	Author: Martin Panter (martin.panter) * (Python committer)	Date: 2015年01月14日 22:27
A different test case for "unused_data" attribute was added in 2012 for Issue 16350, so that part is no longer needed. If this feature goes ahead, it might be nice to also update the bzip and LZMA modules for consistency. In Python 3, the equivalent of the buffer() option would look like this, assuming the memory view is in byte format: zlib.decompress(memoryview(source)[unused_offset:]) Another option would be to copy some paint from the struct.unpack_from() bikeshed: [data, offset] = zlib.decompress_from(buffer, offset)

History
Date	User	Action	Args
2022年04月11日 14:56:48	admin	set	github: 50054
2015年01月14日 22:27:13	martin.panter	set	nosy: + martin.panter messages: + msg234040
2014年10月10日 23:24:51	r.david.murray	set	versions: + Python 3.5, - Python 3.2 nosy: + r.david.murray messages: + msg229045 stage: commit review ->
2014年02月03日 19:14:31	BreamoreBoy	set	nosy: - BreamoreBoy
2012年01月26日 13:07:45	nadeem.vawda	set	nosy: + nadeem.vawda
2010年08月09日 13:53:55	kristjan.jonsson	set	messages: + msg113415
2010年07月21日 07:25:33	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg111017
2010年07月21日 02:47:37	BreamoreBoy	set	versions: + Python 3.2, - Python 3.1, Python 2.7 nosy: + BreamoreBoy messages: + msg110993 stage: patch review -> commit review
2009年05月11日 09:02:04	rbcollins	set	messages: + msg87568
2009年05月11日 08:56:24	kristjan.jonsson	set	messages: + msg87567
2009年05月10日 21:14:40	rbcollins	set	nosy: + rbcollins messages: + msg87551
2009年05月10日 21:05:29	gregory.p.smith	set	messages: + msg87550
2009年05月10日 20:49:45	gregory.p.smith	set	nosy: + gregory.p.smith
2009年05月08日 09:45:40	kristjan.jonsson	set	title: Add a "tail" argument to zlib.decompress -> Add an 'offset' argument to zlib.decompress
2009年04月23日 19:18:40	kristjan.jonsson	set	files: + zlibpatch2.patch messages: + msg86373
2009年04月22日 10:26:02	kristjan.jonsson	set	messages: + msg86284
2009年04月21日 16:13:14	kristjan.jonsson	set	files: + zlibpatch.patch messages: + msg86247
2009年04月21日 15:57:20	pitrou	set	messages: + msg86244
2009年04月21日 15:46:42	kristjan.jonsson	set	messages: + msg86241
2009年04月21日 15:43:15	pitrou	set	priority: normal versions: + Python 3.1 nosy: + pitrou messages: + msg86240 stage: patch review
2009年04月21日 09:51:41	kristjan.jonsson	create

homepage