homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: zlib set dictionary support inflateSetDictionary
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Jim.Jewett, Sam.Rushing, eric.araujo, jcea, nadeem.vawda, python-dev
Priority: normal Keywords: needs review, patch

Created on 2012年04月27日 20:49 by Sam.Rushing, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
zlib_set_dictionary.patch Sam.Rushing, 2012年04月27日 20:49 adds inflateSetDictionary(), deflateSetDictionary() review
zlib_set_dictionary_2.patch Sam.Rushing, 2012年05月03日 23:48 set dictionary in constructors review
tz2.py Sam.Rushing, 2012年05月03日 23:50 verify correct stream behavior with dictionary
zlib_set_dictionary_3.patch Sam.Rushing, 2012年05月04日 00:51 review
zlib_set_dictionary_4.patch Sam.Rushing, 2012年05月05日 19:33 review
Messages (23)
msg159492 - (view) Author: Sam Rushing (Sam.Rushing) Date: 2012年04月27日 20:49
Google's SPDY protocol requires the use of a pre-defined compression dictionary. The current zlib module doesn't expose the two functions for setting the dictionary.
This patch is minimal in the sense that it only exposes the two functions, but unfortunately the sequence of zlib calls required is clumsy: a call to inflate() must fail first (with an error of Z_NEED_DICT):
import zlib
zdict = b"thequickbrownfoxjumped\x00"
c = zlib.compressobj()
c.set_dictionary (zdict)
cd = c.compress (b"the quick brown fox jumped over the candlestick")
cd += c.flush()
d = zlib.decompressobj()
try:
 print (d.decompress (cd))
except zlib.error as what:
 if what.args[0].startswith ('Error 2 '):
 d.set_dictionary (zdict)
 print (d.flush())
Obviously a better way to catch/match Z_NEED_DICT would be nice.
A much nicer solution would allow you to associate the dictionary at creation time, rather than having to wait for an error condition. I can only guess that the zlib authors designed it that way for a reason?
msg159870 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年05月03日 19:31
A dictionary could be provided an init time. Then, the Z_NEED_DICT could be intercepted in the binding and automatically inject the dictionary provided in the init.
Anyway, for a patch to be approved, we need a test too.
PS: Why is this NOT targeted to 3.3?. We have time, yet.
msg159886 - (view) Author: Sam Rushing (Sam.Rushing) Date: 2012年05月03日 23:04
I'm currently reworking this so that the dictionaries are provided in the constructor, and inflateSetDictionary() is called automatically. I've gone over the zlib RFC's and zlibmodule.c, and I'm fairly certain that whatever usage mode might involve multiple calls to SetDictionary() couldn't be supported by the zlib object anyway.
r.e. 3.3/3.4 - I merely chose the highest version number I saw, since this diff is against HEAD (as recommended by the docs). It's been quite a few years since I've submitted a patch to CPython!
msg159887 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年05月03日 23:38
Retargetting to python 3.3.
If you hurry a bit and I find your patch acceptable (remember the tests!), I will try to integrate it.
msg159888 - (view) Author: Sam Rushing (Sam.Rushing) Date: 2012年05月03日 23:48
Ok, here's the patch. It has a single short test. For use with SPDY, it's necessary to test that the following stream data also correctly decompresses, I'll attach that to the next comment.
msg159889 - (view) Author: Sam Rushing (Sam.Rushing) Date: 2012年05月03日 23:50
This test is rather large, since it includes the predefined SPDY draft 2 dictionary, and some real-world data. Not sure what the policy is on including so much data in a test. If there's enough time I could make a smaller test that also verifies the correct behavior on a stream...
msg159890 - (view) Author: Sam Rushing (Sam.Rushing) Date: 2012年05月04日 00:00
Argh, probably need to add the 'dict' field to the copy() method.
msg159893 - (view) Author: Sam Rushing (Sam.Rushing) Date: 2012年05月04日 00:50
Updated version of the patch: extends the test, including a test of the streaming behavior needed for SPDY (both compression and decompression).
Also wik: copy()/uncopy() are aware of the 'dict' attribute.
msg159897 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012年05月04日 03:33
Added a few comments on Rietveld.
msg160004 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012年05月05日 16:43
I've posted a review on Rietveld.
msg160028 - (view) Author: Sam Rushing (Sam.Rushing) Date: 2012年05月05日 19:33
renames dict->zdict, splits the test, adds BEGIN/END around inflate call.
msg161082 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年05月18日 19:42
Status of this feature?. Ready to integrate?
msg161093 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012年05月19日 06:08
The code should be changed to use the buffer API (instead of accepting
only bytes objects). Other than that, I think it's ready for integration.
msg163118 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年06月18日 20:21
Sam, the window for Python 3.3 integration is almost close. Could you possibly update your patch with Nadeem's feedback?.
msg163130 - (view) Author: Sam Rushing (Sam.Rushing) Date: 2012年06月19日 01:00
I think other than the disagreement about whether the dictionary constructor arg should be a buffer object, it's good to go.
To restate my position: the need is for an immutable string of bytes, and that's exactly what PyBytes_Type is for. I see no advantage to allowing a buffer object, which will require extra code to check that it is both a buffer object and set to be readonly. I believe the argument for aesthetics does not apply, as the constant dictionary constructor argument is a morally different kind of parameter, comparable to (say) the compression level.
You folks are of course welcome to change it, though. 8^)
msg163147 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012年06月19日 08:57
> To restate my position: the need is for an immutable string of bytes, [...]
I disagree that we should require the dictionary to be immutable - if the
caller wishes to use a mutable buffer here, it is their responsibility to
ensure that it is not modified until the compressor is finished with it
("consenting adults" and all that). The documentation can inform users of
this requirement.
> I believe the argument for aesthetics does not apply, as the constant
> dictionary constructor argument is a morally different kind of
> parameter, comparable to (say) the compression level.
Even so, the surrounding code sets a precedent for how it accepts binary
data buffers, and deviating from this existing convention should not be
taken lightly.
Nitpicking about the API aside, thanks for the patch :-)
msg163199 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年06月19日 18:21
So my question is easy: could we apply this patch as is and defer any "improvement" to 3.4?. The risk of not doing so would be to miss 3.3 completely.
msg163228 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012年06月19日 22:28
I plan to commit it (along with the buffer API changes) tomorrow.
msg163298 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年06月21日 00:20
New changeset dd4f7d5c51c7 by Nadeem Vawda in branch 'default':
Issue #14684: Add support for predefined compression dictionaries to the zlib module.
http://hg.python.org/cpython/rev/dd4f7d5c51c7 
msg163300 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012年06月21日 00:27
Committed. Once again, thanks for the patch!
msg163339 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012年06月21日 15:24
Just saw this on the checkins list; where are the other options documented? 
"""
 PyDoc_STRVAR(compressobj__doc__,
-"compressobj([level]) -- Return a compressor object.\n"
+"compressobj([level[, method[, wbits[, memlevel[, strategy[, zdict]]]]]])\n"
+" -- Return a compressor object.\n"
 "\n"
-"Optional arg level is the compression level, in 1-9.");
+"Optional arg level is the compression level, in 1-9.\n"
+"\n"
+"Optional arg zdict is the predefined compression dictionary - a sequence of\n"
+"bytes containing subsequences that are likely to occur in the input data.");
"""
I'm honestly not certain what they should be, but the following is my best guess:
"""
 PyDoc_STRVAR(compressobj__doc__,
 "compressobj([level[, method[, wbits[, memlevel[, strategy[, zdict]]]]]])\n"
 " -- Return a compressor object.\n"
 "\n"
-"Optional arg level is the compression level, in 1-9.\n"
+"Optional arg level (1-9) is the compression level.\n"
+"Larger numbers take longer, but produce smaller results.\n"
 "\n"
+"Optional arg method is the compression method.\n"
+"The only currently supported method is zlib.DEFLATED.\n"
+"\n"
+"Optional arg wbits determines the window buffer size.\n"
+"Normal values are 8 (least memory) to 15 (best compression).\n"
+"\n"
+"Optional arg memlevel (1-9) controls working memory size.\n"
+"Larger numbers use more memory, but produce smaller results more quickly.\n"
+"\n"
+"Optional arg strategy tunes the compression algorithm.\n"
+"Supported options include zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, and zlib.Z_HUFFMAN_ONLY.\n"
+"\n"
+"Optional arg zdict is the predefined compression dictionary - a sequence of\n"
+"bytes containing subsequences that are likely to occur in the input data.");
"""
msg163341 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012年06月21日 15:36
> Just saw this on the checkins list; where are the other options documented? 
They aren't, AFAIK. I've been planning on adding them when I've got time
(based on the zlib manual at http://zlib.net/manual.html), but with the
upcoming feature freeze for 3.3, this issue was higher priority.
msg163380 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年06月21日 23:51
New changeset 1cfa44cb5af0 by Nadeem Vawda in branch 'default':
Document the rest of zlib.compressobj()'s arguments.
http://hg.python.org/cpython/rev/1cfa44cb5af0 
History
Date User Action Args
2022年04月11日 14:57:29adminsetgithub: 58889
2012年06月21日 23:51:41python-devsetmessages: + msg163380
2012年06月21日 15:36:20nadeem.vawdasetmessages: + msg163341
2012年06月21日 15:24:40Jim.Jewettsetnosy: + Jim.Jewett
messages: + msg163339
2012年06月21日 00:27:01nadeem.vawdasetstatus: open -> closed
resolution: fixed
messages: + msg163300

stage: patch review -> resolved
2012年06月21日 00:20:56python-devsetnosy: + python-dev
messages: + msg163298
2012年06月19日 22:28:19nadeem.vawdasetmessages: + msg163228
2012年06月19日 18:21:05jceasetmessages: + msg163199
2012年06月19日 08:57:15nadeem.vawdasetmessages: + msg163147
2012年06月19日 01:01:00Sam.Rushingsetmessages: + msg163130
2012年06月18日 20:21:36jceasetmessages: + msg163118
2012年05月19日 06:08:09nadeem.vawdasetmessages: + msg161093
2012年05月18日 19:58:31eric.araujosetkeywords: + needs review
stage: patch review
2012年05月18日 19:42:30jceasetmessages: + msg161082
2012年05月05日 19:33:31Sam.Rushingsetfiles: + zlib_set_dictionary_4.patch

messages: + msg160028
2012年05月05日 16:43:06nadeem.vawdasetmessages: + msg160004
2012年05月05日 10:01:11nadeem.vawdasetnosy: + nadeem.vawda
2012年05月04日 03:33:02eric.araujosetnosy: + eric.araujo
messages: + msg159897
2012年05月04日 00:51:05Sam.Rushingsetfiles: + zlib_set_dictionary_3.patch
2012年05月04日 00:50:27Sam.Rushingsetmessages: + msg159893
2012年05月04日 00:00:22Sam.Rushingsetmessages: + msg159890
2012年05月03日 23:50:04Sam.Rushingsetfiles: + tz2.py

messages: + msg159889
2012年05月03日 23:48:08Sam.Rushingsetfiles: + zlib_set_dictionary_2.patch

messages: + msg159888
2012年05月03日 23:39:10jceasetassignee: roger.serwy ->

nosy: - roger.serwy
2012年05月03日 23:38:24jceasetversions: + Python 3.3, - Python 3.4
nosy: + roger.serwy

messages: + msg159887

assignee: roger.serwy
2012年05月03日 23:04:16Sam.Rushingsetmessages: + msg159886
2012年05月03日 19:31:39jceasetnosy: + jcea
messages: + msg159870
2012年04月27日 20:49:09Sam.Rushingcreate

AltStyle によって変換されたページ (->オリジナル) /