homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Sort dict items in urlencode()
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: chris.jerdonek, eric.araujo, gregory.p.smith, gvanrossum, jcea, pitrou
Priority: normal Keywords: easy

Created on 2012年08月17日 20:44 by gvanrossum, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Messages (6)
msg168477 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年08月17日 20:44
From http://mail.python.org/pipermail/python-dev/2012-August/121364.html :
"""
I just fixed a unittest for some code used at Google that was
comparing a url generated by urllib.encode() to a fixed string. The
problem was caused by turning on PYTHONHASHSEED=1. Because of this,
the code under test would generate a textually different URL each time
the test was run, but the intention of the test was just to check that
all the query parameters were present and equal to the expected
values.
The solution was somewhat painful, I had to parse the url, split the
query parameters, and compare them to a known dict.
I wonder if it wouldn't make sense to change urlencode() to generate
URLs that don't depend on the hash order, for all versions of Python
that support PYTHONHASHSEED? It seems a one-line fix:
 query = query.items()
with this:
 query = sorted(query.items())
This would not prevent breakage of unit tests, but it would make a
much simpler fix possible: simply sort the parameters in the URL.
"""
MvL's response (http://mail.python.org/pipermail/python-dev/2012-August/121366.html) seems to suggest that this can go in as a bugfix in 3.2 and later if augmented with a check for type(query) is dict and a check for whether dict hashing is enabled.
Does anyone want to turn this idea into a patch?
msg168521 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年08月18日 17:35
Note, the sort may fail if e.g. bytes and str are mixed in the keys. So this should be caught and ignored.
An alternative would just be to fix the call site to pass in sorted(query.items()) instead of sorted(query).
Still, I think that more predictable output would be better.
msg168525 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年08月18日 19:57
Honestly, this looks much more like an enhancement than a bugfix to me, so I think it should only target 3.4.
As for the idea proper, I sounds ok to me, as long as only exact dict instances are affected (i.e. OrderedDict and friends keep using their native order).
msg168544 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012年08月19日 03:34
-1 on doing this from me.
While I don't see it hurting anything when "type(query) is dict" I'd much rather encourage people to write better tests that do not take the lazy way out. Tests that get by comparing a generated string to a "golden" string are soo often wrong because the golden string is not what they actually wanted to test. Instead the author meant to test the meaning of the result rather than the specific incantation of it that they pasted in their code for an assertEqual check.
(This does suggest that some additional unittest methods to make comparing common things like this in a safe manner would be useful to people. urls, html and xml with attributes, json, etc)
Ex: We've run into many tests that were failing due to hash randomization because they compared json strings... rather than suggest that json.dumps sort dictionaries before generating output, having the tests use json.loads and compare the actual data structure rather than a golden value string is much better.
If a change like this is made, it needs to be conditional in 2.7 and 3.2 to only sort when hash randomization is enabled to avoid altering the existing behavior of urlencode for the default PYTHONHASHSEED=0 case on 2.7 and 3.2.
As far as this being an enhancement rather than a bugfix... I sort of agree that this is an enhancement. And if this is an enhancement, we don't need to do it at all: People will already have done the right thing making their code work with 3.3's hash randomization by default before long before 3.4 is out.
msg168545 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年08月19日 04:14
I've come to see more downsides than upsides to this idea. I'm withdrawing it. Good points, Greg! (And others.)
msg170302 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012年09月11日 13:14
As an aside, I noticed a doctest affected by this in the urllib HOWTO:
>>> url_values = urllib.parse.urlencode(data)
>>> print(url_values)
name=Somebody+Here&language=Python&location=Northampton
http://docs.python.org/dev/howto/urllib2.html#data
(search for the string "Somebody"). This is merely a curiosity though (i.e. don't construe this as an argument on the issue :)).
History
Date User Action Args
2022年04月11日 14:57:34adminsetgithub: 59924
2012年09月11日 13:14:20chris.jerdoneksetnosy: + chris.jerdonek
messages: + msg170302
2012年09月10日 01:06:31jceasetnosy: + jcea
2012年08月19日 04:14:50gvanrossumsetstatus: open -> closed
resolution: rejected
messages: + msg168545
2012年08月19日 03:34:42gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg168544
2012年08月18日 19:57:27pitrousetversions: - Python 3.2, Python 3.3
nosy: + pitrou

messages: + msg168525

components: + Library (Lib)
type: enhancement
2012年08月18日 17:35:19gvanrossumsetmessages: + msg168521
2012年08月17日 21:07:25eric.araujosetnosy: + eric.araujo
2012年08月17日 20:44:52gvanrossumcreate

AltStyle によって変換されたページ (->オリジナル) /