homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: lookdict_* give up too soon
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Jim.Jewett, Mark.Shannon, benjamin.peterson, h.venev, methane, python-dev, serhiy.storchaka, vaultah
Priority: normal Keywords: patch

Created on 2015年05月24日 05:08 by Jim.Jewett, last changed 2022年04月11日 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
0001-Don-t-downgrade-unicode-only-dicts-to-mixed-on-non-u.patch h.venev, 2021年01月28日 18:54 patch
bench.py h.venev, 2021年01月28日 18:58
0001-Don-t-downgrade-unicode-only-dicts-to-mixed-on-non-u.patch h.venev, 2021年01月31日 20:34 patch with test
Pull Requests
URL Status Linked Edit
PR 25186 merged python-dev, 2021年04月04日 16:55
Messages (15)
msg243969 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2015年05月24日 05:08
The specialized lookdict_* variants replace themselves with the generic lookdict as soon as a non-unicode key is looked up. 
They could reasonably leave the replacement to insertdict (line 819, currently assert rather than a replacement), when a non-unicode key is actually inserted into the dict. 
While inserts are less common than (all lookups including insert), I see the main advantage as reducing the number of duplications of the replacement logic.
msg244014 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2015年05月25日 04:31
You could have a non-unicode key that compares equal to a unicode string.
msg244024 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2015年05月25日 09:50
I don't understand why this has been closed.
I agree with Jim's analysis.
Lookups do not change the dict and the choice of lookdict_* variant depends solely on the set of keys.
In fact, lookdict_split *doesn't* replace itself, it merely calls look_dict, leaving maintaining the invariant to insertdict.
Benjamin, could you please reopen this and mark it as needing a patch.
msg244026 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年05月25日 10:09
It is not obvious that the patch is needed. If you have ready patch and good benchmark results, you could reopen the issue. Otherwise status quo wins.
msg385876 - (view) Author: Hristo Venev (h.venev) * Date: 2021年01月28日 18:54
I've attached a patch. I will soon provide benchmark results.
msg385877 - (view) Author: Hristo Venev (h.venev) * Date: 2021年01月28日 18:58
Benchmark program attached.
0. Creates a dict with str keys
1. str lookups
2. str subclass lookups on the same dict
3. str lookups on the same dict
Results before patch:
0.9493069459404069 +- 0.004707371313935551
1.47313450980997 +- 0.01350596115630158
1.3181799192904144 +- 0.006550182814933545
Results after patch:
0.9498907704499289 +- 0.003721378313122522
1.4936510094799451 +- 0.057905386684185135
0.9494844124498195 +- 0.0029465760297623235
msg385994 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2021年01月30日 20:41
This was originally "can be reopened if a patch is submitted" and Hristo Venev has now done so. Therefore, I am reopening.
msg385996 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2021年01月30日 20:59
Based on Hristo's timing, it appears to be a clear win. 
A near-wash for truly string-only dicts that shouldn't be effected; a near-wash for looking up non-(exact-)strings, and a nearly 40% speedup for the target case of looking up but not inserting a non-string or string subclass, then looking up strings thereafter. 
Additional comments:
Barring objections, I will promote from patch review to commit review when I've had a chance to look more closely. I don't have commit privs, but I think some of the others following this issue do.
The test looks pretty good enough -- good enough that I wonder if I'm missing something on the parts that seem odd. It would be great if you either cleaned them up or commented to explain why:
Why is the first key vx1, which seems, if anything, like a variable? 
 Why not k1 or string_key?
Why is the first key built up as vx='x'; vx += '1' instead of just k1="x1"?
Using a str subclass in the test is a great idea, and you've created a truly minimal one. It would probably be good to *also* test with a non-string, like 3 or 42.0. I can't imagine this affecting things (unless you missed an eager lookdict demotion somewhere), but it would be good to have that path documented against regression.
This seems like a test that could probably be rolled into a bigger testfile for the actual commit. I don't have the name of such an appropriate file at hand right now, but will try to find it on a deeper review.
msg385997 - (view) Author: Hristo Venev (h.venev) * Date: 2021年01月30日 21:28
> Why is the first key built up as vx='x'; vx += '1' instead of just k1="x1"?
I wanted to construct a key that is equal to, but not the same object as, `'x1'`. Consider this example:
 assert 'x1' is 'x1'
 spam = 'x1'
 assert spam is 'x1'
 eggs = 'x'
 eggs += '1'
 assert eggs == 'x1'
 assert eggs is not 'x1'
 assert sys.intern(eggs) is 'x1'
When doing a dict lookup and the lookup key is the same object as a stored entry, `__eq__` is not called. Lookups are then significantly faster, maybe 20%.
Consider this example:
 class EqTest:
 def __eq__(self, other):
 raise RuntimeError
 def __hash__(self):
 return id(self)
 
 adict = {}
 k1 = EqTest()
 k2 = EqTest()
 
 adict[k1] = 42
 adict[k2] = 43
 print(adict[k1], adict[k2])
Here `k1` is considered the same as `k1` and `k2` is considered the same as `k2`. However, `k1` and `k2` are considered distinct and never compared because they have different hashes.
However, if we were to set `EqTest.__hash__ = lambda self: 42`, we'd get a RuntimeError when we try to set `adict[k2]` because it would get compared for equality with `k1`.
Even if `__eq__` works, we can get some interesting behaviors. For example, when using multiple instances of `float('nan')` as keys.
> Using a str subclass in the test is a great idea, and you've created a truly minimal one. It would probably be good to *also* test with a non-string, like 3 or 42.0. I can't imagine this affecting things (unless you missed an eager lookdict demotion somewhere), but it would be good to have that path documented against regression.
I also tested a custom class that compares equal to strings. Other than being much slower, there weren't any significant differences. I also did some checks with int key lookups, which obviously fail with KeyError. They did not make the performance worse for the subsequent str lookups.
I will try to make a proper test tomorrow.
msg386000 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021年01月30日 22:24
Please test also creating a large dict with string-only or non-string keys. Since you add a code in insertdict() it can potentially affects performance.
msg386040 - (view) Author: Hristo Venev (h.venev) * Date: 2021年01月31日 20:34
I've attached a patch that should also contain a test.
I also ran some benchmarks on dict creation/inserts. I couldn't notice any difference in performance.
msg387505 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021年02月22日 08:09
I'm +1 for this. Would you create a pull request on GitHub?
msg389933 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2021年03月31日 21:12
What is the status on this? If you are losing interest, would you like someone else to turn your patch into a pull request?
msg389934 - (view) Author: Hristo Venev (h.venev) * Date: 2021年03月31日 21:15
Sorry, I must have missed Inada Naoki's reply. I will try to send a pull request this weekend.
msg392273 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021年04月29日 02:06
New changeset 8557edbfa8f74514de82feea4c62f5963e4e0aa7 by Hristo Venev in branch 'master':
bpo-24275: Don't downgrade unicode-only dicts to mixed on lookups (GH-25186)
https://github.com/python/cpython/commit/8557edbfa8f74514de82feea4c62f5963e4e0aa7
History
Date User Action Args
2022年04月11日 14:58:17adminsetgithub: 68463
2021年04月29日 02:06:31methanesetstatus: open -> closed
resolution: remind -> fixed
stage: patch review -> resolved
2021年04月29日 02:06:14methanesetmessages: + msg392273
2021年04月04日 16:55:33python-devsetnosy: + python-dev
pull_requests: + pull_request23928
2021年03月31日 21:15:15h.venevsetmessages: + msg389934
2021年03月31日 21:12:54Jim.Jewettsetmessages: + msg389933
2021年02月22日 08:09:04methanesetmessages: + msg387505
2021年02月01日 01:46:54methanesetnosy: + methane
2021年01月31日 20:34:41h.venevsetfiles: + 0001-Don-t-downgrade-unicode-only-dicts-to-mixed-on-non-u.patch

messages: + msg386040
2021年01月30日 22:24:18serhiy.storchakasetmessages: + msg386000
2021年01月30日 21:28:47h.venevsetmessages: + msg385997
2021年01月30日 20:59:25Jim.Jewettsetmessages: + msg385996
2021年01月30日 20:41:43Jim.Jewettsetstatus: closed -> open
versions: + Python 3.10
messages: + msg385994

resolution: rejected -> remind
stage: patch review
2021年01月28日 18:58:11h.venevsetfiles: + bench.py

messages: + msg385877
2021年01月28日 18:54:14h.venevsetfiles: + 0001-Don-t-downgrade-unicode-only-dicts-to-mixed-on-non-u.patch

nosy: + h.venev
messages: + msg385876

keywords: + patch
2015年05月25日 10:09:46serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg244026
2015年05月25日 09:50:23Mark.Shannonsetnosy: + Mark.Shannon
messages: + msg244024
2015年05月25日 04:31:18benjamin.petersonsetstatus: open -> closed

nosy: + benjamin.peterson
messages: + msg244014

resolution: rejected
2015年05月24日 06:08:44vaultahsetnosy: + vaultah
2015年05月24日 05:08:29Jim.Jewettcreate

AltStyle によって変換されたページ (->オリジナル) /