This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2015年05月24日 05:08 by Jim.Jewett, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| 0001-Don-t-downgrade-unicode-only-dicts-to-mixed-on-non-u.patch | h.venev, 2021年01月28日 18:54 | patch | ||
| bench.py | h.venev, 2021年01月28日 18:58 | |||
| 0001-Don-t-downgrade-unicode-only-dicts-to-mixed-on-non-u.patch | h.venev, 2021年01月31日 20:34 | patch with test | ||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 25186 | merged | python-dev, 2021年04月04日 16:55 | |
| Messages (15) | |||
|---|---|---|---|
| msg243969 - (view) | Author: Jim Jewett (Jim.Jewett) * (Python triager) | Date: 2015年05月24日 05:08 | |
The specialized lookdict_* variants replace themselves with the generic lookdict as soon as a non-unicode key is looked up. They could reasonably leave the replacement to insertdict (line 819, currently assert rather than a replacement), when a non-unicode key is actually inserted into the dict. While inserts are less common than (all lookups including insert), I see the main advantage as reducing the number of duplications of the replacement logic. |
|||
| msg244014 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2015年05月25日 04:31 | |
You could have a non-unicode key that compares equal to a unicode string. |
|||
| msg244024 - (view) | Author: Mark Shannon (Mark.Shannon) * (Python committer) | Date: 2015年05月25日 09:50 | |
I don't understand why this has been closed. I agree with Jim's analysis. Lookups do not change the dict and the choice of lookdict_* variant depends solely on the set of keys. In fact, lookdict_split *doesn't* replace itself, it merely calls look_dict, leaving maintaining the invariant to insertdict. Benjamin, could you please reopen this and mark it as needing a patch. |
|||
| msg244026 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年05月25日 10:09 | |
It is not obvious that the patch is needed. If you have ready patch and good benchmark results, you could reopen the issue. Otherwise status quo wins. |
|||
| msg385876 - (view) | Author: Hristo Venev (h.venev) * | Date: 2021年01月28日 18:54 | |
I've attached a patch. I will soon provide benchmark results. |
|||
| msg385877 - (view) | Author: Hristo Venev (h.venev) * | Date: 2021年01月28日 18:58 | |
Benchmark program attached. 0. Creates a dict with str keys 1. str lookups 2. str subclass lookups on the same dict 3. str lookups on the same dict Results before patch: 0.9493069459404069 +- 0.004707371313935551 1.47313450980997 +- 0.01350596115630158 1.3181799192904144 +- 0.006550182814933545 Results after patch: 0.9498907704499289 +- 0.003721378313122522 1.4936510094799451 +- 0.057905386684185135 0.9494844124498195 +- 0.0029465760297623235 |
|||
| msg385994 - (view) | Author: Jim Jewett (Jim.Jewett) * (Python triager) | Date: 2021年01月30日 20:41 | |
This was originally "can be reopened if a patch is submitted" and Hristo Venev has now done so. Therefore, I am reopening. |
|||
| msg385996 - (view) | Author: Jim Jewett (Jim.Jewett) * (Python triager) | Date: 2021年01月30日 20:59 | |
Based on Hristo's timing, it appears to be a clear win. A near-wash for truly string-only dicts that shouldn't be effected; a near-wash for looking up non-(exact-)strings, and a nearly 40% speedup for the target case of looking up but not inserting a non-string or string subclass, then looking up strings thereafter. Additional comments: Barring objections, I will promote from patch review to commit review when I've had a chance to look more closely. I don't have commit privs, but I think some of the others following this issue do. The test looks pretty good enough -- good enough that I wonder if I'm missing something on the parts that seem odd. It would be great if you either cleaned them up or commented to explain why: Why is the first key vx1, which seems, if anything, like a variable? Why not k1 or string_key? Why is the first key built up as vx='x'; vx += '1' instead of just k1="x1"? Using a str subclass in the test is a great idea, and you've created a truly minimal one. It would probably be good to *also* test with a non-string, like 3 or 42.0. I can't imagine this affecting things (unless you missed an eager lookdict demotion somewhere), but it would be good to have that path documented against regression. This seems like a test that could probably be rolled into a bigger testfile for the actual commit. I don't have the name of such an appropriate file at hand right now, but will try to find it on a deeper review. |
|||
| msg385997 - (view) | Author: Hristo Venev (h.venev) * | Date: 2021年01月30日 21:28 | |
> Why is the first key built up as vx='x'; vx += '1' instead of just k1="x1"?
I wanted to construct a key that is equal to, but not the same object as, `'x1'`. Consider this example:
assert 'x1' is 'x1'
spam = 'x1'
assert spam is 'x1'
eggs = 'x'
eggs += '1'
assert eggs == 'x1'
assert eggs is not 'x1'
assert sys.intern(eggs) is 'x1'
When doing a dict lookup and the lookup key is the same object as a stored entry, `__eq__` is not called. Lookups are then significantly faster, maybe 20%.
Consider this example:
class EqTest:
def __eq__(self, other):
raise RuntimeError
def __hash__(self):
return id(self)
adict = {}
k1 = EqTest()
k2 = EqTest()
adict[k1] = 42
adict[k2] = 43
print(adict[k1], adict[k2])
Here `k1` is considered the same as `k1` and `k2` is considered the same as `k2`. However, `k1` and `k2` are considered distinct and never compared because they have different hashes.
However, if we were to set `EqTest.__hash__ = lambda self: 42`, we'd get a RuntimeError when we try to set `adict[k2]` because it would get compared for equality with `k1`.
Even if `__eq__` works, we can get some interesting behaviors. For example, when using multiple instances of `float('nan')` as keys.
> Using a str subclass in the test is a great idea, and you've created a truly minimal one. It would probably be good to *also* test with a non-string, like 3 or 42.0. I can't imagine this affecting things (unless you missed an eager lookdict demotion somewhere), but it would be good to have that path documented against regression.
I also tested a custom class that compares equal to strings. Other than being much slower, there weren't any significant differences. I also did some checks with int key lookups, which obviously fail with KeyError. They did not make the performance worse for the subsequent str lookups.
I will try to make a proper test tomorrow.
|
|||
| msg386000 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2021年01月30日 22:24 | |
Please test also creating a large dict with string-only or non-string keys. Since you add a code in insertdict() it can potentially affects performance. |
|||
| msg386040 - (view) | Author: Hristo Venev (h.venev) * | Date: 2021年01月31日 20:34 | |
I've attached a patch that should also contain a test. I also ran some benchmarks on dict creation/inserts. I couldn't notice any difference in performance. |
|||
| msg387505 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2021年02月22日 08:09 | |
I'm +1 for this. Would you create a pull request on GitHub? |
|||
| msg389933 - (view) | Author: Jim Jewett (Jim.Jewett) * (Python triager) | Date: 2021年03月31日 21:12 | |
What is the status on this? If you are losing interest, would you like someone else to turn your patch into a pull request? |
|||
| msg389934 - (view) | Author: Hristo Venev (h.venev) * | Date: 2021年03月31日 21:15 | |
Sorry, I must have missed Inada Naoki's reply. I will try to send a pull request this weekend. |
|||
| msg392273 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2021年04月29日 02:06 | |
New changeset 8557edbfa8f74514de82feea4c62f5963e4e0aa7 by Hristo Venev in branch 'master': bpo-24275: Don't downgrade unicode-only dicts to mixed on lookups (GH-25186) https://github.com/python/cpython/commit/8557edbfa8f74514de82feea4c62f5963e4e0aa7 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:17 | admin | set | github: 68463 |
| 2021年04月29日 02:06:31 | methane | set | status: open -> closed resolution: remind -> fixed stage: patch review -> resolved |
| 2021年04月29日 02:06:14 | methane | set | messages: + msg392273 |
| 2021年04月04日 16:55:33 | python-dev | set | nosy:
+ python-dev pull_requests: + pull_request23928 |
| 2021年03月31日 21:15:15 | h.venev | set | messages: + msg389934 |
| 2021年03月31日 21:12:54 | Jim.Jewett | set | messages: + msg389933 |
| 2021年02月22日 08:09:04 | methane | set | messages: + msg387505 |
| 2021年02月01日 01:46:54 | methane | set | nosy:
+ methane |
| 2021年01月31日 20:34:41 | h.venev | set | files:
+ 0001-Don-t-downgrade-unicode-only-dicts-to-mixed-on-non-u.patch messages: + msg386040 |
| 2021年01月30日 22:24:18 | serhiy.storchaka | set | messages: + msg386000 |
| 2021年01月30日 21:28:47 | h.venev | set | messages: + msg385997 |
| 2021年01月30日 20:59:25 | Jim.Jewett | set | messages: + msg385996 |
| 2021年01月30日 20:41:43 | Jim.Jewett | set | status: closed -> open versions: + Python 3.10 messages: + msg385994 resolution: rejected -> remind stage: patch review |
| 2021年01月28日 18:58:11 | h.venev | set | files:
+ bench.py messages: + msg385877 |
| 2021年01月28日 18:54:14 | h.venev | set | files:
+ 0001-Don-t-downgrade-unicode-only-dicts-to-mixed-on-non-u.patch nosy: + h.venev messages: + msg385876 keywords: + patch |
| 2015年05月25日 10:09:46 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg244026 |
| 2015年05月25日 09:50:23 | Mark.Shannon | set | nosy:
+ Mark.Shannon messages: + msg244024 |
| 2015年05月25日 04:31:18 | benjamin.peterson | set | status: open -> closed nosy: + benjamin.peterson messages: + msg244014 resolution: rejected |
| 2015年05月24日 06:08:44 | vaultah | set | nosy:
+ vaultah |
| 2015年05月24日 05:08:29 | Jim.Jewett | create | |