Message 385997 - Python tracker


In-reply-to
Author	h.venev
Recipients	Jim.Jewett, Mark.Shannon, benjamin.peterson, h.venev, serhiy.storchaka, vaultah
Date	2021-01-30.21:28:47
Message-id	<1612042127.53.0.603229330005.issue24275@roundup.psfhosted.org>

Content

> Why is the first key built up as vx='x'; vx += '1' instead of just k1="x1"?
I wanted to construct a key that is equal to, but not the same object as, `'x1'`. Consider this example:
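 import sys

 assert 'x1' is 'x1'   # identical literals share one interned object (CPython)
 spam = 'x1'
 assert spam is 'x1'
 eggs = 'x'
 eggs += '1'           # built at runtime, so this is a new str object
 assert eggs == 'x1'
 assert eggs is not 'x1'
 assert sys.intern(eggs) is 'x1'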
When the lookup key is the same object as a stored key, the dict lookup never calls `__eq__`. Such lookups are significantly faster, maybe 20%.
Consider this example:
 class EqTest:
     def __eq__(self, other):
         raise RuntimeError   # any equality comparison blows up
     def __hash__(self):
         return id(self)      # distinct objects get distinct hashes
 
 adict = {}
 k1 = EqTest()
 k2 = EqTest()
 
 adict[k1] = 42
 adict[k2] = 43
 print(adict[k1], adict[k2])  # prints: 42 43
Here `k1` is found because it is the same object as the stored key, and likewise for `k2`. `k1` and `k2` are considered distinct and are never compared with `__eq__` because they have different hashes.
If we were to set `EqTest.__hash__ = lambda self: 42`, however, we'd get a RuntimeError when we try to set `adict[k2]`, because it would get compared for equality with `k1`.
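A minimal sketch of the constant-hash case, assuming CPython's dict behavior. The class name `CollidingEqTest` and the `collided` flag are made up for this illustration; they do not appear in the patch or its tests.

```python
class CollidingEqTest:
    """Hypothetical variant of EqTest whose instances all share one hash."""

    def __eq__(self, other):
        raise RuntimeError("__eq__ was called")

    def __hash__(self):
        return 42  # every instance collides with every other


adict = {}
k1 = CollidingEqTest()
k2 = CollidingEqTest()

adict[k1] = 42          # empty dict: nothing to compare against
assert adict[k1] == 42  # same object as the stored key: identity check, no __eq__

try:
    adict[k2] = 43      # same hash as k1 but a different object: __eq__ is invoked
except RuntimeError:
    collided = True
else:
    collided = False

print(collided)
```

The failed insertion leaves the dict with only the `k1` entry.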
Even if `__eq__` works, we can get some interesting behaviors, for example when using multiple instances of `float('nan')` as keys.
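To illustrate the NaN case, here is a small sketch (variable names are arbitrary). A NaN never compares equal to anything, including itself, so each distinct NaN object becomes its own key, yet a lookup with the very same object still succeeds through the identity check.

```python
nan_a = float('nan')
nan_b = float('nan')

assert nan_a != nan_a            # NaN is not equal even to itself
assert nan_a is not nan_b        # two separate float objects

d = {nan_a: 'a', nan_b: 'b'}

assert len(d) == 2               # both NaNs are stored as distinct keys
assert d[nan_a] == 'a'           # found by identity, not by equality
assert d[nan_b] == 'b'
assert float('nan') not in d     # a fresh NaN object matches neither key
```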
> Using a str subclass in the test is a great idea, and you've created a truly minimal one. It would probably be good to *also* test with a non-string, like 3 or 42.0. I can't imagine this affecting things (unless you missed an eager lookdict demotion somewhere), but it would be good to have that path documented against regression.
I also tested a custom class that compares equal to strings. Other than being much slower, there weren't any significant differences. I also did some checks with int key lookups, which obviously fail with KeyError. They did not make the performance worse for the subsequent str lookups.
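The checks described above might look roughly like the following sketch. `StrLike` is a hypothetical stand-in for "a custom class that compares equal to strings"; it is not the class actually used, and no timing is measured here.

```python
class StrLike:
    """Hypothetical key type that hashes and compares like the str it wraps."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        if isinstance(other, StrLike):
            return self.value == other.value
        if isinstance(other, str):
            return self.value == other
        return NotImplemented


d = {'x1': 1, 'x2': 2}

# Same hash as 'x1' and equal to it, so the lookup succeeds via __eq__.
assert d[StrLike('x1')] == 1

# An int key is simply absent and raises KeyError.
try:
    d[3]
except KeyError:
    missed = True
else:
    missed = False
assert missed

# Plain str lookups keep working afterwards.
assert d['x1'] == 1 and d['x2'] == 2
```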
I will try to make a proper test tomorrow.

History
Date	User	Action	Args
2021-01-30 21:28:47	h.venev	set	recipients: + h.venev, benjamin.peterson, Mark.Shannon, Jim.Jewett, serhiy.storchaka, vaultah
2021-01-30 21:28:47	h.venev	set	messageid: <1612042127.53.0.603229330005.issue24275@roundup.psfhosted.org>
2021-01-30 21:28:47	h.venev	link	issue24275 messages
2021-01-30 21:28:47	h.venev	create
