Issue 4296: Python assumes identity implies equivalence; contradicts NaN

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/48546

classification

Title:	Python assumes identity implies equivalence; contradicts NaN
Type:	behavior	Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.0

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	barry	Nosy List:	barry, christian.heimes, gvanrossum, mark.dickinson, mikecurtis, rhettinger, smarnach
Priority:	release blocker	Keywords:	patch

Created on 2008年11月11日 01:44 by mikecurtis, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue4296_test.patch	christian.heimes, 2008年11月11日 12:49
issue4296.patch	mark.dickinson, 2008年11月12日 17:58

Messages (24)
msg75716 - (view)	Author: Michael B Curtis (mikecurtis)	Date: 2008年11月11日 01:44
Found in Python 2.4; not sure what other versions may be affected. I noticed a contradiction with regards to equivalence when experimenting with NaN, which has the special property that it "is" itself, but it doesn't "==" itself: >>> a = float('nan') >>> a is a True >>> a == a False >>> b = [float('nan')] >>> b is b True >>> b == b True I am not at all familiar with Python internals, but the issue appears to be in PyObject_RichCompareBool of python/trunk/Objects/object.c This method "Guarantees that identity implies equality". However, this doesn't "Gaurantee" this fact, but instead "Assumes" it, because it is not something that is always True. NaN is identical to itself, but not equivalent to itself. At a minimum, the contradiction introduced by this assumption should be documented. However, it may be possible to do better, by fixing it. The assumption appears to be made that identity should imply equivalence, for the common case. Would it therefore be possible to, instead of having objects such as lists perform this optimization and make this assumption, instead have the base object types implement this assumption. That is, for regular objects, when we evaluate equivalence, we return True if the objects are identical. Then, the optimization can be removed from objects such as list, so that when they check the equivalence of each object, the optimization is performed there. NaN can then override the default behavior, so that it always returns False in equivalence comparisons.
msg75717 - (view)	Author: Christian Heimes (christian.heimes) * (Python committer)	Date: 2008年11月11日 01:49
Interesting, Python 3.0 behaves differently than Python 2.x. Nice catch! :) Python 3.0rc2 (r30rc2:67177, Nov 10 2008, 12:12:09) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> nan = float("nan") >>> nan is nan True >>> nan == nan False >>> lst = [nan] >>> lst is lst True >>> lst == lst False Python 2.6 (r26:66714, Oct 2 2008, 16:17:49) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> nan = float("nan") >>> lst = [nan] >>> lst == lst True >>> lst is lst True
msg75733 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2008年11月11日 10:52
This is indeed interesting. For what it's worth, I think the Python 3.0 behaviour is the right one, here. Perhaps it's worth adding a test to Python 3.0 to make sure that this behaviour doesn't accidentally disappear, or at least to make sure that someone thinks about it before making the behaviour disappear. But in general I don't think the fact that NaNs are weird should get in the way of optimizations. In other words, I'm not sure that Python should be asked to guarantee anything more than "b == b" returning False when b is a NaN. It wouldn't seem unreasonable to consider behaviour of nans in containers (sets, lists, dicts) as undefined when it comes to equality and identity checks. There are other questionable things going on when NaNs are put into containers. For example: >>> a = float('nan') >>> b = [a] >>> a in b True A strict reading of the definition of 'in' would say that "a in b" should return False here, since a is not equal to any element of b. But I presume that the 'in' operator does identity checks under the hood before testing for equality. 'Fixing' this so that nans did the right thing might slow down a lot of other code just to handle one peculiar special case correctly. Michael, is this adversely affecting real-world code? If not, I'd be inclined to say that it's not worth changing Python's behaviour here.
msg75734 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2008年11月11日 10:57
One more data point: in both 2.x and 3.x, sets behave like the 2.x lists: Python 3.0rc2+ (py3k:67177M, Nov 10 2008, 16:06:34) [GCC 4.0.1 (Apple Inc. build 5488)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> s = {float('nan')} >>> s == s True >>> s is s True
msg75735 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2008年11月11日 12:03
I'm not happy with the 3.0 change. IMO, the best behavior is the one that allows code reviewers to correctly reason about the code by assuming basic invariants. In 2.6, all of the following are always true: assert a in [a] assert a in (a,) assert a in set([a]) assert [a].count(a) == 1 This is especially important because it lets us maintain a consistent meaning for "in" its two contexts: for a in container: assert a in container # this should ALWAYS be true This shows that "is" is essential in the interpretation of "in". If you loop over elements in a container, then by definition those exact elements are IN the container. If we break this, it will lead to hard to find errors and language inconsistencies. The "identity implies equality" rule isn't just an optimization, it is a deep assumption that pervades the language implementation. Lots of logic relies on it to maintain invariants. It looks like the 3.0 changes are fighting this state of affairs. IMO, those changes are fighting an uphill battle and will introduce more oddities than they eliminate.
msg75736 - (view)	Author: Christian Heimes (christian.heimes) * (Python committer)	Date: 2008年11月11日 12:12
Oh h... Raymond, you are so right! I was too tired to think of all implications related to the different semantic in 3.0, especially the for in / is in invariant. I'm including Guido and Barry into the discussion. This topic needs some attention from the gods of old. :)
msg75737 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2008年11月11日 12:21
[Raymond] > assuming basic invariants. In 2.6, all of the following are always > true: > > assert a in [a] > assert a in (a,) > assert a in set([a]) > assert [a].count(a) == 1 And these are all still true in 3.0 as well, aren't they? In any case, you've convinced me. I withdraw my comment about the Python 3.0 behaviour being the right one.
msg75738 - (view)	Author: Christian Heimes (christian.heimes) * (Python committer)	Date: 2008年11月11日 12:49
Here is a tes case for 3.0
msg75739 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2008年11月11日 13:10
To be clear, I'm saying that PyObject_RichCompareBool() needs to add-back the code: /* Quick result when objects are the same. Guarantees that identity implies equality. */ if (v == w) { if (op == Py_EQ) return 1; else if (op == Py_NE) return 0; } When the above code was originally put in, there was discussion about it on python-dev and time has proven it to be a successful choice. I don't know who took this out, but taking it out was a mistake in a world that allows rich comparison functions to return anything they want, possibly screwing-up the basic invariants everywhere we do membership testing. Taking it out probably had something to do with NaNs, but this discussion needs to avoid getting lost in NaN paradoxes and instead focus on a notion of membership that is ALWAYS true given object identity. This is an essential pragmatism necessary for reasoning about programs. Consider that while PyObject_RichCompare() can return any object and can be used in varied and sundry ways, the opposite is true for PyObject_RichCompareBool(). The latter always returns a boolean and its internal use cases are almost always ones that assume the traditional properties of equality -- a binary relation or predicate that is reflexive, symmetric, and transitive. We let the == operator violate those properties, but the calls to PyObject_RichCompareBool() tend to DEPEND on them. --------------------------------------------------------------- P.S. Mark, those Py2.6 invariants are not still true in Py3.0: IDLE 3.0rc2 >>> a = float('nan') >>> a in [a] False >>> a in (a,) False >>> a in set([a]) True >>> [a].count(a) 0 >>> for x in container: assert x in container AssertionError
msg75740 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2008年11月11日 13:58
> Taking it out probably had something to do with NaNs, but this > discussion needs to avoid getting lost in NaN paradoxes and instead > focus on a notion of membership that is ALWAYS true given object > identity. This is an essential pragmatism necessary for reasoning about > programs. I agree wholeheartedly. NaN comparison weirdness isn't anywhere near important enough to justify breaking these invariants. Though I imagine that if 'x == x' started returning True for NaNs there might be some complaints. > P.S. Mark, those Py2.6 invariants are not still true in Py3.0: You're right, of course.
msg75742 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2008年11月11日 14:46
With the lines Raymond mentions restored, all tests (including the extra tests Christian's posted here) pass for me, and I also get: >>> a = [float('nan')] >>> a == a True Incidentally, it looks as though the PyObject_RichCompareBool lines were removed in r51533.
msg75745 - (view)	Author: Michael B Curtis (mikecurtis)	Date: 2008年11月11日 15:59
All, Thank you for your rigorous analysis of this bug. To answer the question of the impact of this bug: the real issue that caused problems for our application was Python deciding to silently cast NaN falues to 0L, as discussed here: http://mail.python.org/pipermail/python-dev/2008-January/075865.html This would cause us to erroneously recognize 0s in our dataset when our input was invalid, which caused various issues. Per that thread, it sounds like there is no intention to fix this for versions prior to 3.0, so I decided to detect NaN values early on with the following: def IsNan(x): return (x is x) and (x != x) This is not the most rigorous check, but since our inputs are expected to be restricted to N-dimensional lists of numeric and/or string values, this was sufficient for our purposes. However, I wanted to be clear as to what would happen if this were handed a vector or matrix containing a NaN, so I did a quick check, which led me to this bug. My workaround is to manually avoid the optimization, with the following code: def IsNan(x): if isinstance(x, list) or isinstance(x, tuple) or isinstance(x, set): for i in x: if IsNan(i): return True return False else: return (x is x) and (x != x) This isn't particularly pretty, but since our inputs are relatively constrained, and since this isn't performance-critical code, it suffices for our purposes. For anyone working with large datasets, this would be suboptimal. (As an aside, if someone has a better solution for a general-case NaN-checker, which I'm sure someone does, feel free to let me know what it is). Additionally, while I believe that it is most correct to say that a list containing NaN is not equal to itself, I would hesitate to claim that it is even what most applications would desire. I could easily imagine individuals who would only wish for the list to be considered NaN-like if all of its values are NaN. Of course, that wouldn't be solved by any changes that might be made here. Once one gets into that level of detail, I think the programmer needs to implement the check manually to guarantee any particular expected outcome. Returning to the matter at hand: while I cringe to know that there is this inconsistency in the language, as a realist I completely agree that it would be unreasonable to remove the optimization to preserve this very odd corner case. For this reason, I proposed a minimal solution here to be that this oddity merely be documented better. Thanks again for your thoughts.
msg75747 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2008年11月11日 16:44
[Michael] > the real issue that caused problems [...] was Python deciding to > silently cast NaN falues to 0L > [...] > it sounds like there is no intention to fix this for versions prior > to 3.0, Oh, <rude words> <rude words> <more rude words>! Guido's message does indeed say that that behaviour shouldn't be changed before 3.0. And if I'd managed to notice his message, I wouldn't have 'fixed' it for 2.6. Python 2.7a0 (trunk:67115, Nov 6 2008, 08:37:21) [GCC 4.0.1 (Apple Inc. build 5488)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> int(float('nan')) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: cannot convert float NaN to integer :-(. [Imagine me looking extreeemely sheepish.] I guess I owe apologies to Christian and Guido here. Sorry, folks. Is there any way I can make amends?
msg75749 - (view)	Author: Christian Heimes (christian.heimes) * (Python committer)	Date: 2008年11月11日 16:54
The issue with nan -> 0L was fixed in Python 2.6. I can't recall if I fixed it or somebody else but I know it was discussed. $ python2.6 >>> long(float("nan")) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: cannot convert float NaN to integer >>> long(float("inf")) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: cannot convert float infinity to integer
msg75756 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2008年11月11日 19:33
Mark, thanks for checking that all the tests still pass. Please do add back the missing lines in PyObject_RichCompareBool(). It seems that their removal was not a necessary part of eliminating default sort-ordering.
msg75759 - (view)	Author: Christian Heimes (christian.heimes) * (Python committer)	Date: 2008年11月11日 20:45
(Comment for Michael, everybody else can safely ignore it. Instead of writing: if isinstance(x, list) or isinstance(x, tuple) or isinstance(x, set): you can write: if isinstance(x, (list, tuple, set)): or even better: if hasattr(x, "__iter__"): Starting with Python 2.6 you can use "from math import isnan", too. )
msg75761 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2008年11月11日 21:34
Here's a patch incorporating Christian's tests, the missing lines from PyObject_RichCompareBool, and some additional tests to check that [x] == [x] and the like remain True even when x is a NaN. Oh, and a Misc/NEWS entry. With this patch, all tests pass (debug and non-debug 32-bit builds) on OS X 10.5/Intel. I think we should get the okay from Guido before committing this. Maybe he remembers why he deleted these lines in the first place. :-)
msg75763 - (view)	Author: Christian Heimes (christian.heimes) * (Python committer)	Date: 2008年11月11日 22:01
The tests are passing on Ubuntu Linux 64bit, too. Good work everybody!
msg75773 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2008年11月12日 01:22
Mark, the patch looks good. Thanks. Before committing, please add one other test that verifies the relationship between "in" in membership tests and "in" in a for-loop: for constructor in list, tuple, dict.fromkeys, set, collections.deque: container = constructor([float('nan'), 1, None, 'abc']) for elem in container : self.assert_(elem in container)
msg75790 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2008年11月12日 17:58
> Before committing, please add one > other test that verifies the relationship between "in" in membership > tests and "in" in a for-loop: Added. (To test_contains, since this seems like a more appropriate place than test_float.)
msg75801 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2008年11月12日 21:46
Reviewed the updated patch and all is well. Thanks.
msg75802 - (view)	Author: Christian Heimes (christian.heimes) * (Python committer)	Date: 2008年11月12日 21:53
Mark, would you do the honor, please?
msg75808 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2008年11月12日 23:26
Committed, r67204. Thanks guys, and thanks especially to Michael for spotting this before 3.0 final.
msg139291 - (view)	Author: Sven Marnach (smarnach)	Date: 2011年06月27日 15:00
The behaviour discussed in this thread does not seem to be reflected in Python's documentation. The documentation of __eq__() [1] doesn't mention that objects should compare equal to themselves. [1]: http://docs.python.org/dev/reference/datamodel.html#object.__eq__ There are several places in the documentation that are wrong for NaNs; just one example is the documentation of sequence types [2], which states: > This means that to compare equal, every element must compare equal > and the two sequences must be of the same type and have the same > length. [2]: http://docs.python.org/dev/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-range It's probably not worthwhile to "fix" all the places in the documentation that implicitly assume that objects compare equal to themselves, but it probably is a good idea to mention that __eq__() implementations should fulfil this assumption to avoid strange behaviour when used in combination with standard containers. Any thoughts?

History
Date	User	Action	Args
2022年04月11日 14:56:41	admin	set	github: 48546
2011年06月27日 15:00:51	smarnach	set	nosy: + smarnach messages: + msg139291
2011年04月28日 19:33:26	terry.reedy	set	resolution: fixed
2008年11月12日 23:26:19	mark.dickinson	set	status: open -> closed messages: + msg75808
2008年11月12日 21:53:16	christian.heimes	set	messages: + msg75802
2008年11月12日 21:46:57	rhettinger	set	messages: + msg75801
2008年11月12日 17:58:46	mark.dickinson	set	files: + issue4296.patch messages: + msg75790
2008年11月12日 17:57:36	mark.dickinson	set	files: - issue4296.patch
2008年11月12日 01:22:06	rhettinger	set	messages: + msg75773
2008年11月11日 22:01:07	christian.heimes	set	assignee: gvanrossum -> barry messages: + msg75763 stage: test needed -> patch review
2008年11月11日 21:34:26	mark.dickinson	set	files: + issue4296.patch assignee: gvanrossum messages: + msg75761
2008年11月11日 20:45:41	christian.heimes	set	messages: + msg75759
2008年11月11日 19:33:49	rhettinger	set	messages: + msg75756
2008年11月11日 16:54:00	christian.heimes	set	messages: + msg75749
2008年11月11日 16:44:53	mark.dickinson	set	messages: + msg75747
2008年11月11日 15:59:28	mikecurtis	set	messages: + msg75745
2008年11月11日 14:46:21	mark.dickinson	set	messages: + msg75742
2008年11月11日 13:58:02	mark.dickinson	set	messages: + msg75740
2008年11月11日 13:10:49	rhettinger	set	messages: + msg75739
2008年11月11日 12:49:33	christian.heimes	set	files: + issue4296_test.patch keywords: + patch messages: + msg75738
2008年11月11日 12:21:42	mark.dickinson	set	messages: + msg75737
2008年11月11日 12:12:21	christian.heimes	set	priority: normal -> release blocker nosy: + gvanrossum, barry messages: + msg75736
2008年11月11日 12:03:39	rhettinger	set	nosy: + rhettinger messages: + msg75735 versions: + Python 3.0, - Python 2.6, Python 2.5, Python 2.4, Python 2.7
2008年11月11日 10:57:41	mark.dickinson	set	messages: + msg75734
2008年11月11日 10:52:50	mark.dickinson	set	messages: + msg75733
2008年11月11日 01:49:50	christian.heimes	set	priority: normal nosy: + mark.dickinson, christian.heimes stage: test needed messages: + msg75717 versions: + Python 2.6, Python 2.5, Python 2.7
2008年11月11日 01:44:41	mikecurtis	create

homepage