This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008年11月11日 01:44 by mikecurtis, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue4296_test.patch | christian.heimes, 2008年11月11日 12:49 | |||
| issue4296.patch | mark.dickinson, 2008年11月12日 17:58 | |||
| Messages (24) | |||
|---|---|---|---|
| msg75716 - (view) | Author: Michael B Curtis (mikecurtis) | Date: 2008年11月11日 01:44 | |
Found in Python 2.4; not sure what other versions may be affected.
I noticed a contradiction with regards to equivalence when experimenting
with NaN, which has the special property that it "is" itself, but it
doesn't "==" itself:
>>> a = float('nan')
>>> a is a
True
>>> a == a
False
>>> b = [float('nan')]
>>> b is b
True
>>> b == b
True
I am not at all familiar with Python internals, but the issue appears to
be in PyObject_RichCompareBool of python/trunk/Objects/object.c
This method "Guarantees that identity implies equality". However, this
doesn't "Gaurantee" this fact, but instead "Assumes" it, because it is
not something that is always True. NaN is identical to itself, but not
equivalent to itself.
At a minimum, the contradiction introduced by this assumption should be
documented. However, it may be possible to do better, by fixing it.
The assumption appears to be made that identity should imply
equivalence, for the common case. Would it therefore be possible to,
instead of having objects such as lists perform this optimization and
make this assumption, instead have the base object types implement this
assumption. That is, for regular objects, when we evaluate equivalence,
we return True if the objects are identical. Then, the optimization can
be removed from objects such as list, so that when they check the
equivalence of each object, the optimization is performed there. NaN
can then override the default behavior, so that it always returns False
in equivalence comparisons.
|
|||
| msg75717 - (view) | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2008年11月11日 01:49 | |
Interesting, Python 3.0 behaves differently than Python 2.x. Nice catch! :)
Python 3.0rc2 (r30rc2:67177, Nov 10 2008, 12:12:09)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> nan = float("nan")
>>> nan is nan
True
>>> nan == nan
False
>>> lst = [nan]
>>> lst is lst
True
>>> lst == lst
False
Python 2.6 (r26:66714, Oct 2 2008, 16:17:49)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> nan = float("nan")
>>> lst = [nan]
>>> lst == lst
True
>>> lst is lst
True
|
|||
| msg75733 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008年11月11日 10:52 | |
This is indeed interesting. For what it's worth, I think
the Python 3.0 behaviour is the right one, here. Perhaps
it's worth adding a test to Python 3.0 to make sure that
this behaviour doesn't accidentally disappear, or at least
to make sure that someone thinks about it before making
the behaviour disappear.
But in general I don't think the fact that NaNs are weird
should get in the way of optimizations. In other words,
I'm not sure that Python should be asked to guarantee
anything more than "b == b" returning False when b is
a NaN. It wouldn't seem unreasonable to consider
behaviour of nans in containers (sets, lists, dicts)
as undefined when it comes to equality and identity
checks.
There are other questionable things going on when NaNs
are put into containers. For example:
>>> a = float('nan')
>>> b = [a]
>>> a in b
True
A strict reading of the definition of 'in' would say
that "a in b" should return False here, since a is not
equal to any element of b. But I presume that the 'in'
operator does identity checks under the hood before
testing for equality. 'Fixing' this so that nans did
the right thing might slow down a lot of other code
just to handle one peculiar special case correctly.
Michael, is this adversely affecting real-world code?
If not, I'd be inclined to say that it's not worth
changing Python's behaviour here.
|
|||
| msg75734 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008年11月11日 10:57 | |
One more data point: in both 2.x and 3.x, sets behave like the 2.x
lists:
Python 3.0rc2+ (py3k:67177M, Nov 10 2008, 16:06:34)
[GCC 4.0.1 (Apple Inc. build 5488)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = {float('nan')}
>>> s == s
True
>>> s is s
True
|
|||
| msg75735 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2008年11月11日 12:03 | |
I'm not happy with the 3.0 change. IMO, the best behavior is the one that allows code reviewers to correctly reason about the code by assuming basic invariants. In 2.6, all of the following are always true: assert a in [a] assert a in (a,) assert a in set([a]) assert [a].count(a) == 1 This is especially important because it lets us maintain a consistent meaning for "in" its two contexts: for a in container: assert a in container # this should ALWAYS be true This shows that "is" is essential in the interpretation of "in". If you loop over elements in a container, then by definition those exact elements are IN the container. If we break this, it will lead to hard to find errors and language inconsistencies. The "identity implies equality" rule isn't just an optimization, it is a deep assumption that pervades the language implementation. Lots of logic relies on it to maintain invariants. It looks like the 3.0 changes are fighting this state of affairs. IMO, those changes are fighting an uphill battle and will introduce more oddities than they eliminate. |
|||
| msg75736 - (view) | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2008年11月11日 12:12 | |
Oh h... Raymond, you are so right! I was too tired to think of all implications related to the different semantic in 3.0, especially the for in / is in invariant. I'm including Guido and Barry into the discussion. This topic needs some attention from the gods of old. :) |
|||
| msg75737 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008年11月11日 12:21 | |
[Raymond] > assuming basic invariants. In 2.6, all of the following are always > true: > > assert a in [a] > assert a in (a,) > assert a in set([a]) > assert [a].count(a) == 1 And these are all still true in 3.0 as well, aren't they? In any case, you've convinced me. I withdraw my comment about the Python 3.0 behaviour being the right one. |
|||
| msg75738 - (view) | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2008年11月11日 12:49 | |
Here is a tes case for 3.0 |
|||
| msg75739 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2008年11月11日 13:10 | |
To be clear, I'm saying that PyObject_RichCompareBool() needs to
add-back the code:
/* Quick result when objects are the same.
Guarantees that identity implies equality. */
if (v == w) {
if (op == Py_EQ)
return 1;
else if (op == Py_NE)
return 0;
}
When the above code was originally put in, there was discussion about it
on python-dev and time has proven it to be a successful choice.
I don't know who took this out, but taking it out was a mistake in a
world that allows rich comparison functions to return anything they
want, possibly screwing-up the basic invariants everywhere we do
membership testing.
Taking it out probably had something to do with NaNs, but this
discussion needs to avoid getting lost in NaN paradoxes and instead
focus on a notion of membership that is ALWAYS true given object
identity. This is an essential pragmatism necessary for reasoning about
programs.
Consider that while PyObject_RichCompare() can return any object and can
be used in varied and sundry ways, the opposite is true for
PyObject_RichCompareBool(). The latter always returns a boolean and its
internal use cases are almost always ones that assume the traditional
properties of equality -- a binary relation or predicate that is
reflexive, symmetric, and transitive. We let the == operator violate
those properties, but the calls to PyObject_RichCompareBool() tend to
DEPEND on them.
---------------------------------------------------------------
P.S. Mark, those Py2.6 invariants are not still true in Py3.0:
IDLE 3.0rc2
>>> a = float('nan')
>>> a in [a]
False
>>> a in (a,)
False
>>> a in set([a])
True
>>> [a].count(a)
0
>>> for x in container:
assert x in container
AssertionError
|
|||
| msg75740 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008年11月11日 13:58 | |
> Taking it out probably had something to do with NaNs, but this > discussion needs to avoid getting lost in NaN paradoxes and instead > focus on a notion of membership that is ALWAYS true given object > identity. This is an essential pragmatism necessary for reasoning about > programs. I agree wholeheartedly. NaN comparison weirdness isn't anywhere near important enough to justify breaking these invariants. Though I imagine that if 'x == x' started returning True for NaNs there might be some complaints. > P.S. Mark, those Py2.6 invariants are not still true in Py3.0: You're right, of course. |
|||
| msg75742 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008年11月11日 14:46 | |
With the lines Raymond mentions restored, all tests (including the extra
tests Christian's posted here) pass for me, and I also get:
>>> a = [float('nan')]
>>> a == a
True
Incidentally, it looks as though the PyObject_RichCompareBool lines were
removed in r51533.
|
|||
| msg75745 - (view) | Author: Michael B Curtis (mikecurtis) | Date: 2008年11月11日 15:59 | |
All, Thank you for your rigorous analysis of this bug. To answer the question of the impact of this bug: the real issue that caused problems for our application was Python deciding to silently cast NaN falues to 0L, as discussed here: http://mail.python.org/pipermail/python-dev/2008-January/075865.html This would cause us to erroneously recognize 0s in our dataset when our input was invalid, which caused various issues. Per that thread, it sounds like there is no intention to fix this for versions prior to 3.0, so I decided to detect NaN values early on with the following: def IsNan(x): return (x is x) and (x != x) This is not the most rigorous check, but since our inputs are expected to be restricted to N-dimensional lists of numeric and/or string values, this was sufficient for our purposes. However, I wanted to be clear as to what would happen if this were handed a vector or matrix containing a NaN, so I did a quick check, which led me to this bug. My workaround is to manually avoid the optimization, with the following code: def IsNan(x): if isinstance(x, list) or isinstance(x, tuple) or isinstance(x, set): for i in x: if IsNan(i): return True return False else: return (x is x) and (x != x) This isn't particularly pretty, but since our inputs are relatively constrained, and since this isn't performance-critical code, it suffices for our purposes. For anyone working with large datasets, this would be suboptimal. (As an aside, if someone has a better solution for a general-case NaN-checker, which I'm sure someone does, feel free to let me know what it is). Additionally, while I believe that it is most correct to say that a list containing NaN is not equal to itself, I would hesitate to claim that it is even what most applications would desire. I could easily imagine individuals who would only wish for the list to be considered NaN-like if all of its values are NaN. Of course, that wouldn't be solved by any changes that might be made here. Once one gets into that level of detail, I think the programmer needs to implement the check manually to guarantee any particular expected outcome. Returning to the matter at hand: while I cringe to know that there is this inconsistency in the language, as a realist I completely agree that it would be unreasonable to remove the optimization to preserve this very odd corner case. For this reason, I proposed a minimal solution here to be that this oddity merely be documented better. Thanks again for your thoughts. |
|||
| msg75747 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008年11月11日 16:44 | |
[Michael]
> the real issue that caused problems [...] was Python deciding to
> silently cast NaN falues to 0L
> [...]
> it sounds like there is no intention to fix this for versions prior
> to 3.0,
Oh, <rude words> <rude words> <more rude words>! Guido's message does
indeed say that that behaviour shouldn't be changed before 3.0. And
if I'd managed to notice his message, I wouldn't have 'fixed' it for
2.6.
Python 2.7a0 (trunk:67115, Nov 6 2008, 08:37:21)
[GCC 4.0.1 (Apple Inc. build 5488)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> int(float('nan'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: cannot convert float NaN to integer
:-(.
[Imagine me looking extreeemely sheepish.]
I guess I owe apologies to Christian and Guido here. Sorry, folks. Is
there any way I can make amends?
|
|||
| msg75749 - (view) | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2008年11月11日 16:54 | |
The issue with nan -> 0L was fixed in Python 2.6. I can't recall if I
fixed it or somebody else but I know it was discussed.
$ python2.6
>>> long(float("nan"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: cannot convert float NaN to integer
>>> long(float("inf"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: cannot convert float infinity to integer
|
|||
| msg75756 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2008年11月11日 19:33 | |
Mark, thanks for checking that all the tests still pass. Please do add back the missing lines in PyObject_RichCompareBool(). It seems that their removal was not a necessary part of eliminating default sort-ordering. |
|||
| msg75759 - (view) | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2008年11月11日 20:45 | |
(Comment for Michael, everybody else can safely ignore it. Instead of writing: if isinstance(x, list) or isinstance(x, tuple) or isinstance(x, set): you can write: if isinstance(x, (list, tuple, set)): or even better: if hasattr(x, "__iter__"): Starting with Python 2.6 you can use "from math import isnan", too. ) |
|||
| msg75761 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008年11月11日 21:34 | |
Here's a patch incorporating Christian's tests, the missing lines from PyObject_RichCompareBool, and some additional tests to check that [x] == [x] and the like remain True even when x is a NaN. Oh, and a Misc/NEWS entry. With this patch, all tests pass (debug and non-debug 32-bit builds) on OS X 10.5/Intel. I think we should get the okay from Guido before committing this. Maybe he remembers why he deleted these lines in the first place. :-) |
|||
| msg75763 - (view) | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2008年11月11日 22:01 | |
The tests are passing on Ubuntu Linux 64bit, too. Good work everybody! |
|||
| msg75773 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2008年11月12日 01:22 | |
Mark, the patch looks good. Thanks. Before committing, please add one
other test that verifies the relationship between "in" in membership
tests and "in" in a for-loop:
for constructor in list, tuple, dict.fromkeys, set, collections.deque:
container = constructor([float('nan'), 1, None, 'abc'])
for elem in container :
self.assert_(elem in container)
|
|||
| msg75790 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008年11月12日 17:58 | |
> Before committing, please add one > other test that verifies the relationship between "in" in membership > tests and "in" in a for-loop: Added. (To test_contains, since this seems like a more appropriate place than test_float.) |
|||
| msg75801 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2008年11月12日 21:46 | |
Reviewed the updated patch and all is well. Thanks. |
|||
| msg75802 - (view) | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2008年11月12日 21:53 | |
Mark, would you do the honor, please? |
|||
| msg75808 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008年11月12日 23:26 | |
Committed, r67204. Thanks guys, and thanks especially to Michael for spotting this before 3.0 final. |
|||
| msg139291 - (view) | Author: Sven Marnach (smarnach) | Date: 2011年06月27日 15:00 | |
The behaviour discussed in this thread does not seem to be reflected in Python's documentation. The documentation of __eq__() [1] doesn't mention that objects should compare equal to themselves. [1]: http://docs.python.org/dev/reference/datamodel.html#object.__eq__ There are several places in the documentation that are wrong for NaNs; just one example is the documentation of sequence types [2], which states: > This means that to compare equal, every element must compare equal > and the two sequences must be of the same type and have the same > length. [2]: http://docs.python.org/dev/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-range It's probably not worthwhile to "fix" all the places in the documentation that implicitly assume that objects compare equal to themselves, but it probably is a good idea to mention that __eq__() implementations should fulfil this assumption to avoid strange behaviour when used in combination with standard containers. Any thoughts? |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:41 | admin | set | github: 48546 |
| 2011年06月27日 15:00:51 | smarnach | set | nosy:
+ smarnach messages: + msg139291 |
| 2011年04月28日 19:33:26 | terry.reedy | set | resolution: fixed |
| 2008年11月12日 23:26:19 | mark.dickinson | set | status: open -> closed messages: + msg75808 |
| 2008年11月12日 21:53:16 | christian.heimes | set | messages: + msg75802 |
| 2008年11月12日 21:46:57 | rhettinger | set | messages: + msg75801 |
| 2008年11月12日 17:58:46 | mark.dickinson | set | files:
+ issue4296.patch messages: + msg75790 |
| 2008年11月12日 17:57:36 | mark.dickinson | set | files: - issue4296.patch |
| 2008年11月12日 01:22:06 | rhettinger | set | messages: + msg75773 |
| 2008年11月11日 22:01:07 | christian.heimes | set | assignee: gvanrossum -> barry messages: + msg75763 stage: test needed -> patch review |
| 2008年11月11日 21:34:26 | mark.dickinson | set | files:
+ issue4296.patch assignee: gvanrossum messages: + msg75761 |
| 2008年11月11日 20:45:41 | christian.heimes | set | messages: + msg75759 |
| 2008年11月11日 19:33:49 | rhettinger | set | messages: + msg75756 |
| 2008年11月11日 16:54:00 | christian.heimes | set | messages: + msg75749 |
| 2008年11月11日 16:44:53 | mark.dickinson | set | messages: + msg75747 |
| 2008年11月11日 15:59:28 | mikecurtis | set | messages: + msg75745 |
| 2008年11月11日 14:46:21 | mark.dickinson | set | messages: + msg75742 |
| 2008年11月11日 13:58:02 | mark.dickinson | set | messages: + msg75740 |
| 2008年11月11日 13:10:49 | rhettinger | set | messages: + msg75739 |
| 2008年11月11日 12:49:33 | christian.heimes | set | files:
+ issue4296_test.patch keywords: + patch messages: + msg75738 |
| 2008年11月11日 12:21:42 | mark.dickinson | set | messages: + msg75737 |
| 2008年11月11日 12:12:21 | christian.heimes | set | priority: normal -> release blocker nosy: + gvanrossum, barry messages: + msg75736 |
| 2008年11月11日 12:03:39 | rhettinger | set | nosy:
+ rhettinger messages: + msg75735 versions: + Python 3.0, - Python 2.6, Python 2.5, Python 2.4, Python 2.7 |
| 2008年11月11日 10:57:41 | mark.dickinson | set | messages: + msg75734 |
| 2008年11月11日 10:52:50 | mark.dickinson | set | messages: + msg75733 |
| 2008年11月11日 01:49:50 | christian.heimes | set | priority: normal nosy: + mark.dickinson, christian.heimes stage: test needed messages: + msg75717 versions: + Python 2.6, Python 2.5, Python 2.7 |
| 2008年11月11日 01:44:41 | mikecurtis | create | |