[Python-Dev] PyObject_RichCompareBool identity shortcut

Thu Apr 28 05:42:03 CEST 2011

On 2011-04-27 22:16, Guido van Rossum wrote:
> On Wed, Apr 27, 2011 at 11:48 AM, Robert Kern<robert.kern at gmail.com> wrote:
>> On 4/27/11 12:44 PM, Terry Reedy wrote:
>>> On 4/27/2011 10:53 AM, Guido van Rossum wrote:
>>>> Maybe we should just call off the odd NaN comparison behavior?
>>>
>>> Eiffel seems to have survived, though I do not know if it is used for
>>> numerical work. I wonder how much code would break and what the scipy
>>> folks would think.
>>
>> I suspect most of us would oppose changing it on general
>> backwards-compatibility grounds rather than actually *liking* the current
>> behavior. If the behavior changed with Python floats, we'd have to mull over
>> whether we try to match that behavior with our scalar types (one of which
>> subclasses from float) and our arrays. We would be either incompatible with
>> Python or C, and we'd probably end up choosing Python to diverge from. It
>> would make a mess, honestly. We already have to explain why equality is
>> funky for arrays (arr1 == arr2 is a rich comparison that gives an array, not
>> a bool, so we can't do containment tests for lists of arrays), so NaN is
>> pretty easy to explain afterward.
>
> So does NumPy also follow Python's behavior about ignoring the NaN
> special-casing when doing array ops?

By "ignoring the NaN special-casing", do you mean that identity is checked 
first? When we use dtype=object arrays (arrays that contain Python objects as 
their data), yes:

[~]
|1> nan = float('nan')

[~]
|2> import numpy as np

[~]
|3> a = np.array([1, 2, nan], dtype=object)

[~]
|4> nan in a
True

[~]
|5> float('nan') in a
False

Just like lists:

[~]
|6> nan in [1, 2, nan]
True

[~]
|7> float('nan') in [1, 2, nan]
False
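
The list results follow from how containment is tested: each element is compared by identity first, and by == only if that fails. A minimal pure-Python sketch of that rule, assuming CPython; the helper name contains() is made up for illustration and is not a CPython or NumPy function:

```python
nan = float('nan')

def contains(seq, x):
    """Emulate `x in seq` for a list: identity first, then equality."""
    return any(item is x or item == x for item in seq)

items = [1, 2, nan]

# Same object as the one stored in the list: the identity check succeeds.
same_object = contains(items, nan)

# A different NaN object: identity fails, and NaN never compares equal.
new_object = contains(items, float('nan'))

print(same_object, new_object)
```

The helper gives the same answers as the built-in `in` operator for both cases.
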
Actually, we go a little further by using PyObject_RichCompareBool() rather than 
PyObject_RichCompare() to implement the array-wise comparisons in addition to 
containment:

[~]
|8> a == nan
array([False, False, True], dtype=bool)

[~]
|9> [x == nan for x in [1, 2, nan]]
[False, False, False]
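
The difference between |8> and |9> comes down to whether the identity shortcut is applied before ==. A rough Python rendering of the two strategies, covering only the Py_EQ case; eq_with_shortcut() and eq_plain() are hypothetical names standing in for what PyObject_RichCompareBool() and a plain == do, not actual NumPy code:

```python
nan = float('nan')

def eq_with_shortcut(a, b):
    # Roughly what PyObject_RichCompareBool(a, b, Py_EQ) does:
    # identical objects are treated as equal without consulting __eq__.
    return a is b or bool(a == b)

def eq_plain(a, b):
    # Plain rich comparison, as the == operator performs it.
    return bool(a == b)

values = [1, 2, nan]
with_shortcut = [eq_with_shortcut(x, nan) for x in values]
plain = [eq_plain(x, nan) for x in values]

print(with_shortcut)  # [False, False, True]
print(plain)          # [False, False, False]
```
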
But for dtype=float arrays (which contain C doubles, not Python objects) we use 
C semantics. Literally, we use whatever C's == operator gives us for the two 
double values. Since there is no concept of identity for this case, there is no 
cognate behavior of Python to match.

[~]
|10> b = np.array([1.0, 2.0, nan], dtype=float)

[~]
|11> b == nan
array([False, False, False], dtype=bool)

[~]
|12> nan in b
False
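
The same lack of identity can be seen without NumPy. As a sketch, assuming CPython, the standard library's array module (a stand-in here, not how NumPy stores its data) also holds raw C doubles, so every read produces a fresh Python float and only the value comparison is left:

```python
from array import array

nan = float('nan')

# Typecode 'd' stores raw C doubles, not references to Python objects.
b = array('d', [1.0, 2.0, nan])

# Each element read builds a new float object, so the original `nan`
# object never comes back out; identity has nothing to match.
identity_hits = [x is nan for x in b]
equality_hits = [x == nan for x in b]

print(identity_hits)  # [False, False, False]
print(equality_hits)  # [False, False, False]
```
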
-- 
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
 -- Umberto Eco