Planning a Python Course for Beginners

Thu Aug 10 05:00:54 EDT 2017

Steven D'Aprano wrote:
> On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:
>> Good point! A very good __hash__() implementation is:
>>
>>     def __hash__(self):
>>         return id(self)
>>
>> In fact, I didn't know Python (kinda) did this by default already. I
>> can't find that information in the definition of object.__hash__():
>
> Hmmm... using id() as the hash would be a terrible hash function. Objects

It's actually id(self) >> 4 (almost, see C code below), to account for 
memory alignment.
>>> obj = object()
>>> hex(id(obj))
'0x7f1f058070b0'
>>> hex(hash(obj))
'0x7f1f058070b'
>>> sample = (object() for _ in range(10))
>>> all(id(obj) >> 4 == hash(obj) for obj in sample)
True
> would fall into similar buckets if they were created at similar times,
> regardless of their value, rather than being well distributed. 

If that were the problem it wouldn't be solved by the current approach:
>>> sample = [object() for _ in range(10)]
>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
[1, 1, 1, 1, 1, 1, 1, 1, 1]
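
To see what the shift does and does not buy, here is a small sketch. It
assumes (none of this is from the thread) a made-up base address, blocks
spaced 16 bytes apart, and a simplified table that picks a slot from the
low bits of the hash and ignores the probing a real dict performs:

```python
# Hypothetical addresses: ten 16-byte-aligned blocks laid out back to back.
base = 0x7f1f058070b0
addresses = [base + 16 * i for i in range(10)]

# Simplified model: a table with 8 slots chooses a slot from the low 3 bits.
table_mask = 8 - 1

# Using the raw address as the hash: the low bits are always zero.
slots_raw = [addr & table_mask for addr in addresses]

# Dropping the 4 alignment bits first: neighbours land in neighbouring slots.
slots_shifted = [(addr >> 4) & table_mask for addr in addresses]

print(slots_raw)
print(slots_shifted)
```

In this model the raw addresses all compete for one slot, while the shifted
values step through the table one slot at a time. Objects created together
still get adjacent hashes, though, so the shift deals with alignment, not
with clustering.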

Py_hash_t
_Py_HashPointer(void *p)
{
    Py_hash_t x;
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    x = (Py_hash_t)y;
    if (x == -1)
        x = -2;
    return x;
}
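
For readers who don't read C, a rough pure-Python model of the function
above might look like this. The name hash_pointer and the pointer_bits
parameter are made up for illustration; 64 stands in for
8 * SIZEOF_VOID_P on a 64-bit build:

```python
def hash_pointer(address, pointer_bits=64):
    """Illustrative model of the rotate-by-4 pointer hash (not CPython code)."""
    mask = (1 << pointer_bits) - 1
    address &= mask
    # Rotate right by 4: the low 4 bits (usually 0) wrap around to the top.
    rotated = ((address >> 4) | (address << (pointer_bits - 4))) & mask
    # Reinterpret the unsigned word as signed, like the (Py_hash_t) cast.
    if rotated >= 1 << (pointer_bits - 1):
        rotated -= 1 << pointer_bits
    # Mirror the C code: a result of -1 is replaced by -2.
    return -2 if rotated == -1 else rotated


print(hex(hash_pointer(0x7f1f058070b0)))
```

With the address from the session above, hash_pointer(0x7f1f058070b0) gives
0x7f1f058070b, the same value hex(hash(obj)) showed. The "almost" matters
only when the low 4 bits are not zero: a rotation moves them to the top of
the word instead of discarding them, so the result then differs from a
plain id(self) >> 4.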