Context
Say I have a transposition table whose keys are produced by hashing game positions (e.g., with Zobrist hashing).
The table has finite, recycled memory: the index of a key is key % p, where p is the table size (a prime).
That is, at table[index(key)] we store the key together with the evaluation of the game position.
So, when fetching the cached evaluation of a game position P that was hashed into key:

- If key matches the key stored at index(key), we return the evaluation stored there.
- Else, we return a default value (indicating "empty"), then overwrite the key and evaluation there.
(I've basically summarized how a simple transposition table can be implemented.)
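For concreteness, here is a minimal C++ sketch of exactly this scheme (the Entry/TranspositionTable names and the always-replace policy are illustrative, not taken from any particular engine):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct Entry {
    uint64_t key  = 0;    // Zobrist hash of the stored position
    int      eval = 0;    // cached evaluation
    bool     used = false;
};

class TranspositionTable {
    std::vector<Entry> table;
    uint64_t p;  // table size, a prime

public:
    explicit TranspositionTable(uint64_t primeSize)
        : table(primeSize), p(primeSize) {}

    // Probe: return the cached evaluation if the stored key matches,
    // otherwise report "empty".
    std::optional<int> probe(uint64_t key) const {
        const Entry& e = table[key % p];
        if (e.used && e.key == key)
            return e.eval;
        return std::nullopt;  // "empty"
    }

    // Store: unconditionally overwrite the slot (always-replace scheme).
    void store(uint64_t key, int eval) {
        table[key % p] = Entry{key, eval, true};
    }
};
```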
Collision problem
The problem is when two different game positions P1 and P2 are hashed into the same key (collision).
Say that we encounter position P1 first. The first time this key is read, the "empty" value is returned and evaluation1 for P1 is stored in table[index(key)]. The second time this key is read (the key at the index matches), evaluation1 for P1 is returned. This will either:

- be fine, in the case that we encountered P1 again, or
- be an error in the game evaluation, in the case that we encountered P2.
How can this error be detected and corrected, efficiently?
Apparently, Stockfish uses something like this Zobrist hashing and verifies entries: "Every entry is carefully validated before use, ...", meaning that "... this (game engine) would work even if every hash table access would return random numbers." But how is this verification implemented?
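My guess is that the verification looks something like the following sketch (all names here are hypothetical, not Stockfish's actual code): store a fragment of the full hash in each entry, compare it on every probe, and check that any stored move is legal before using it.

```cpp
#include <cstdint>

// Hypothetical entry layout; not Stockfish's actual struct.
struct TTEntry {
    uint16_t key16;  // upper 16 bits of the full 64-bit Zobrist key
    int16_t  value;  // cached evaluation
    uint16_t move;   // encoded best move
    uint8_t  depth;  // search depth of the stored result
};

// Cheap check on probe: a random or colliding entry passes this
// only with probability 2^-16.
bool entryMatches(const TTEntry& e, uint64_t fullKey) {
    return e.key16 == static_cast<uint16_t>(fullKey >> 48);
}

// Before e.move is played, the engine would additionally verify that the
// move is (pseudo-)legal in the current position, so even a bogus entry
// can at worst waste a little search time, never crash the engine.
```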
- You don't correct Zobrist hash collisions. In this paper, Robert Hyatt (creator of the Crafty engine) ran a test where he forced the transposition table to have collisions. He and Anthony Cozzie (creator of Zappa) concluded that the cost of a collision here and there will not significantly change the evaluation of the position, and that 64-bit keys are enough. Also, you cannot detect Zobrist hash collisions: it is impossible to go from a Zobrist hash back to a chess position, and storing the whole position in the transposition table takes too much memory. – AspectOfTheNoob, Mar 14, 2023
1 Answer
I don't have any particular experience with transposition tables, but in general the standard way to use a hash table stores the key (e.g., the value P1 or P2) in the entry, not the hash of the key. This way you can detect collisions.
Terminology: in standard hash table usage, the key is the value (i.e., the position, such as P1 or P2), not its hash.
There are many standard ways to resolve collisions in hash tables, e.g., separate chaining or open addressing; see https://en.wikipedia.org/wiki/Hash_table. However, I suspect those are not needed for transposition tables, as it is presumably fine to return "unknown" in the event of a collision. If so, you might not need any of those methods.
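To make that concrete, here is a hedged sketch of storing the position itself in each entry (the Position encoding and all names are illustrative assumptions; the defaulted comparison requires C++20). An exact comparison on probe means a colliding entry is reported as "unknown" instead of returning a wrong evaluation:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Position {
    std::array<uint8_t, 64> board{};  // illustrative board encoding
    bool operator==(const Position&) const = default;
};

struct Slot {
    Position pos;         // the key itself, not just its hash
    int      eval = 0;
    bool     used = false;
};

class ExactTable {
    std::vector<Slot> slots;

public:
    explicit ExactTable(std::size_t n) : slots(n) {}

    std::optional<int> probe(const Position& p, uint64_t hash) const {
        const Slot& s = slots[hash % slots.size()];
        if (s.used && s.pos == p)  // exact comparison: no false hits
            return s.eval;
        return std::nullopt;       // empty or collision => "unknown"
    }

    void store(const Position& p, uint64_t hash, int eval) {
        slots[hash % slots.size()] = Slot{p, eval, true};
    }
};
```

The trade-off, as the comment on the question notes, is memory: a full position is much larger than a 64-bit hash, which is why engines typically store (part of) the hash instead and accept a tiny false-match probability.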
- I'm not concerned with that type of collision (my hash function that computes the index is a simple modulo p, with a prime size), i.e., I'm not concerned with the transposition table's own hashing. Instead, note that people combine transposition tables with additional (e.g., Zobrist) hashing of keys to save space (smaller keys). Then they usually try to minimize collisions of hashed keys in various ways, as these will inevitably happen. Instead of minimizing collisions, I'm wondering if there's a way to efficiently detect and correct those collisions, maybe using properties of Zobrist or some other hash. – Vepir, Jun 18, 2021
- You'd often store the hashed value to have a very fast check that you found the right value, but for chess positions you could have a hash value that is actually a subset of the position. – gnasher729, Jul 14, 2022
- For chess positions, you could create 64- or 96-bit hash values. With 96 bits, you'd have to look up 2^48 positions on average until you get a wrong result, and then it is most likely that one incorrect lookup doesn't affect the result. Having a bigger hash table, saving time, and calculating further may very well outweigh the risk of very rare non-optimal results. – gnasher729, Jul 14, 2022
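A hedged sketch of that 96-bit idea (field names are illustrative, not from any engine): keep a second, independent check key alongside the 64-bit indexing key, so a false match requires all 96 bits to agree.

```cpp
#include <cstdint>

// Illustrative entry with a widened check value.
struct WideEntry {
    uint64_t key64;    // primary key, also used for indexing
    uint32_t check32;  // independent secondary key, stored for verification
    int32_t  eval;     // cached evaluation
};

// A colliding position must match 96 independent bits, so by the birthday
// bound a false hit is expected only after on the order of 2^48 probes.
inline bool matches(const WideEntry& e, uint64_t key64, uint32_t check32) {
    return e.key64 == key64 && e.check32 == check32;
}
```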