Context
Say I have a transposition table whose keys are produced by hashing game positions (e.g., with Zobrist hashing).
The table has finite, recycled memory: the index of a key is key % p, where p is the table size (a prime).
That is, at table[index(key)] we store the key together with the evaluation of the game position.
So, when fetching the cached evaluation of a game position P that was hashed into key:

- If key matches the key stored at index(key), we return the evaluation stored there.
- Else, we return a default value (indicating "empty"), then overwrite the key and evaluation there.
(I've basically summarized how a simple transposition table can be implemented.)
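For concreteness, here is a minimal C++ sketch of exactly this scheme (the Entry/TranspositionTable names and the always-replace policy are illustrative, not taken from any particular engine):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct Entry {
    uint64_t key  = 0;    // Zobrist hash of the stored position
    int      eval = 0;    // cached evaluation
    bool     used = false;
};

class TranspositionTable {
    std::vector<Entry> table;
    uint64_t p;  // table size, a prime

public:
    explicit TranspositionTable(uint64_t primeSize)
        : table(primeSize), p(primeSize) {}

    // Probe: return the cached evaluation if the stored key matches,
    // otherwise report "empty".
    std::optional<int> probe(uint64_t key) const {
        const Entry& e = table[key % p];
        if (e.used && e.key == key)
            return e.eval;
        return std::nullopt;  // "empty"
    }

    // Store: unconditionally overwrite the slot (always-replace scheme).
    void store(uint64_t key, int eval) {
        table[key % p] = Entry{key, eval, true};
    }
};
```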
Collision problem
The problem is when two different game positions P1 and P2 are hashed into the same key (collision).
Say that we encounter position P1 first. The first time this key is read, the "empty" value is returned and evaluation1 for P1 is stored in table[index(key)]. The second time this key is read (the key at the index matches), evaluation1 for P1 is returned. This will either:

- be fine, in the case that we encountered P1 again, or
- be an error in the game evaluation, in the case that we encountered P2.
How can this error be detected and corrected, efficiently?
Apparently, Stockfish uses something like this Zobrist hashing and verifies entries: "Every entry is carefully validated before use, ...", meaning that "... this (game engine) would work even if every hash table access would return random numbers." But how is this verification implemented?
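My guess is that the verification looks something like the following sketch (all names here are hypothetical, not Stockfish's actual code): store a fragment of the full hash in each entry, compare it on every probe, and check that any stored move is legal before using it.

```cpp
#include <cstdint>

// Hypothetical entry layout; not Stockfish's actual struct.
struct TTEntry {
    uint16_t key16;  // upper 16 bits of the full 64-bit Zobrist key
    int16_t  value;  // cached evaluation
    uint16_t move;   // encoded best move
    uint8_t  depth;  // search depth of the stored result
};

// Cheap check on probe: a random or colliding entry passes this
// only with probability 2^-16.
bool entryMatches(const TTEntry& e, uint64_t fullKey) {
    return e.key16 == static_cast<uint16_t>(fullKey >> 48);
}

// Before e.move is played, the engine would additionally verify that the
// move is (pseudo-)legal in the current position, so even a bogus entry
// can at worst waste a little search time, never crash the engine.
```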
- You don't correct Zobrist hash collisions. In this paper, Robert Hyatt (creator of the Crafty engine) ran a test where he forced the transposition table to have collisions. He and Anthony Cozzie (creator of Zappa) concluded that the cost of a collision here and there will not significantly change the evaluation of the position, and that 64-bit keys are enough. Also, you cannot detect Zobrist hash collisions: it is impossible to go from a Zobrist hash back to a chess position, and storing the whole position in the transposition table takes too much memory. – AspectOfTheNoob, Mar 14, 2023
1 Answer
I don't have any particular experience with transposition tables, but in general the standard way to use a hash table stores the key (e.g., the value P1 or P2) in the entry, not the hash of the key. This way you can detect collisions.
Terminology: in standard hash table usage, the key is the value (i.e., the position, such as P1 or P2), not its hash.
There are many standard ways to resolve collisions in hash tables, e.g., separate chaining or open addressing; see https://en.wikipedia.org/wiki/Hash_table. However, I suspect those are not needed for transposition tables, as it is presumably fine to return "unknown" in the event of a collision. If so, you might not need any of those methods.
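To make that concrete, here is a hedged sketch of storing the position itself in each entry (the Position encoding and all names are illustrative assumptions; the defaulted comparison requires C++20). An exact comparison on probe means a colliding entry is reported as "unknown" instead of returning a wrong evaluation:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Position {
    std::array<uint8_t, 64> board{};  // illustrative board encoding
    bool operator==(const Position&) const = default;
};

struct Slot {
    Position pos;         // the key itself, not just its hash
    int      eval = 0;
    bool     used = false;
};

class ExactTable {
    std::vector<Slot> slots;

public:
    explicit ExactTable(std::size_t n) : slots(n) {}

    std::optional<int> probe(const Position& p, uint64_t hash) const {
        const Slot& s = slots[hash % slots.size()];
        if (s.used && s.pos == p)  // exact comparison: no false hits
            return s.eval;
        return std::nullopt;       // empty or collision => "unknown"
    }

    void store(const Position& p, uint64_t hash, int eval) {
        slots[hash % slots.size()] = Slot{p, eval, true};
    }
};
```

The trade-off, as the comment on the question notes, is memory: a full position is much larger than a 64-bit hash, which is why engines typically store (part of) the hash instead and accept a tiny false-match probability.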
- I'm not concerned with that type of collision (my hash function that computes the index is a simple modulo p, with a prime size), i.e., I'm not concerned with the transposition table's own hashing. Instead, note that people combine transposition tables with additional (e.g., Zobrist) hashing of keys to save space (smaller keys). Then they usually try to minimize collisions of hashed keys in various ways, as these will inevitably happen. Instead of minimizing collisions, I'm wondering if there's a way to efficiently detect and correct those collisions, maybe using properties of Zobrist or some other hash. – Vepir, Jun 18, 2021
- You'd often store the hashed value to have a very fast check that you found the right value, but for chess positions you could have a hash value that is actually a subset of the position. – gnasher729, Jul 14, 2022
- For chess positions, you could create 64- or 96-bit hash values. With 96 bits, you'd have to look up 2^48 positions on average until you get a wrong result, and then it is most likely that one incorrect lookup doesn't affect the result. Having a bigger hash table, saving time, and calculating further may very well outweigh the risk of very rare non-optimal results. – gnasher729, Jul 14, 2022
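A hedged sketch of that 96-bit idea (field names are illustrative, not from any engine): keep a second, independent check key alongside the 64-bit indexing key, so a false match requires all 96 bits to agree.

```cpp
#include <cstdint>

// Illustrative entry with a widened check value.
struct WideEntry {
    uint64_t key64;    // primary key, also used for indexing
    uint32_t check32;  // independent secondary key, stored for verification
    int32_t  eval;     // cached evaluation
};

// A colliding position must match 96 independent bits, so by the birthday
// bound a false hit is expected only after on the order of 2^48 probes.
inline bool matches(const WideEntry& e, uint64_t key64, uint32_t check32) {
    return e.key64 == key64 && e.check32 == check32;
}
```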