In a project of mine, I want to represent states as efficiently as possible. Initially, I went ahead with a BitBoard representation and some clever tricks to speed up the evaluation of the game tree. That works fine. Though when looking up any given state in the Look-Up table, I need to do some heavy calculations.
After some research, I found a solution that seems to solve my issues: Zobrist Hashing. My game state comprises about ~370 possible fields to fill - depending on the encoding. After populating all fields of my array with randomly generated numbers (I used the default Random class from java.lang) I tested XORing fields together. Everything works smoothly.
But I got two unanswered (which are closely entangled) questions left:
1. If I save the states in a look-up database, I should save the initial randomly generated array in order to reproduce the same look-up key each time. If I don't do this, I would need to save other data (such as which fields are XORed together) in order to check the look-up table - but this would render the whole idea of Zobrist hashing (or hashing in general) useless.
2. I can't verify if the approach yields any collisions UNLESS (once again) I save other data - such as which fields were XORed together to get the Zobrist hash.
I other posts on how to detect collisions, how the popular Stockfish engine approaches the issue, the mathematical aspects and much more, but wasn't able to find a summarizing answer:
In general, using Zobrist hashing, how and which data should one save in order to cross-verify the data in a look-up table? Should I first try to check a few billion rounds for collisions and re-generate random numbers once I found too many over a given treshhold? Should I just generate and save the Zobrist numbers to load up every run of the programme - hoping no collisions appear?
initZobrist()->validateZobrist()(usually not done - why?) ->saveZobristNumbers()->hash(). Randomly generating with init() each and every time seems counter-intuitive - unless you save configurations of the board along with it to validate the board (which would make the hashing useless imo). Furthermore, why does no code snippet I looked at preserve the Zobrist array for future access? Why do many implementations save additional information? $\endgroup$