0

I'm building some Minimal Perfect Hash from multiple datasets using the CHD algorithm

I would like to know if there exists a "smart" way to combine all the different MPH obtained from previous steps and use them to build a new one at minimal cost.

asked Jun 16, 2020 at 15:21

1 Answer 1

1

With the MPHs of all your data types, you have fixed sets of consecutive integers to work with for which you know the maximum value.

Lets say you're hashing the combination of three data types A, B, and C. For simplicity's sake, lets say that A has 14 hashes (0 to 13), B has 30, and C has 220.

Here's some example code of how to create minimal perfect hashes from minimal perfect sub hashes and how to parse those sub hashes from a the merged hash:

public static void main(String[] args) {
 // the number of possible hashes for each data type
 int[] numHashes = { 14, 30, 220 };
 // the hashes of each data to merge
 int[] hashes = { 4, 12, 178};
 // the bases for each data type, based on the number of possible hashes for each merged type
 int[] bases = getBases(numHashes);
 System.out.println("bases: " + Arrays.toString(bases));
 int hash = mergeHashes(hashes, bases);
 System.out.println("merged hash: " + hash);
 int[] split = splitHashes(hash, bases);
 System.out.println("parsed sub hashes: " + Arrays.toString(split));
}
/** Determine the bases for each of the sub hashes, based on the max hashes of the preceding sub hashes */
public static int[] getBases(int[] numHashes) {
 int[] bases = new int[numHashes.length];
 bases[0] = 1;
 for (int i = 0; i < numHashes.length - 1; i++) {
 bases[i + 1] = bases[i] * numHashes[i];
 }
 return bases;
}
/** merge the sub hashes, using their bases to shift their values */
public static int mergeHashes(int[] hashes, int[] bases) {
 int hash = 0;
 for (int i = 0; i < hashes.length; i++) {
 hash = Math.addExact(hash, Math.multiplyExact(hashes[i], bases[i]));
 }
 return hash;
}
/** split a hash back out into its sub hashes, using the bases to do so */
public static int[] splitHashes(int hash, int[] bases) {
 int[] hashes = new int[bases.length];
 int i = bases.length - 1;
 hashes[i] = hash / bases[i];
 for (i--; i >= 0; i--) {
 hash -= (hashes[i + 1] * bases[i + 1]);
 hashes[i] = hash / bases[i];
 }
 return hashes;
}

(削除) As hashes can be stored in 4 bits, Bs in 5, and Cs in 8. That's a total of 17 bits to store all of these hashes; That fits inside a 4 byte integer! So a simple way to handle this would be to just concatenate their bits to have a unique integer for each combination of hashes, like so:

 spare bits C B A
[00000000 0000000][0 0000000][0 0000][0000]

It's not exactly minimal though.. since there are gaps. For example, the 4 bits of As hash can represent 2 more hashes than needed, up to 15 rather than just 13, meaning that your actual hashes would jump from 13 to 16, skipping 14 and 15. Perhaps there's a way to smoosh these together more so that there are no gaps??

Note that if the sum of bits required by your separate minimal perfect hashes is more than can be fit in an integer then you're out of luck because an integer simply wont be able to store the number of possible combinations of your data types. (削除ここまで)

answered Jun 16, 2020 at 16:31

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.