Cache-conscious data structure that does not require hashing

Question 1

I am looking for a cache conscious data structure that does not require hashing. This is to avoid HashDoS without needing to use cryptographic PRFs such as SipHash, which are slow (~1 cycle/byte) – I think that one can do better.

So far, I have found HAT-tries, radix tries, and Judy arrays.

HAT-tries still require hashing, so the problem arises again.
Radix tries are still slower than hash tables due to poor utilization of CPU caches.
Judy arrays are fast, but are enormously complex (20k SLOC in C), and the only implementation that I know only accepts strings, byte arrays, and machine words as keys. Furthermore, their performance characteristics are non-portable: the same library that performs excellently on systems with 64-byte cache lines may have much worse performance on systems with 32-byte cache lines.

None of these solve the problem. Simplicity is important, as I will be most likely to use this data structure in languages other than C or Java and so need to be able to implement it in a new language in a reasonable amount of time.

You can assume that a source of cryptographically secure random numbers with reasonable performance is available.

Only data structures that are unencumbered by patents are of interest.

The keys and values need to be able to be treated as opaque (except perhaps for size): all access needs to be (or be able to be) through user-defined accessors.

Question 2

You haven't said much about your specific application of the data structure. Will ordinary memoization solve the problem?

Question 3

You stated that Judy arrays "only accept strings, byte arrays, and machine words as keys." What other key types will you require?

Question 4

@RobertHarvey The application is for a general-purpose Dictionary, as might be provided in a language's standard library. So it needs to be able to handle arbitrary key types, with a small amount of code (that should be able to be generated automatically) for each key type.

Question 5

Hashing an entire, arbitrary object seems... excessive. That's going to be slow no matter how you implement it. Could your objects be stamped with a GUID, and that GUID used as the key to the cache?

Question 6

@RobertHarvey How the object gets hashed/compared/etc is the user's choice. The data structure must be able to leave this choice to the user, with well-defined requirements on the choice. In other words, the data structure needs to treat the keys and values as opaque (except for size); all interaction needs to go through user-provided functions.

Question 7

Functors may be an acceptable structure:

In fact, a polytypic function is uniquely defined by its action on projection functors and on primitive functors such as sums and products. This information is sufficient to specialize a polytypic function to arbitrary datatypes, including mutually recursive datatypes and nested datatypes

EasyCrypt is an example:

EasyCrypt

In our work, we have used polytypism [11] to implement datatype encryption: elements of datatypes are reduced by polytypic encoders to lists of bits which are then encrypted by a mode of operation instantiated with a particular block cipher. The correctness proofs of block ciphers can be combined with the correctness of encoders to obtain the correctness of data encryption

Encoding functions can be automatically defined when a new datatype is declared; the interpretation is used to calculate the form of the encoder from the declaration of the type. Mutually recursive datatypes and datatypes with recursion under existing type operators (so-called nested datatypes) are cleanly handled. Encoding and decoding of polymorphic types is dealt with by abstraction: an encoder for a polymorphic type is parameterized by encoders for types that may be substituted for the type variables.

References

score -1 · Answer 1 · 2018-09-21 14:52:49Z

Functors may be an acceptable structure:

In fact, a polytypic function is uniquely defined by its action on projection functors and on primitive functors such as sums and products. This information is sufficient to specialize a polytypic function to arbitrary datatypes, including mutually recursive datatypes and nested datatypes

EasyCrypt is an example:

EasyCrypt

In our work, we have used polytypism [11] to implement datatype encryption: elements of datatypes are reduced by polytypic encoders to lists of bits which are then encrypted by a mode of operation instantiated with a particular block cipher. The correctness proofs of block ciphers can be combined with the correctness of encoders to obtain the correctness of data encryption

Encoding functions can be automatically defined when a new datatype is declared; the interpretation is used to calculate the form of the encoder from the declaration of the type. Mutually recursive datatypes and datatypes with recursion under existing type operators (so-called nested datatypes) are cleanly handled. Encoding and decoding of polymorphic types is dealt with by abstraction: an encoder for a polymorphic type is parameterized by encoders for types that may be substituted for the type variables.

References

Stack Exchange Network

Cache-conscious data structure that does not require hashing

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Cache-conscious data structure that does not require hashing

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions