I am looking for a cache conscious data structure that does not require hashing. This is to avoid HashDoS without needing to use cryptographic PRFs such as SipHash, which are slow (~1 cycle/byte) – I think that one can do better.
So far, I have found HAT-tries, radix tries, and Judy arrays.
- HAT-tries still require hashing, so the problem arises again.
- Radix tries are still slower than hash tables due to poor utilization of CPU caches.
- Judy arrays are fast, but are enormously complex (20k SLOC in C), and the only implementation that I know only accepts strings, byte arrays, and machine words as keys. Furthermore, their performance characteristics are non-portable: the same library that performs excellently on systems with 64-byte cache lines may have much worse performance on systems with 32-byte cache lines.
None of these solve the problem. Simplicity is important, as I will be most likely to use this data structure in languages other than C or Java and so need to be able to implement it in a new language in a reasonable amount of time.
You can assume that a source of cryptographically secure random numbers with reasonable performance is available.
Only data structures that are unencumbered by patents are of interest.
The keys and values need to be able to be treated as opaque (except perhaps for size): all access needs to be (or be able to be) through user-defined accessors.
1 Answer 1
Functors may be an acceptable structure:
In fact, a polytypic function is uniquely defined by its action on projection functors and on primitive functors such as sums and products. This information is sufficient to specialize a polytypic function to arbitrary datatypes, including mutually recursive datatypes and nested datatypes
EasyCrypt is an example:
In our work, we have used polytypism [11] to implement datatype encryption: elements of datatypes are reduced by polytypic encoders to lists of bits which are then encrypted by a mode of operation instantiated with a particular block cipher. The correctness proofs of block ciphers can be combined with the correctness of encoders to obtain the correctness of data encryption
Encoding functions can be automatically defined when a new datatype is declared; the interpretation is used to calculate the form of the encoder from the declaration of the type. Mutually recursive datatypes and datatypes with recursion under existing type operators (so-called nested datatypes) are cleanly handled. Encoding and decoding of polymorphic types is dealt with by abstraction: an encoder for a polymorphic type is parameterized by encoders for types that may be substituted for the type variables.
References
Dictionary
, as might be provided in a language's standard library. So it needs to be able to handle arbitrary key types, with a small amount of code (that should be able to be generated automatically) for each key type.