Compile-time string hash

Question 1

A while back this question proposed a constexpr compile-time Sieve of Eratosthenes. It also linked to another SO question about compile-time computation of CRCs. I've found a place where I'd like to precompute the hash of a few C-strings, so I used those previous questions as a base.

I'm implementing the one-at-a-time hash algorithm for C-style strings. I'm stuck to using C++11, so the function has to be recursive and without any local state:

namespace ct
{
constexpr std::uint32_t stringLength(const char * cstr)
{
 return (*cstr != '0円') ? (stringLength(cstr + 1) + 1) : 0;
}
constexpr std::uint32_t sumSHL(std::uint32_t h, std::uint32_t shift) { return h + (h << shift); }
constexpr std::uint32_t sumSHR(std::uint32_t h, std::uint32_t shift) { return h + (h >> shift); }
constexpr std::uint32_t xorSHR(std::uint32_t h, std::uint32_t shift) { return h ^ (h >> shift); }
constexpr std::uint32_t hashFinishImpl(std::uint32_t h)
{
 // h += (h << 3)
 // h ^= (h >> 11)
 // h += (h << 15)
 return sumSHL(xorSHR(sumSHL(h, 3), 11), 15);
}
constexpr std::uint32_t hashStepImpl(std::uint32_t h, std::uint32_t c)
{
 // h += c
 // h += (h << 10)
 // h ^= (h >> 6)
 return xorSHR(sumSHL(h + c, 10), 6);
}
constexpr std::uint32_t hashImpl(const char * cstr, std::uint32_t length, std::uint32_t h)
{
 return (length != 0) ? hashImpl(cstr + 1, length - 1, hashStepImpl(h, *cstr)) : hashFinishImpl(h);
}
constexpr std::uint32_t hashCString(const char * cstr)
{
 return hashImpl(cstr, stringLength(cstr), 0);
}
} // namespace ct {}

I'm thinking it looks a bit convoluted, but couldn't think of a better way of handling it... Any suggestions are appreciated.

Question 2

Your code looks very clean to me and I already like it very much. A few minor remarks.

Assuming that you'll probably want to provide this functionality in a header file, you should declare all your functions as inline. Even if they're not in a header file, it doesn't hurt to do so.
Consider declaring your functions as noexcept, too.
I'd prefer to see the implementation details hidden in a "detail" namespace to make it clear that only hashCString is supposed to be used directly. (Or is this assumption incorrect?)
ct::hashCString is not the most descriptive name. I'd prefer if the name somehow indicated the implemented hash algorithm.
You don't actually need the length of the string. Just replace length != 0 with *cstr != '0円' in hashImpl and figure out the length as you go. This will save you one function.
It seems to me that "plus" or "add" would be more consistent with "xor" than "sum" as the emphasis is on the operation.
Your code assumes that char is an 8 bit integer. It would also be nice if you could hash arrays of signed and unsigned chars alike. Something like the following might do.
```
template <typename CharT>
constexpr typename std::enable_if<std::is_integral<CharT>::value && (CHAR_BIT * sizeof(CharT) == 8), std::uint8_t>::type
hashCString(const CharT *const s) noexcept;
```
Instead of requiring CHAR_BIT * sizeof(CharT) == 8, you could relax it to (CHAR_BIT * sizeof(CharT)) % 8 == 0 and manually loop over each octet in wider types. Smells like overkill, though.
Another possible generalization would be to provide the classical interface that takes a pair of "begin" and "end" iterators. This might be handy for hashing sub-strings, strings with embedded zeros or strings that are not terminated with a 0 byte. You can still provide your single-argument version as a convenience overload.

5gon12eder 5gon12eder 4,30014 silver badges29 bronze badges · Accepted Answer · 2016-03-30 22:22:25Z

Your code looks very clean to me and I already like it very much. A few minor remarks.

Assuming that you'll probably want to provide this functionality in a header file, you should declare all your functions as inline. Even if they're not in a header file, it doesn't hurt to do so.
Consider declaring your functions as noexcept, too.
I'd prefer to see the implementation details hidden in a "detail" namespace to make it clear that only hashCString is supposed to be used directly. (Or is this assumption incorrect?)
ct::hashCString is not the most descriptive name. I'd prefer if the name somehow indicated the implemented hash algorithm.
You don't actually need the length of the string. Just replace length != 0 with *cstr != '0円' in hashImpl and figure out the length as you go. This will save you one function.
It seems to me that "plus" or "add" would be more consistent with "xor" than "sum" as the emphasis is on the operation.
Your code assumes that char is an 8 bit integer. It would also be nice if you could hash arrays of signed and unsigned chars alike. Something like the following might do.
```
template <typename CharT>
constexpr typename std::enable_if<std::is_integral<CharT>::value && (CHAR_BIT * sizeof(CharT) == 8), std::uint8_t>::type
hashCString(const CharT *const s) noexcept;
```
Instead of requiring CHAR_BIT * sizeof(CharT) == 8, you could relax it to (CHAR_BIT * sizeof(CharT)) % 8 == 0 and manually loop over each octet in wider types. Smells like overkill, though.
Another possible generalization would be to provide the classical interface that takes a pair of "begin" and "end" iterators. This might be handy for hashing sub-strings, strings with embedded zeros or strings that are not terminated with a 0 byte. You can still provide your single-argument version as a convenience overload.

Stack Exchange Network

Compile-time string hash

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Compile-time string hash

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions