I'm currently trying to improve a data structure which I've created to speed up lookups for integral keys. The specific use case I have in mind is as follows:
- At startup the structure is populated with n items (where n can range from 1 to several thousand), and the key is an unsigned int.
- The critical path in the code requires a lookup based on this key.
I've tried the following:
- sorted vector with binary search
- std::map
- boost::unordered_map

And now this tree. I've found through profiling that the following holds:
- If every item looked up exists, then unordered_set is quicker than std::map, which is quicker than sorted vector.
- If the number of successful lookups for the search set is low, then std::map is quicker than unordered_set (I guess the hash overhead is higher in the failure case).
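The hit-versus-miss behaviour described above can be reproduced with a small harness along these lines (a sketch only; the helper names are mine, and std::unordered_map stands in for boost::unordered_map):

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <unordered_map>
#include <vector>

// Count how many probe keys are found in the container.
template <typename Container>
std::size_t count_hits(const Container& c, const std::vector<unsigned>& probes)
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i < probes.size(); ++i)
        hits += (c.find(probes[i]) != c.end());
    return hits;
}

// Time a batch of lookups; the hit rate of `probes` is what distinguishes
// the "every item exists" case from the "mostly misses" case.
template <typename Container>
double time_lookups(const Container& c, const std::vector<unsigned>& probes)
{
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    volatile std::size_t sink = count_hits(c, probes);
    (void)sink;
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}
```

Timing `count_hits` over an all-hits probe set versus an all-misses probe set is enough to observe the crossover described above.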
With this tree, I've found that it's quicker in all cases. What I would like is some critique of the approach - are there any further optimisations that I am missing, for example? Here is the tree:
/** Copyright: Nim 2011 */
#include <cstddef>
#include <ostream>

#include <boost/shared_ptr.hpp>

template <typename KeyType, typename ValueType>
class tree
{
public:
    typedef ValueType        value_type;
    typedef ValueType&       reference;
    typedef ValueType const& const_reference;
    typedef KeyType          key_type;
    typedef ValueType*       iterator;
    typedef ValueType const* const_iterator;

private:
    static const size_t MAX_SIZE = 256;
    typedef unsigned char index_type;

    struct node
    {
        node() : _cKey(), _cValue(), _vNodes() {}

        ~node()
        {
            cleanup();
        }

        void add(key_type index, key_type key, const_reference value)
        {
            if (!index)
            {
                _cKey = key;
                _cValue.reset(new value_type(value));
                return;
            }
            // get the lsb
            index_type lsb = index & 0xff;
            if (!_vNodes)
            {
                _vNodes = new node*[MAX_SIZE]();
                _vNodes[lsb] = new node();
            }
            else if (!_vNodes[lsb])
                _vNodes[lsb] = new node();
            _vNodes[lsb]->add(index >> 8, key, value);
        }

        iterator find(key_type index, key_type key)
        {
            if (_cKey == key)
                return _cValue.get();
            if (_vNodes)
            {
                // get the lsb
                index_type lsb = index & 0xff;
                if (_vNodes[lsb])
                    return _vNodes[lsb]->find(index >> 8, key);
            }
            return 0;
        }

        const_iterator find(key_type index, key_type key) const
        {
            if (_cKey == key)
                return _cValue.get();
            if (_vNodes)
            {
                // get the lsb
                index_type lsb = index & 0xff;
                if (_vNodes[lsb])
                    return _vNodes[lsb]->find(index >> 8, key);
            }
            return 0;
        }

        void optimize()
        {
            if (_vNodes)
            {
                // first see how many nodes we have
                size_t count = 0;
                for (size_t i = 0; i < MAX_SIZE; ++i)
                    count += (_vNodes[i] != 0);
                switch (count)
                {
                case 0: return;          // nothing to optimize
                case 1: pullup(); break; // we could pull up
                default:
                    {
                        // more than one node at this level, so optimize each
                        for (size_t i = 0; i < MAX_SIZE; ++i)
                            if (_vNodes[i])
                                _vNodes[i]->optimize();
                    }
                }
            }
        }

        void pullup()
        {
            // try to pull up a leaf node
            for (size_t i = 0; i < MAX_SIZE; ++i)
                if (_vNodes[i])
                {
                    // if we manage to pull up, then clean up the node set at this level
                    if (_vNodes[i]->pullup(_cKey, _cValue))
                        cleanup();
                    break;
                }
        }

        bool pullup(key_type& key, boost::shared_ptr<value_type>& value)
        {
            if (!_vNodes)
            {
                // this is a leaf node
                key = _cKey;
                value = _cValue;
                return true;
            }
            // cannot pull up at this level -
            // see whether we can pull up from levels below
            size_t count = 0;
            for (size_t i = 0; i < MAX_SIZE; ++i)
                count += (_vNodes[i] != 0);
            switch (count)
            {
            case 0:
                {
                    // this is a leaf node
                    key = _cKey;
                    value = _cValue;
                    return true;
                }
            case 1:
                {
                    for (size_t i = 0; i < MAX_SIZE; ++i)
                        if (_vNodes[i])
                            // if we manage to pull up, then clean up the node set at this level
                            return _vNodes[i]->pullup(key, value);
                    break;
                }
            default:
                {
                    // more than one node at this level, so cannot pull up
                }
            }
            return false;
        }

        void print(size_t index, std::ostream& str)
        {
            str << "<node i='" << index << '\'';
            if (_cValue)
                str << " key='" << _cKey << "' value='" << *_cValue << "'";
            str << '>' << std::endl;
            if (_vNodes != 0)
                for (size_t i = 0; i < MAX_SIZE; ++i)
                    if (_vNodes[i])
                        _vNodes[i]->print(i, str);
            str << "</node>";
        }

        void cleanup()
        {
            if (_vNodes)
            {
                for (size_t i = 0; i < MAX_SIZE; ++i)
                    if (_vNodes[i])
                        delete _vNodes[i];
                delete[] _vNodes;
                _vNodes = 0;
            }
        }

        key_type _cKey;
        boost::shared_ptr<value_type> _cValue;
        node** _vNodes;
    };

public:
    tree() : _vNodes() {}

    ~tree()
    {
        if (_vNodes != 0)
        {
            for (size_t i = 0; i < MAX_SIZE; ++i)
                if (_vNodes[i])
                    delete _vNodes[i];
            delete[] _vNodes;
            _vNodes = 0;
        }
    }

    iterator end() { return 0; }
    const_iterator end() const { return 0; }

    void add(key_type key, const_reference value)
    {
        // get the lsb
        index_type lsb = key & 0xff;
        if (!_vNodes)
        {
            _vNodes = new node*[MAX_SIZE]();
            _vNodes[lsb] = new node();
        }
        else if (!_vNodes[lsb])
            _vNodes[lsb] = new node();
        _vNodes[lsb]->add(key >> 8, key, value);
    }

    iterator find(key_type key)
    {
        if (_vNodes)
        {
            // get the lsb
            index_type lsb = key & 0xff;
            if (_vNodes[lsb])
                return _vNodes[lsb]->find(key >> 8, key);
        }
        return 0;
    }

    const_iterator find(key_type key) const
    {
        if (_vNodes)
        {
            // get the lsb
            index_type lsb = key & 0xff;
            if (_vNodes[lsb])
                return _vNodes[lsb]->find(key >> 8, key);
        }
        return 0;
    }

    void optimize()
    {
        if (_vNodes)
            // optimize each root node
            for (size_t i = 0; i < MAX_SIZE; ++i)
                if (_vNodes[i])
                    _vNodes[i]->optimize();
    }

    friend std::ostream& operator<<(std::ostream& str, tree const& t)
    {
        str << "<?xml version='1.0'?>" << std::endl;
        if (!t._vNodes)
            return str << "<tree/>" << std::endl;
        str << "<tree>" << std::endl;
        for (size_t i = 0; i < MAX_SIZE; ++i)
            if (t._vNodes[i])
                t._vNodes[i]->print(i, str);
        return str << "</tree>" << std::endl;
    }

private:
    node** _vNodes;
};
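For readers skimming the class, the core lookup idea (index by the low byte of the key, shift, recurse) can be shown in a stripped-down, non-generic sketch. This is my own simplified version for illustration (it stores values at the node reached once all remaining key bytes are zero, and omits the pull-up optimisation), not the class above:

```cpp
// Minimal 256-way byte trie over unsigned keys (illustrative only).
struct Node
{
    unsigned key;        // only meaningful when value_set is true
    int      value;
    bool     value_set;
    Node*    child[256];

    Node() : key(0), value(0), value_set(false)
    {
        for (int i = 0; i < 256; ++i)
            child[i] = 0;
    }
    ~Node()
    {
        for (int i = 0; i < 256; ++i)
            delete child[i];
    }
};

// Insert: consume the key one byte at a time, creating nodes lazily.
inline void trie_add(Node& root, unsigned key, int value)
{
    Node* n = &root;
    for (unsigned rest = key; rest != 0; rest >>= 8)
    {
        unsigned char lsb = static_cast<unsigned char>(rest & 0xff);
        if (!n->child[lsb])
            n->child[lsb] = new Node();
        n = n->child[lsb];
    }
    n->key = key;
    n->value = value;
    n->value_set = true;
}

// Lookup: at most sizeof(key) shifts and integer compares; 0 means "not found".
inline const int* trie_find(const Node& root, unsigned key)
{
    const Node* n = &root;
    unsigned rest = key;
    for (;;)
    {
        if (n->value_set && n->key == key)
            return &n->value;
        unsigned char lsb = static_cast<unsigned char>(rest & 0xff);
        if (!n->child[lsb])
            return 0;
        n = n->child[lsb];
        rest >>= 8;
    }
}
```

The stored key is compared at each node, mirroring the `_cKey == key` check in the class, so two keys that share a byte prefix never collide.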
Some salient points:
- I don't need the container to be iterable - all I want is the fastest possible lookups.
- I don't care about insertion performance, as it's a start-up cost.
- I've added an optimize method to "prune" the tree, and this works nicely; however, the performance difference between a pruned and a normal tree is negligible.
- Yes, I've considered boost::scoped_array.
- I am considering using a std::vector<node*> rather than node**, but I find the pointer syntax strangely appealing.
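On the last point, the node** child table maps fairly directly onto a std::vector<node*>. A hedged sketch of what the member and its lazy initialisation might look like (struct and method names are hypothetical, not from the class above):

```cpp
#include <cstddef>
#include <vector>

struct node_v
{
    std::vector<node_v*> children;  // empty until the first child is added

    // Lazily size the table to 256 null slots, then return a slot reference;
    // this replaces `_vNodes = new node*[MAX_SIZE]()` and the null check.
    node_v*& child(unsigned char lsb)
    {
        if (children.empty())
            children.assign(256, static_cast<node_v*>(0));
        return children[lsb];
    }

    ~node_v()
    {
        // delete the child nodes; the vector then frees the table itself,
        // so no explicit delete[] is needed
        for (std::size_t i = 0; i < children.size(); ++i)
            delete children[i];
    }
};
```

The empty() test stands in for the null check on _vNodes, at the cost of one extra indirection through the vector's own pointer.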
1 Answer
Just a few points (caveat: I'm not a C++ expert).
1. Comparing structured (i.e., non-scalar) keys is often the most expensive part of doing a search; it is usually worth searching first on an integer hash of the key. This way you only need to do integer comparisons for the most part.
2. I am AMAZINGLY surprised that binary search on a sorted array doesn't beat the pants off everything else. Binary search is vastly simpler than anything else and has optimal pruning. I seriously suspect your results here are to do with issue (1) above.
3. You don't say how much your results differed in your experiments. If the differences are small, it probably isn't worth rolling your own code.
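For point 2, the shape being compared against is a single std::lower_bound over contiguous memory with an integer key. A minimal sketch (the entry type and names here are mine, not from the question):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<unsigned, int> entry;  // (key, value)

// Compare entries to a bare key so lower_bound ignores the value field.
struct by_key
{
    bool operator()(const entry& e, unsigned k) const { return e.first < k; }
};

// Binary search over a vector sorted by key; returns 0 when absent.
inline const int* sorted_find(const std::vector<entry>& v, unsigned key)
{
    std::vector<entry>::const_iterator it =
        std::lower_bound(v.begin(), v.end(), key, by_key());
    if (it != v.end() && it->first == key)
        return &it->second;
    return 0;
}
```

Whether the comparator gets inlined (as discussed in the comments below) is exactly what decides how competitive this is.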
- Thanks for your comments. 1) Yes, which is why unordered_map (which uses hashing) suffers; hashing is actually more expensive than four right shifts (the worst case with the above tree). 2) Well, a binary search on a sorted array is really no different to a search on a binary tree (which is what map uses) - and hence there is virtually no difference between the two. 3) Real numbers are irrelevant (i.e. platform specific), but the percentages are significant; the tree above can be more than three times quicker than any of the other approaches - I wouldn't have bothered otherwise. – Nim, Jul 7, 2011 at 9:36
- Hi Nim, I'd expect binary search to be faster because it has half the number of memory accesses (each binary tree node requires one access for the key and one access for the child pointer). Is it possible that sorted vector is not inlining your comparison function? – Rafe, Jul 8, 2011 at 3:16
- I don't have a specific comparison function; I've implemented operator< for the simple structure which I store in the vector, and this structure has the integer key - the comparison looks like lhs.id < rhs.id. I didn't check the assembler, but I'd be mightily surprised if it didn't inline the above. I'm using a very small number of elements for testing (64 to be precise) and I'd be surprised if the entire structure was not in the cache (in all four tests). – Nim, Jul 8, 2011 at 8:43
- ...and increasing the number of test items increases the time taken (unsurprisingly - logarithmically!). With 200 items, the "optimized" tree maintains the best speed - the difference between it and unordered_map (which remains the same, as expected) is still three times; the unoptimized tree is half the time. – Nim, Jul 8, 2011 at 8:52
- Hang on, I've just looked at your code - you've implemented a 256-way tree! Of course that will be faster, but at much greater memory cost. Inserting 256 items with keys 256*i for i in 0..255 will occupy enough space for 64K items. – Rafe, Jul 9, 2011 at 0:01