I'm currently trying to improve a data structure which I've created to speed up lookups for integral keys. The specific use case I have in mind is as follows:
- At startup the structure is populated with n items (where n can range from 1 to several thousand), and the key is an unsigned int.
- The critical path in the code requires a lookup based on this key.
I've tried the following:
- sorted vector with binary search
- std::map
- boost::unordered_map

And now this tree. I've found through profiling that the following holds:
- If every item looked up exists, then unordered_set is quicker than std::map, which is quicker than sorted vector.
- If the number of successful lookups for the search set is low, then std::map is quicker than unordered_set (I guess the hash overhead is higher in the failure case).
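The hit-versus-miss behaviour described above can be reproduced with a small harness along these lines (a sketch only; the helper names are mine, and std::unordered_map stands in for boost::unordered_map):

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <unordered_map>
#include <vector>

// Count how many probe keys are found in the container.
template <typename Container>
std::size_t count_hits(const Container& c, const std::vector<unsigned>& probes)
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i < probes.size(); ++i)
        hits += (c.find(probes[i]) != c.end());
    return hits;
}

// Time a batch of lookups; the hit rate of `probes` is what distinguishes
// the "every item exists" case from the "mostly misses" case.
template <typename Container>
double time_lookups(const Container& c, const std::vector<unsigned>& probes)
{
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    volatile std::size_t sink = count_hits(c, probes);
    (void)sink;
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}
```

Timing `count_hits` over an all-hits probe set versus an all-misses probe set is enough to observe the crossover described above.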
With this tree, I've found that it's quicker in all cases. What I would like is some critique of the approach - are there any further optimisations that I am missing, for example? Here is the tree:
/** Copyright: Nim 2011 */
#include <cstddef>
#include <ostream>

#include <boost/shared_ptr.hpp>

template <typename KeyType, typename ValueType>
class tree
{
public:
    typedef ValueType        value_type;
    typedef ValueType&       reference;
    typedef ValueType const& const_reference;
    typedef KeyType          key_type;
    typedef ValueType*       iterator;
    typedef ValueType const* const_iterator;

private:
    static const size_t MAX_SIZE = 256;
    typedef unsigned char index_type;

    struct node
    {
        node() : _cKey(), _cValue(), _vNodes() {}

        ~node()
        {
            cleanup();
        }

        void add(key_type index, key_type key, const_reference value)
        {
            if (!index)
            {
                _cKey = key;
                _cValue.reset(new value_type(value));
                return;
            }
            // get the lsb
            index_type lsb = index & 0xff;
            if (!_vNodes)
            {
                _vNodes = new node*[MAX_SIZE]();
                _vNodes[lsb] = new node();
            }
            else if (!_vNodes[lsb])
                _vNodes[lsb] = new node();
            _vNodes[lsb]->add(index >> 8, key, value);
        }

        iterator find(key_type index, key_type key)
        {
            if (_cKey == key)
                return _cValue.get();
            if (_vNodes)
            {
                // get the lsb
                index_type lsb = index & 0xff;
                if (_vNodes[lsb])
                    return _vNodes[lsb]->find(index >> 8, key);
            }
            return 0;
        }

        const_iterator find(key_type index, key_type key) const
        {
            if (_cKey == key)
                return _cValue.get();
            if (_vNodes)
            {
                // get the lsb
                index_type lsb = index & 0xff;
                if (_vNodes[lsb])
                    return _vNodes[lsb]->find(index >> 8, key);
            }
            return 0;
        }

        void optimize()
        {
            if (_vNodes)
            {
                // first see how many nodes we have
                size_t count = 0;
                for (size_t i = 0; i < MAX_SIZE; ++i)
                    count += (_vNodes[i] != 0);
                switch (count)
                {
                case 0: return;          // nothing to optimize
                case 1: pullup(); break; // we could pull up
                default:
                    {
                        // more than one node at this level, so optimize each
                        for (size_t i = 0; i < MAX_SIZE; ++i)
                            if (_vNodes[i])
                                _vNodes[i]->optimize();
                    }
                }
            }
        }

        void pullup()
        {
            // try to pull up a leaf node
            for (size_t i = 0; i < MAX_SIZE; ++i)
                if (_vNodes[i])
                {
                    // if we manage to pull up, then clean up the node set at this level
                    if (_vNodes[i]->pullup(_cKey, _cValue))
                        cleanup();
                    break;
                }
        }

        bool pullup(key_type& key, boost::shared_ptr<value_type>& value)
        {
            if (!_vNodes)
            {
                // this is a leaf node
                key = _cKey;
                value = _cValue;
                return true;
            }
            // cannot pull up at this level -
            // see whether we can pull up from levels below
            size_t count = 0;
            for (size_t i = 0; i < MAX_SIZE; ++i)
                count += (_vNodes[i] != 0);
            switch (count)
            {
            case 0:
                {
                    // this is a leaf node
                    key = _cKey;
                    value = _cValue;
                    return true;
                }
            case 1:
                {
                    for (size_t i = 0; i < MAX_SIZE; ++i)
                        if (_vNodes[i])
                            // if we manage to pull up, then clean up the node set at this level
                            return _vNodes[i]->pullup(key, value);
                    break;
                }
            default:
                {
                    // more than one node at this level, so cannot pull up
                }
            }
            return false;
        }

        void print(size_t index, std::ostream& str)
        {
            str << "<node i='" << index << '\'';
            if (_cValue)
                str << " key='" << _cKey << "' value='" << *_cValue << "'";
            str << '>' << std::endl;
            if (_vNodes != 0)
                for (size_t i = 0; i < MAX_SIZE; ++i)
                    if (_vNodes[i])
                        _vNodes[i]->print(i, str);
            str << "</node>";
        }

        void cleanup()
        {
            if (_vNodes)
            {
                for (size_t i = 0; i < MAX_SIZE; ++i)
                    if (_vNodes[i])
                        delete _vNodes[i];
                delete[] _vNodes;
                _vNodes = 0;
            }
        }

        key_type _cKey;
        boost::shared_ptr<value_type> _cValue;
        node** _vNodes;
    };

public:
    tree() : _vNodes() {}

    ~tree()
    {
        if (_vNodes != 0)
        {
            for (size_t i = 0; i < MAX_SIZE; ++i)
                if (_vNodes[i])
                    delete _vNodes[i];
            delete[] _vNodes;
            _vNodes = 0;
        }
    }

    iterator end() { return 0; }
    const_iterator end() const { return 0; }

    void add(key_type key, const_reference value)
    {
        // get the lsb
        index_type lsb = key & 0xff;
        if (!_vNodes)
        {
            _vNodes = new node*[MAX_SIZE]();
            _vNodes[lsb] = new node();
        }
        else if (!_vNodes[lsb])
            _vNodes[lsb] = new node();
        _vNodes[lsb]->add(key >> 8, key, value);
    }

    iterator find(key_type key)
    {
        if (_vNodes)
        {
            // get the lsb
            index_type lsb = key & 0xff;
            if (_vNodes[lsb])
                return _vNodes[lsb]->find(key >> 8, key);
        }
        return 0;
    }

    const_iterator find(key_type key) const
    {
        if (_vNodes)
        {
            // get the lsb
            index_type lsb = key & 0xff;
            if (_vNodes[lsb])
                return _vNodes[lsb]->find(key >> 8, key);
        }
        return 0;
    }

    void optimize()
    {
        if (_vNodes)
            // optimize each root node
            for (size_t i = 0; i < MAX_SIZE; ++i)
                if (_vNodes[i])
                    _vNodes[i]->optimize();
    }

    friend std::ostream& operator<<(std::ostream& str, tree const& t)
    {
        str << "<?xml version='1.0'?>" << std::endl;
        if (!t._vNodes)
            return str << "<tree/>" << std::endl;
        str << "<tree>" << std::endl;
        for (size_t i = 0; i < MAX_SIZE; ++i)
            if (t._vNodes[i])
                t._vNodes[i]->print(i, str);
        return str << "</tree>" << std::endl;
    }

private:
    node** _vNodes;
};
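For readers skimming the class, the core lookup idea (index by the low byte of the key, shift, recurse) can be shown in a stripped-down, non-generic sketch. This is my own simplified version for illustration (it stores values at the node reached once all remaining key bytes are zero, and omits the pull-up optimisation), not the class above:

```cpp
// Minimal 256-way byte trie over unsigned keys (illustrative only).
struct Node
{
    unsigned key;        // only meaningful when value_set is true
    int      value;
    bool     value_set;
    Node*    child[256];

    Node() : key(0), value(0), value_set(false)
    {
        for (int i = 0; i < 256; ++i)
            child[i] = 0;
    }
    ~Node()
    {
        for (int i = 0; i < 256; ++i)
            delete child[i];
    }
};

// Insert: consume the key one byte at a time, creating nodes lazily.
inline void trie_add(Node& root, unsigned key, int value)
{
    Node* n = &root;
    for (unsigned rest = key; rest != 0; rest >>= 8)
    {
        unsigned char lsb = static_cast<unsigned char>(rest & 0xff);
        if (!n->child[lsb])
            n->child[lsb] = new Node();
        n = n->child[lsb];
    }
    n->key = key;
    n->value = value;
    n->value_set = true;
}

// Lookup: at most sizeof(key) shifts and integer compares; 0 means "not found".
inline const int* trie_find(const Node& root, unsigned key)
{
    const Node* n = &root;
    unsigned rest = key;
    for (;;)
    {
        if (n->value_set && n->key == key)
            return &n->value;
        unsigned char lsb = static_cast<unsigned char>(rest & 0xff);
        if (!n->child[lsb])
            return 0;
        n = n->child[lsb];
        rest >>= 8;
    }
}
```

The stored key is compared at each node, mirroring the `_cKey == key` check in the class, so two keys that share a byte prefix never collide.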
Some salient points:
- I don't need the container to be iterable - all I want is the fastest possible lookups.
- I don't care about insertion performance, as it's a start-up cost.
- I've added an optimize method to "prune" the tree, and this works nicely; however, the performance difference between a pruned and a normal tree is negligible.
- Yes, I've considered boost::scoped_array.
- I am considering using a std::vector<node*> rather than node**, but I find the pointer syntax strangely appealing.
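On the last point, the node** child table maps fairly directly onto a std::vector<node*>. A hedged sketch of what the member and its lazy initialisation might look like (struct and method names are hypothetical, not from the class above):

```cpp
#include <cstddef>
#include <vector>

struct node_v
{
    std::vector<node_v*> children;  // empty until the first child is added

    // Lazily size the table to 256 null slots, then return a slot reference;
    // this replaces `_vNodes = new node*[MAX_SIZE]()` and the null check.
    node_v*& child(unsigned char lsb)
    {
        if (children.empty())
            children.assign(256, static_cast<node_v*>(0));
        return children[lsb];
    }

    ~node_v()
    {
        // delete the child nodes; the vector then frees the table itself,
        // so no explicit delete[] is needed
        for (std::size_t i = 0; i < children.size(); ++i)
            delete children[i];
    }
};
```

The empty() test stands in for the null check on _vNodes, at the cost of one extra indirection through the vector's own pointer.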
1 Answer
Just a few points (caveat: I'm not a C++ expert).
1. Comparing structured (i.e., non-scalar) keys is often the most expensive part of doing a search; it is usually worth searching first on an integer hash of the key. This way you only need to do integer comparisons for the most part.
2. I am AMAZINGLY surprised that binary search on a sorted array doesn't beat the pants off everything else. Binary search is vastly simpler than anything else and has optimal pruning. I seriously suspect your results here are to do with issue (1) above.
3. You don't say how much your results differed in your experiments. If the differences are small, it probably isn't worth rolling your own code.
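For point 2, the shape being compared against is a single std::lower_bound over contiguous memory with an integer key. A minimal sketch (the entry type and names here are mine, not from the question):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<unsigned, int> entry;  // (key, value)

// Compare entries to a bare key so lower_bound ignores the value field.
struct by_key
{
    bool operator()(const entry& e, unsigned k) const { return e.first < k; }
};

// Binary search over a vector sorted by key; returns 0 when absent.
inline const int* sorted_find(const std::vector<entry>& v, unsigned key)
{
    std::vector<entry>::const_iterator it =
        std::lower_bound(v.begin(), v.end(), key, by_key());
    if (it != v.end() && it->first == key)
        return &it->second;
    return 0;
}
```

Whether the comparator gets inlined (as discussed in the comments below) is exactly what decides how competitive this is.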
- Thanks for your comments. 1) Yes, which is why unordered_map (which uses hashing) suffers; hashing is actually more expensive than four right shifts (the worst case with the above tree). 2) Well, a binary search on a sorted array is really no different to a search on a binary tree (which is what map uses) - and hence there is virtually no difference between the two. 3) Real numbers are irrelevant (i.e. platform specific), but the percentages are significant; the tree above can be more than three times quicker than any of the other approaches - I wouldn't have bothered otherwise. – Nim, Jul 7, 2011 at 9:36
- Hi Nim, I'd expect binary search to be faster because it has half the number of memory accesses (each binary tree node requires one access for the key and one access for the child pointer). Is it possible that sorted vector is not inlining your comparison function? – Rafe, Jul 8, 2011 at 3:16
- I don't have a specific comparison function; I've implemented operator< for the simple structure which I store in the vector, and this structure has the integer key - the comparison looks like lhs.id < rhs.id. I didn't check the assembler, but I'd be mightily surprised if it didn't inline the above. I'm using a very small number of elements for testing (64 to be precise) and I'd be surprised if the entire structure was not in the cache (in all four tests). – Nim, Jul 8, 2011 at 8:43
- ...and increasing the number of test items increases the time taken (unsurprisingly - logarithmically!). With 200 items, the "optimized" tree maintains the best speed - the difference between it and unordered_map (which remains the same, as expected) is still three times; the unoptimized tree is half the time. – Nim, Jul 8, 2011 at 8:52
- Hang on, I've just looked at your code - you've implemented a 256-way tree! Of course that will be faster, but at much greater memory cost. Inserting 256 items with keys 256*i for i in 0..255 will occupy enough space for 64K items. – Rafe, Jul 9, 2011 at 0:01