Given a set of overlapping boxes, and a set of points, I'd like to be able to determine which of the boxes contain each point. The number of points is much larger than the number of boxes, and I'd like to do it faster that brute force, so I'm trying to do it with a map-like structure.
My current attempt is below, it seems to pass my initial tests but I think there are some problems.
- There's a few places where I decrement
std::map iterators
- is it well defined in this case? Should I not be using lower_bound? - I keep an empty map on the
BoxTree
purely so I can makeGetBoxesContaining()
return by const reference - is there a better solution?
#include <list>
#include <map>
#include <limits>
struct BoxSample {
double m_lowerX, m_upperX, m_lowerY, m_upperY;
};
//Templated so can handle either the outer or inner map of BoxTree
//Checks that the map contains an element with the same key
template<typename T, typename Bound>
typename std::map<Bound, T>::iterator VerifyBound(typename std::map<Bound, T>& mapRef, Bound key)
{
auto lb = mapRef.lower_bound(key);
if(lb == mapRef.end() || (lb != mapRef.begin() && lb->first != key))
{
--lb;
//Populates the new key with the contents from the immediately lowest keypair
auto parib = mapRef.insert(std::make_pair(key, lb->second));
return parib.first;
} else {
//Either entry already exists or new key <= minimum
return lb;
}
}
template<typename Box, typename Bound>
class BoxTree {
std::map<Bound, std::map<Bound, std::list<Box>>> m_tree;
std::list<Box> m_empty;
public:
BoxTree(Bound min)
:m_tree() {
m_tree[min][min] = std::list<Box>();
}
void Insert(const Box& t)
{
auto xLB = VerifyBound(m_tree, t.m_lowerX);
auto xUB = VerifyBound(m_tree, t.m_upperX);
while(xLB != xUB) {
auto yLB = VerifyBound(xLB->second, t.m_lowerY);
auto yUB = VerifyBound(xLB->second, t.m_upperY);
while(yLB != yUB)
{
yLB->second.push_front(t);
++yLB;
}
++xLB;
}
}
//Should return all boxes A such that A.lowerX <= x < A.lowerX && A.lowerY <= y < A.lowerY
const std::list<Box>& GetBoxesContaining(const Bound& x, const Bound& y) {
auto xSet = m_tree.lower_bound(x);
if(xSet != m_tree.begin())
{
--xSet;
auto ySet = xSet->second.lower_bound(y);
if(ySet != xSet->second.begin())
{
--ySet;
return ySet->second;
}
}
return m_empty;
}
};
//Sample usage
class BoxTreeSample {
BoxTree<BoxSample, double> m_impl;
public:
BoxTreeSample(): m_impl(-1.0 * std::numeric_limits<double>::infinity()) {
BoxSample box1 = { 0.0, 10.0, 0.0, 10.0 };
BoxSample box2 = { 0.0, 10.0, 0.0, 20.0 };
BoxSample box3 = { 5.0, 10.0, 5.0, 15.0 };
m_impl.Insert(box1);
m_impl.Insert(box2);
m_impl.Insert(box3);
auto t1 = m_impl.GetBoxesContaining(2.5, 12.5);
auto t2 = m_impl.GetBoxesContaining(7.5, 12.5);
}
};
int main() {
auto zz = BoxTreeSample();
return 0;
}
1 Answer 1
It looks like some of the test code (inclusion of <limits>
and definition of BoxSample
) is mixed in with the library code. These should be extracted to the "sample usage".
The test program is strange - I don't see any need for the BoxTreeSample
class, as it doesn't provide anything a plain function doesn't. I'd re-write that as a simple main()
(with self-checking added):
int main()
{
BoxTree<BoxSample, double> boxtree{-std::numeric_limits<double>::infinity()};
BoxSample box1 = { 0.0, 10.0, 0.0, 10.0 };
BoxSample box2 = { 0.0, 10.0, 0.0, 20.0 };
BoxSample box3 = { 5.0, 10.0, 5.0, 15.0 };
boxtree.Insert(box1);
boxtree.Insert(box2);
boxtree.Insert(box3);
auto t1 = boxtree.GetBoxesContaining(2.5, 12.5);
auto t2 = boxtree.GetBoxesContaining(7.5, 12.5);
return (t1.size() != 1)
+ (t2.size() != 2);
}
It took me some time to discern the algorithm from the code. An introductory comment would be helpful:
/*
We represent the set of rectangles by dividing the area into columns,
and each column into boxes.
Each x ordinate in our map represents the column immediately to its
right; that column is in turn represented by a map of y ordinates to
the box immediately following (a list of the containing rectangles).
*/
The VerifyBound()
function doesn't seem to be generally useful; I'd be inclined to make it a private, static member of the template class rather than sharing with all code. The name is misleading, too - it does more than verify, because it can split an existing column or box into two, copying the associated list of rectangles.
It would be nice if we could make it self-starting, so that we don't need to provide min
to BoxTree
's constructor:
private:
template<typename T>
auto GetOrSplit(typename std::map<Bound, T>& mapRef, Bound key)
{
auto lb = mapRef.lower_bound(key);
if (lb != mapRef.end() && lb->first == key) {
// already a suitable entry to re-use
return lb;
}
if (lb == mapRef.begin()) {
// new item before any existing ones
return mapRef.insert({key, {}}).first;
}
// split existing item into two
--lb;
return mapRef.insert(std::make_pair(key, lb->second)).first;
}
Now, turning our attention to BoxTree
, we see it's templated on Box
, but has quite specific requirements about the members it expects from this type. It's worth writing some constraints:
template<typename Box, typename Bound>
requires requires(Box b, Bound x) {
x = b.m_lowerX;
x = b.m_upperX;
x = b.m_lowerY;
x = b.m_upperY;
}
class BoxTree
The choice of std::list
in the representation could be harmful to performance, as it's expensive to copy (as we do in VerifyBound()
). I'd venture that std::vector
would be more performant (order is not significant, so we only need to append, and we never insert or delete elements).
We might also consider whether we want to store so many copies of our Box
es (unfortunately, our current implementation isn't able to use a std::reference_wrapper
as our Box
type; consider that for an enhancement).
GetBoxesContaining()
looks reasonably straightforward. The only change I'd make is that we don't expect it to modify the BoxTree
, so let's make it const
. And it's the only place where m_empty
is used, so we can reduce the scope of that.
Modified code
#include <map>
#include <vector>
template<typename Box, typename Bound>
requires requires(Box b, Bound x) {
x = b.m_lowerX;
x = b.m_upperX;
x = b.m_lowerY;
x = b.m_upperY;
}
class BoxTree
{
/*
We represent the set of rectangles by dividing the area into columns,
and each column into boxes.
Each x ordinate in our map represents the column immediately to its
right; that column is in turn represented by a map of y ordinates to
the box immediately following (a list of the containing rectangles).
*/
std::map<Bound, std::map<Bound, std::vector<Box>>> m_tree{};
public:
void Insert(const Box& t)
{
for (auto xLB = GetOrSplit(m_tree, t.m_lowerX), xUB = GetOrSplit(m_tree, t.m_upperX);
xLB != xUB;
++xLB)
{
auto& column = xLB->second;
for (auto yLB = GetOrSplit(column, t.m_lowerY), yUB = GetOrSplit(column, t.m_upperY);
yLB != yUB;
++yLB)
{
yLB->second.push_back(t);
}
}
}
const std::vector<Box>& GetBoxesContaining(const Bound& x, const Bound& y) const noexcept
{
static auto const no_boxes = std::vector<Box>{};
auto xSet = m_tree.lower_bound(x);
if (xSet == m_tree.begin()) {
return no_boxes;
}
auto const& column = (--xSet)->second;
auto ySet = column.lower_bound(y);
if (ySet == column.begin()) {
return no_boxes;
}
return (--ySet)->second;
}
private:
template<typename T>
auto GetOrSplit(typename std::map<Bound, T>& mapRef, Bound key)
{
auto lb = mapRef.lower_bound(key);
if (lb != mapRef.end() && lb->first == key) {
// already a suitable entry to re-use
return lb;
}
if (lb == mapRef.begin()) {
// new item before any existing ones
return mapRef.insert({key, {}}).first;
}
// split existing item into two
--lb;
return mapRef.insert(std::make_pair(key, lb->second)).first;
}
};
// Simple test program
#include <limits>
struct BoxSample {
double m_lowerX, m_upperX, m_lowerY, m_upperY;
};
int main()
{
BoxTree<BoxSample, double> boxtree;
BoxSample box1 = { 0.0, 10.0, 0.0, 10.0 };
BoxSample box2 = { 0.0, 10.0, 0.0, 20.0 };
BoxSample box3 = { 5.0, 10.0, 5.0, 15.0 };
boxtree.Insert(box1);
boxtree.Insert(box2);
boxtree.Insert(box3);
auto count_boxes = [&boxtree](double x, double y) {
return boxtree.GetBoxesContaining(x, y).size();
};
return (count_boxes(2.5, 12.5) != 1)
+ (count_boxes(7.5, 12.5) != 2)
+ (count_boxes(7.5, 17.5) != 1)
+ (count_boxes(2.5, -2.5) != 0)
+ (count_boxes(-2.5, 2.5) != 0);
}
The data structure used really only makes sense if the number of queries massively outnumbers the boxes added, as each box can cause copying of up to 4 result lists as its bounds are inserted.
For a situation with a different usage pattern, you would want a different data structure (perhaps something based on a quadtree?).
std::numeric_limits
, and theauto&
cases Barry mentions. (Are you using Visual Studio?) \$\endgroup\$auto&
s, which I've changed toauto
. I've added the limits header, and it still compiles in VS2012. I can't opine about whether it compiles in gcc. \$\endgroup\$