1
\$\begingroup\$

I've implemented a templated DBSCAN for general use. At the moment, it's going to be used on Android through the JNI. I used Wikipedia's pseudocode and a little bit of the DBSCAN paper for reference. It's pretty naive, so I'm wondering how I can speed it up, and what I can do to make it perform reasonably well on a phone. How can I improve my code?

template <typename T>
 struct node {
 T val;
 bool visited;
 bool clustered;
 node() : visited(false), clustered(false) {}
 };
 template <typename T>
 using cluster = std::vector<T>;
 template <typename T>
 using adj_list = std::list<node<T>*>;
 template <typename T>
 using graph = std::map<node<T>*, adj_list<T>>;
 template <typename T>
 std::vector<cluster<T>> DBSCAN(T* const& dataset, size_t const& dataset_size, double const& eps, size_t const& min_pts, double(*distance_function)(T const& lhs, T const& rhs)) {
 std::vector<node<T>> node_list(dataset_size);
 for (auto i = 0; i < dataset_size; ++i) {
 node_list[i].val = dataset[i];
 }
 graph<T> g;
 for (auto i = 0; i < node_list.size(); ++i) {
 for (auto j = 0; j < i; ++j) {
 if (distance_function(node_list[i].val, node_list[j].val) < eps) {
 g[&node_list[i]].push_back(&node_list[j]);
 g[&node_list[j]].push_back(&node_list[i]);
 }
 }
 }
 std::vector<cluster<T>> clusters;
 cluster<T> C;
 for (node<T>& n : node_list) {
 if (n.visited) continue;
 n.visited = true;
 adj_list<T> neighbour_pts = g[&n];
 if (neighbour_pts.size() >= min_pts) {
 expandCluster(n, neighbour_pts, C, g, min_pts);
 clusters.push_back(C);
 C = cluster<T>();
 }
 }
 return clusters;
 }
 template <typename T>
 void expandCluster(node<T>& point, adj_list<T>& neighbourhood, cluster<T>& C, graph<T>& g, unsigned const& min_pts) {
 C.push_back(point.val);
 point.clustered = true;
 for (node<T>*& n : neighbourhood) {
 if (!n->visited) {
 n->visited = true;
 adj_list<T> next_neighbourhood = g[n];
 if (next_neighbourhood.size() >= min_pts) neighbourhood.splice(neighbourhood.end(), next_neighbourhood);
 }
 if (!n->clustered) {
 C.push_back(n->val);
 n->clustered = true;
 }
 }
 }
asked Apr 1, 2016 at 23:38
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

Naming things

Why you have all caps function name?

Why you have variable with capital letter in expandCluster()?
At first I thought it is template parameter i miss.

Use auto more often:

// looks ugly
for (node<T>*& n : neighbourhood) {
}
// looks better
for (auto & n : neighbourhood) {
 // adjust for * here
}
// or you can do with using
using mynode = node<T>;
// still looks better
for (mynode* & n : neighbourhood) {
}

DBSCAN()

Make this a type:

template<class T>
using something = std::vector<cluster<T>>

Why you pass dataset_size by reference?

size_t is as big as the pointer, but there will be additional hidden dereference that will slow down the execution.
keep the const, remove &

size_t const dataset_size

Do the same for all non class parameters.

dataset

probably I am wrong, but if you type this:

T* const& dataset

like this:

const T* dataset

will be same, but more readable? Not sure what T is. If T is a container or array, why not just:

const CONTAINER dataset

Why you pass C like function pointers?

Do it with functor class. Or with std::function.

I am not very experienced with this, so I will do it with functor, but will add it into template parameters.

expandCluster()

more or less similar comments.

answered Apr 2, 2016 at 11:32
\$\endgroup\$
5
  • \$\begingroup\$ Thanks for the advice! I think I prefer to alias node<T> than to use auto as it seems a bit clearer. I had no idea that passing primitives by reference was bad practice, so thanks for that. What's the benefit of using a functor? \$\endgroup\$ Commented Apr 3, 2016 at 13:56
  • \$\begingroup\$ functor looks nicer and easier. It can have "internal" status, but I dont think this will be beneficial in your case. \$\endgroup\$ Commented Apr 3, 2016 at 18:19
  • 1
    \$\begingroup\$ One big advantage of a functor is that the called function can be expanded in line a lot more often than when you use a pointer to a function (particularly important when you're concerned with speed, as you seem to be). \$\endgroup\$ Commented Apr 4, 2016 at 23:10
  • \$\begingroup\$ i think this is correct only when functor is injected as template. if functor is injected as argument i do not see how it will be expanded. \$\endgroup\$ Commented Apr 4, 2016 at 23:23
  • \$\begingroup\$ That's interesting. What's the reason for that, Jerry? \$\endgroup\$ Commented Apr 7, 2016 at 16:42

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.