I'm playing around with hypercubes. For this, I need to create its edge set. Specifically, the edge set is the set of all bit strings with Hamming distance 1. An important property is that I want to generate every such pair exactly once. I do this with the following piece of code (I'm also providing a small benchmark for testing its performance):
#include <iostream>
#include <vector>
#include <chrono>
#include <cmath>
int main()
{
// With 2^15, takes about 0.8 seconds on my machine.
const int n = std::pow(2, 15);
// Storing edges not strictly necessary, but let's do it so that the compiler
// doesn't optimize something away.
std::vector<int> edges;
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < n; ++i)
{
for (int j = 0; j < i; ++j)
{
const int u = i ^ j;
// https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
int k = u - 1;
k |= k >> 1;
k |= k >> 2;
k |= k >> 4;
k |= k >> 8;
k |= k >> 16;
if (k + 1 == u)
{
edges.emplace_back(j);
edges.emplace_back(i);
}
}
}
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start);
std::cout << (duration.count() / 1000.0) << "\n";
// 2^(n-1)*n
std::cout << (edges.size() / 2) << "\n";
}
This works fine, but can we make the above even more efficient?
1 Answer 1
To start with, here is a trivial optimization you can perform: in the most inner loop, you store u
, then have k = u - 1
, then compare k + 1
to u
, which corresponds to comparing k
to u - 1
. Basically your inner loop performs unneeded work; just set directly u
to (i ^ j) - 1
and you will save some work. Here is the most inner loop once changed:
const int u = (i ^ j) - 1;
// https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
int k = u;
k |= k >> 1;
k |= k >> 2;
k |= k >> 4;
k |= k >> 8;
k |= k >> 16;
if (k == u)
{
edges.emplace_back(j);
edges.emplace_back(i);
}
It timed it on my computer and it was consistently a bit faster, which means that the compiler wasn't able to make this simple transformation.
On the other hand, we can notice an interesting pattern with your original code: k + 1
can only be equal to u
when u
is a power of \2ドル\$. Checking whether a number is a power of \2ドル\$ or (or \0ドル\,ドル but the formula you use already has a special case of \0ドル\$) is easily done: a power of \2ドル\$ has exactly one bit set, so you only have to check whether removing a set bit from an integer returns \0ドル\$ to check whether an integer is \0ドル\$ or a power of \2ドル\$. This comparison is so cheap that you can perform it first and perform your current check only when it is true:
const int u = (i ^ j);
if (not (u & (u - 1))) // check whether u is 0 or a power of 2
{
int k = u - 1;
k |= k >> 1;
k |= k >> 2;
k |= k >> 4;
k |= k >> 8;
k |= k >> 16;
if (k + 1 == u)
{
edges.emplace_back(j);
edges.emplace_back(i);
}
}
Now, correct me if I'm wrong, but if u
is a power of two, you don't actually need to check anything else after (you don't need to round up to a power of \2ドル\$ since you know it's already one). You can simply remove all the k
stuff. I mean, it was a wild guess, but the result were exactly the same on my computer. You can just check that u
is a power of \2ドル\$ then emplace_back
your values directly.
-
\$\begingroup\$ Wow, I'm positively surprised :-) If my testbench clocked 0.8 seconds before, after your suggestion, it dropped to 0.275 seconds. Very nice, thanks! (This was on MSVC, by the way). \$\endgroup\$Gideon– Gideon2016年01月10日 14:31:29 +00:00Commented Jan 10, 2016 at 14:31