Speeding up the generation of all bit strings with Hamming distance one

Question 1

I'm playing around with hypercubes. For this, I need to create the edge set of a hypercube. Specifically, the edge set is the set of all pairs of bit strings at Hamming distance 1. An important property is that I want to generate every such pair exactly once. I do this with the following piece of code (I'm also providing a small benchmark for testing its performance):

#include <iostream>
#include <vector>
#include <chrono>
#include <cmath>

int main()
{
    // With 2^15, takes about 0.8 seconds on my machine.
    const int n = std::pow(2, 15);
    // Storing edges not strictly necessary, but let's do it so that the compiler
    // doesn't optimize something away.
    std::vector<int> edges;
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j < i; ++j)
        {
            const int u = i ^ j;
            // https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
            int k = u - 1;
            k |= k >> 1;
            k |= k >> 2;
            k |= k >> 4;
            k |= k >> 8;
            k |= k >> 16;
            if (k + 1 == u)
            {
                edges.emplace_back(j);
                edges.emplace_back(i);
            }
        }
    }
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start);
    std::cout << (duration.count() / 1000.0) << "\n";
    // Expected: d * 2^(d-1) edges for a hypercube of dimension d (here d = 15).
    std::cout << (edges.size() / 2) << "\n";
}

This works fine, but can we make the above even more efficient?
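For reference, the number of edges the benchmark prints can be derived directly. Writing d for the dimension (so the code's n is 2^d, with d = 15 in the benchmark), each of the 2^d vertices has d neighbours, one per bit that can be flipped, and summing over all vertices counts every edge twice. The nested loop, by contrast, examines every pair j < i, which is why it is slow:

```latex
|E| = \frac{d \cdot 2^{d}}{2} = d \cdot 2^{d-1},
\qquad d = 15:\quad |E| = 15 \cdot 2^{14} = 245760

\text{pairs examined} = \binom{2^{d}}{2} = \frac{2^{d}\,(2^{d} - 1)}{2},
\qquad d = 15:\quad 536854528
```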

Answer

To start with, here is a trivial optimization you can perform: in the innermost loop, you store u, then set k = u - 1, then compare k + 1 to u, which is the same as comparing k to u - 1. Your inner loop therefore does unneeded work; just set u directly to (i ^ j) - 1 and you save the k + 1 on every iteration. Here is the innermost loop once changed:

const int u = (i ^ j) - 1;
// https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
int k = u;
k |= k >> 1;
k |= k >> 2;
k |= k >> 4;
k |= k >> 8;
k |= k >> 16;
if (k == u)
{
 edges.emplace_back(j);
 edges.emplace_back(i);
}

I timed it on my computer and it was consistently a bit faster, which suggests that the compiler did not perform this simple transformation on its own.

On the other hand, there is an interesting pattern in your original code: k + 1 can only be equal to u when u is a power of 2. Checking whether a number is a power of 2 (or 0, but the round-up formula you use already maps 0 to 0, so it has the same edge case) is easy: a power of 2 has exactly one bit set, and u & (u - 1) clears the lowest set bit of u, so the result is 0 exactly when u is 0 or a power of 2. This test is so cheap that you can perform it first and run your current check only when it succeeds:

const int u = (i ^ j);
if (!(u & (u - 1))) // check whether u is 0 or a power of 2
{
 int k = u - 1;
 k |= k >> 1;
 k |= k >> 2;
 k |= k >> 4;
 k |= k >> 8;
 k |= k >> 16;
 if (k + 1 == u)
 {
 edges.emplace_back(j);
 edges.emplace_back(i);
 }
}

Now, if u is a power of 2, you don't actually need to check anything else afterwards: rounding a power of 2 up to a power of 2 returns it unchanged, so k + 1 == u always holds. You can simply remove all the k code. Moreover, since j < i, u = i ^ j is never 0, so the cheap test alone is exact: it holds precisely when i and j differ in exactly one bit. I tried it and the results were exactly the same on my computer. You can just check that u is a power of 2 and then emplace_back your values directly.

Comment

Wow, I'm positively surprised :-) My testbench clocked 0.8 seconds before; after your suggestion, it dropped to 0.275 seconds. Very nice, thanks! (This was on MSVC, by the way.)

Accepted answer by Morwenn, 2016-01-10 13:59:57Z
