Check if bit is only set once in a vector of int

Question 1

I have a vector of uint16_t and I want to check if there is a bit which is only set in one vector. I then get its position within the vector as well as from the bit.

Let's say we have:

v[0] = 0100
v[1] = 0110
v[2] = 1001

I would like to know that ind = 1 and val = 2.

In clumsy code this would be like that:

#include <cstdint>
#include <vector>
#include <fstream>
const uint16_t N = 10;
int main() {
 // Create a dataset
 std::vector<uint16_t> vec(N, (1 << 2) + (1 << 3));
 vec[3] = (1 << 5);
 vec[6] = (1 << 7);
 // loop over the values
 for (auto i = 0; i < 16; i++) { 
 uint16_t counter = 0,ind;
 // loop over the vector
 for (auto ii = 0; ii < N; ii++) { 
 if (vec[ii] & (1 << i)) {
 if (counter++ == 1) { break; }
 ind = ii;
 }
 } // vector loop
 // Do we only have a single value?
 if (counter == 1) {
 printf("%i @ %i\n",i,ind);
 }
 } // value loop
} // main

Is there a better way to do this? In case it matters using g++ with Ubuntu.

Question 2

Let me start by pointing out problems in your code:

Wrong type

The type of your global constant N is uint16_t. There are several problems with this:

as you are using cstdint the type should be std::uint16_t (because the C++ header provides them in the namespace std, this note should be applied to all instances of uint16_t in your code)
you are using N to indicate the size of the std::vector but sizes are specified using std::size_t
you are using uint16_t for the variables counter and ind, which are not related to the type. counter just counts, so you would be fine with std::size_t and ind is an index to a vector so it might be an std::size_t as well

Magic number

The loop boundary 16 in your code is related to the size of the type that is stored in the vector and offers possibilities for inconsistencies between the two. You should not rely on that, instead calculate it from the type:

int numberOfBitsPerEntry = sizeof(vec[0]) * CHAR_BIT;

And use that as loop boundary.

Variable names

While N is borderline (as soon as there are more then one containers it becomes ambiguous) the other variable names are not:

i is about as generic as it can get for a loop counter variable. What it actually does is signify the index of the bit you are currently looking at, so how about: bitIndex
ii just tells me that i was already taken and you temporarily forgot what was the next letter in the alphabet. This index actually walks through the std::vector's elements, so you might call it elementIndex (which is still too generic but I don't know what exactly is stored in the vector)
counter has already some information but it still does not inform you what is counted, how about enabledBitsCounter or if you take the notion of arranging your elements above each other so that the bits with the same index form a column: columnBitsCounter (in that light elementIndex might be named rowIndex)
ind (probably short for index) is just another generic name. It signifies the index of the element that is the only one that has a one bit in this column. columnWinnerRowIndex might be a bit too verbose but the general idea should be clear
vec does not say much about the contents. By staying in the table metaphor we might choose rows as the new name

This is C++

You are using printf in C++ where there are better alternatives (at least typesafety wise). Also you haven't included cstdio or stdio.h where printf comes from.

As it is C++ I would pledge for std::cout instead:

std::cout << bitIndex << " @ " << columnWinnerRowIndex << "\n";

Algorithm

Now to the algorithm itself. You need to walk through every column and look at each row so there is not much to change on the loops. (Besides the fact that auto might be overkill for int variables)

However, I find the loop break unnecessary (avoid changing the flow of loops from within the body) and the counter too heavyweight.

Basically there are 3 states for a column: - 0 one bits - 1 one bit - too many one bits

This would fit nicely into an enum:

enum ColumnState {
 NO_ONE_BITS,
 ONE_ONE_BIT,
 TOO_MANY_ONE_BITS
};

Your counter is replaced with this:

 ColumnState columnState = NO_ONE_BITS;
 for(int rowIndex = 0; rowIndex < (int)rows.size() &&
 columnState != TOO_MANY_ONE_BITS; ++rowIndex) {
 if(rows[rowIndex] & (1 << columnIndex)) {
 switch(columnState) {
 case NO_ONE_BITS:
 columnState = ONE_ONE_BIT;
 columnWinnerRowIndex = rowIndex;
 break;
 case ONE_ONE_BIT:
 columnState = TOO_MANY_ONE_BITS;
 break;
 default:
 assert(!"Invalid state");
 }
 }
 }

On second thought the enum looks too verbose and the main problem I had with the break was the counter++ == 1 part that is a bit mind bending. Reverting to a counter might be more readable:

 int columnBitsCounter = 0;
 for(int rowIndex = 0; rowIndex < (int)rows.size(); ++rowIndex) {
 if(rows[rowIndex] & (1 << columnIndex)) {
 ++columnBitsCounter;
 if(columnBitsCounter > 1) {
 break;
 }
 columnWinnerRowIndex = rowIndex;
 }
 }

Encapsulation

You might want to separate the logic of finding the column winners from the output code and the input. The signature of a function that allows this could be

std::vector<std::size_t> findRowWinners(const std::vector<std::uint16_t> &rows);

Where the result vector gives the winning row for each column or rows.size() if there is no winner in this column.

Question 3

Thank you very much for all the comments. I just put together a mwe and was more concerned about the algorithm than style. But there are defenetly some good points what I should do better next time. The 16 was just something I set. Since I only use 10 bits but the performance of of uint16_t is better than bitset<10>. But I'll check out size_t. Thank you for all the tips.

Question 4

`#include` organization

It is a good idea to keep your standard library #include directives sorted alphabetically. This makes it easy to spot duplicates, or to add new dependencies in the right position.

`constexpr` is for constants

C++11 introduces constexpr, which you should use to declare N a true constant:

constexpr uint16_t N = 16;

End-brace comments are unnecessary

In general, you don't need to add a comment at the end of a block indicating what just ended, like this:

for (/* ... */) {
 // code...
 } // vector loop

This clutters the code. It should be obvious what each block is doing. If it isn't obvious, you probably have too many nested blocks. Which means...

Create a method for the inner loop

Your inner loop over the vector can be succinctly extracted to a function of its own. Creating a function allows you to give a descriptive name to what you're doing, and limit the scope of variables. It also makes the outer loop easier to read and understand.

I have rewritten the inner loop into its own function that makes use of standard algorithms. It is slightly less efficient than your original method, but improves immutability, readability, and simplicity of the outer loop.

// Returns std::pair containing (count, index)
// If count is not 1, index is undefined
std::pair<std::size_t, std::size_t> uniqueVectorIndex(const std::size_t bit,
 const std::vector<uint16_t>& vec) {
 const auto isBitSet = [&] (uint16_t num) {return num & (1 << bit);};
 const auto count = std::count_if(vec.begin(), vec.end(), isBitSet);
 const auto indexPos = std::find_if(vec.begin(), vec.end(), isBitSet);
 return std::make_pair(count, std::distance(vec.begin(), indexPos));
}

This is the most succinct version I can think of. If you want to make more guarantees on the result and increase efficiency, you could move the find_if() call inside an if(count == 1) block and return a sentinel value (like vec.size() in Nobody's answer) otherwise.

Writing the function this way and returning a std::pair allows the loop over the bit index to be greatly simplified:

for (auto bitIdx = 0; bitIdx < 16; ++bitIdx) {
 std::size_t counter, ind;
 std::tie(counter, ind) = uniqueVectorIndex(bitIdx, vec);
 // Do we only have a single value?
 if (counter == 1) {
 printf("%i @ %lu\n", bitIdx, ind);
 }
}

You will have to #include <utility> to get std::pair, and <tuple> for std::tie.

Question 5

Thank you for your comments. My includes are often a mess. Is there a way to detect unused ones? I normally simply comment them out and see if it still compiles, which is a bad practise I guess... I normal use -Wall -Wextra -Wpedantic. The point with constexpr i didn't really understand. Why should I use it instead of const?

Question 6

I use your same method for finding duplicate includes. I don't know of a better. constexpr indicates that a value can be a compile-time constant.

Question 7

@Nobody covers almost everything.

One alternative to using numberOfBitsPerEntry would be to make the bit field the loop variable.

 // loop over the values
for (uint16_t bitField = 1; bitField > 0; bitField = bitField << 1)
{
 // etc.
} // value loop

This uses the fact that once we bit shift past the high bit, 32768 in this case, we get zero.

Question 8

A very elegant way I didn't think about this.

score 9 · Accepted Answer · 2014-07-22 13:25:23Z

Let me start by pointing out problems in your code:

Wrong type

The type of your global constant N is uint16_t. There are several problems with this:

as you are using cstdint the type should be std::uint16_t (because the C++ header provides them in the namespace std, this note should be applied to all instances of uint16_t in your code)
you are using N to indicate the size of the std::vector but sizes are specified using std::size_t
you are using uint16_t for the variables counter and ind, which are not related to the type. counter just counts, so you would be fine with std::size_t and ind is an index to a vector so it might be an std::size_t as well

Magic number

The loop boundary 16 in your code is related to the size of the type that is stored in the vector and offers possibilities for inconsistencies between the two. You should not rely on that, instead calculate it from the type:

int numberOfBitsPerEntry = sizeof(vec[0]) * CHAR_BIT;

And use that as loop boundary.

Variable names

While N is borderline (as soon as there are more then one containers it becomes ambiguous) the other variable names are not:

i is about as generic as it can get for a loop counter variable. What it actually does is signify the index of the bit you are currently looking at, so how about: bitIndex
ii just tells me that i was already taken and you temporarily forgot what was the next letter in the alphabet. This index actually walks through the std::vector's elements, so you might call it elementIndex (which is still too generic but I don't know what exactly is stored in the vector)
counter has already some information but it still does not inform you what is counted, how about enabledBitsCounter or if you take the notion of arranging your elements above each other so that the bits with the same index form a column: columnBitsCounter (in that light elementIndex might be named rowIndex)
ind (probably short for index) is just another generic name. It signifies the index of the element that is the only one that has a one bit in this column. columnWinnerRowIndex might be a bit too verbose but the general idea should be clear
vec does not say much about the contents. By staying in the table metaphor we might choose rows as the new name

This is C++

You are using printf in C++ where there are better alternatives (at least typesafety wise). Also you haven't included cstdio or stdio.h where printf comes from.

As it is C++ I would pledge for std::cout instead:

std::cout << bitIndex << " @ " << columnWinnerRowIndex << "\n";

Algorithm

Now to the algorithm itself. You need to walk through every column and look at each row so there is not much to change on the loops. (Besides the fact that auto might be overkill for int variables)

However, I find the loop break unnecessary (avoid changing the flow of loops from within the body) and the counter too heavyweight.

Basically there are 3 states for a column: - 0 one bits - 1 one bit - too many one bits

This would fit nicely into an enum:

enum ColumnState {
 NO_ONE_BITS,
 ONE_ONE_BIT,
 TOO_MANY_ONE_BITS
};

Your counter is replaced with this:

 ColumnState columnState = NO_ONE_BITS;
 for(int rowIndex = 0; rowIndex < (int)rows.size() &&
 columnState != TOO_MANY_ONE_BITS; ++rowIndex) {
 if(rows[rowIndex] & (1 << columnIndex)) {
 switch(columnState) {
 case NO_ONE_BITS:
 columnState = ONE_ONE_BIT;
 columnWinnerRowIndex = rowIndex;
 break;
 case ONE_ONE_BIT:
 columnState = TOO_MANY_ONE_BITS;
 break;
 default:
 assert(!"Invalid state");
 }
 }
 }

On second thought the enum looks too verbose and the main problem I had with the break was the counter++ == 1 part that is a bit mind bending. Reverting to a counter might be more readable:

 int columnBitsCounter = 0;
 for(int rowIndex = 0; rowIndex < (int)rows.size(); ++rowIndex) {
 if(rows[rowIndex] & (1 << columnIndex)) {
 ++columnBitsCounter;
 if(columnBitsCounter > 1) {
 break;
 }
 columnWinnerRowIndex = rowIndex;
 }
 }

Encapsulation

You might want to separate the logic of finding the column winners from the output code and the input. The signature of a function that allows this could be

std::vector<std::size_t> findRowWinners(const std::vector<std::uint16_t> &rows);

Where the result vector gives the winning row for each column or rows.size() if there is no winner in this column.

Thank you very much for all the comments. I just put together a mwe and was more concerned about the algorithm than style. But there are defenetly some good points what I should do better next time. The 16 was just something I set. Since I only use 10 bits but the performance of of uint16_t is better than bitset<10>. But I'll check out size_t. Thank you for all the tips.

Stack Exchange Network

Check if bit is only set once in a vector of int

3 Answers 3

Wrong type

Magic number

Variable names

This is C++

Algorithm

Encapsulation

`#include` organization

`constexpr` is for constants

End-brace comments are unnecessary

Create a method for the inner loop

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Check if bit is only set once in a vector of int

3 Answers 3

Wrong type

Magic number

Variable names

This is C++

Algorithm

Encapsulation

#include organization

constexpr is for constants

End-brace comments are unnecessary

Create a method for the inner loop

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`#include` organization

`constexpr` is for constants