I have a vector of uint16_t
and I want to check if there is a bit which is only set in one vector. I then get its position within the vector as well as from the bit.
Let's say we have:
v[0] = 0100
v[1] = 0110
v[2] = 1001
I would like to know that ind = 1
and val = 2
.
In clumsy code this would be like that:
#include <cstdint>
#include <vector>
#include <fstream>
const uint16_t N = 10;
int main() {
// Create a dataset
std::vector<uint16_t> vec(N, (1 << 2) + (1 << 3));
vec[3] = (1 << 5);
vec[6] = (1 << 7);
// loop over the values
for (auto i = 0; i < 16; i++) {
uint16_t counter = 0,ind;
// loop over the vector
for (auto ii = 0; ii < N; ii++) {
if (vec[ii] & (1 << i)) {
if (counter++ == 1) { break; }
ind = ii;
}
} // vector loop
// Do we only have a single value?
if (counter == 1) {
printf("%i @ %i\n",i,ind);
}
} // value loop
} // main
Is there a better way to do this? In case it matters using g++ with Ubuntu.
3 Answers 3
Let me start by pointing out problems in your code:
Wrong type
The type of your global constant N
is uint16_t
. There are several problems with this:
- as you are using
cstdint
the type should bestd::uint16_t
(because the C++ header provides them in the namespacestd
, this note should be applied to all instances ofuint16_t
in your code) - you are using
N
to indicate the size of thestd::vector
but sizes are specified usingstd::size_t
- you are using
uint16_t
for the variablescounter
andind
, which are not related to the type.counter
just counts, so you would be fine withstd::size_t
andind
is an index to a vector so it might be anstd::size_t
as well
Magic number
The loop boundary 16 in your code is related to the size of the type that is stored in the vector and offers possibilities for inconsistencies between the two. You should not rely on that, instead calculate it from the type:
int numberOfBitsPerEntry = sizeof(vec[0]) * CHAR_BIT;
And use that as loop boundary.
Variable names
While N
is borderline (as soon as there are more then one containers it becomes ambiguous) the other variable names are not:
i
is about as generic as it can get for a loop counter variable. What it actually does is signify the index of the bit you are currently looking at, so how about:bitIndex
ii
just tells me thati
was already taken and you temporarily forgot what was the next letter in the alphabet. This index actually walks through thestd::vector
's elements, so you might call itelementIndex
(which is still too generic but I don't know what exactly is stored in the vector)counter
has already some information but it still does not inform you what is counted, how aboutenabledBitsCounter
or if you take the notion of arranging your elements above each other so that the bits with the same index form a column:columnBitsCounter
(in that lightelementIndex
might be namedrowIndex
)ind
(probably short for index) is just another generic name. It signifies the index of the element that is the only one that has a one bit in this column.columnWinnerRowIndex
might be a bit too verbose but the general idea should be clearvec
does not say much about the contents. By staying in the table metaphor we might chooserows
as the new name
This is C++
You are using printf
in C++ where there are better alternatives (at least typesafety wise). Also you haven't included cstdio
or stdio.h
where printf
comes from.
As it is C++ I would pledge for std::cout
instead:
std::cout << bitIndex << " @ " << columnWinnerRowIndex << "\n";
Algorithm
Now to the algorithm itself. You need to walk through every column and look at each row so there is not much to change on the loops. (Besides the fact that auto might be overkill for int
variables)
However, I find the loop break unnecessary (avoid changing the flow of loops from within the body) and the counter
too heavyweight.
Basically there are 3 states for a column: - 0 one bits - 1 one bit - too many one bits
This would fit nicely into an enum:
enum ColumnState {
NO_ONE_BITS,
ONE_ONE_BIT,
TOO_MANY_ONE_BITS
};
Your counter is replaced with this:
ColumnState columnState = NO_ONE_BITS;
for(int rowIndex = 0; rowIndex < (int)rows.size() &&
columnState != TOO_MANY_ONE_BITS; ++rowIndex) {
if(rows[rowIndex] & (1 << columnIndex)) {
switch(columnState) {
case NO_ONE_BITS:
columnState = ONE_ONE_BIT;
columnWinnerRowIndex = rowIndex;
break;
case ONE_ONE_BIT:
columnState = TOO_MANY_ONE_BITS;
break;
default:
assert(!"Invalid state");
}
}
}
On second thought the enum
looks too verbose and the main problem I had with the break was the counter++ == 1
part that is a bit mind bending. Reverting to a counter might be more readable:
int columnBitsCounter = 0;
for(int rowIndex = 0; rowIndex < (int)rows.size(); ++rowIndex) {
if(rows[rowIndex] & (1 << columnIndex)) {
++columnBitsCounter;
if(columnBitsCounter > 1) {
break;
}
columnWinnerRowIndex = rowIndex;
}
}
Encapsulation
You might want to separate the logic of finding the column winners from the output code and the input. The signature of a function that allows this could be
std::vector<std::size_t> findRowWinners(const std::vector<std::uint16_t> &rows);
Where the result vector gives the winning row for each column or rows.size()
if there is no winner in this column.
-
\$\begingroup\$ Thank you very much for all the comments. I just put together a
mwe
and was more concerned about the algorithm than style. But there are defenetly some good points what I should do better next time. The 16 was just something I set. Since I only use 10 bits but the performance of ofuint16_t
is better thanbitset<10>
. But I'll check outsize_t
. Thank you for all the tips. \$\endgroup\$magu_– magu_2014年07月23日 05:53:11 +00:00Commented Jul 23, 2014 at 5:53
#include
organization
It is a good idea to keep your standard library #include
directives sorted alphabetically. This makes it easy to spot duplicates, or to add new dependencies in the right position.
constexpr
is for constants
C++11 introduces constexpr
, which you should use to declare N
a true constant:
constexpr uint16_t N = 16;
End-brace comments are unnecessary
In general, you don't need to add a comment at the end of a block indicating what just ended, like this:
for (/* ... */) {
// code...
} // vector loop
This clutters the code. It should be obvious what each block is doing. If it isn't obvious, you probably have too many nested blocks. Which means...
Create a method for the inner loop
Your inner loop over the vector can be succinctly extracted to a function of its own. Creating a function allows you to give a descriptive name to what you're doing, and limit the scope of variables. It also makes the outer loop easier to read and understand.
I have rewritten the inner loop into its own function that makes use of standard algorithms. It is slightly less efficient than your original method, but improves immutability, readability, and simplicity of the outer loop.
// Returns std::pair containing (count, index)
// If count is not 1, index is undefined
std::pair<std::size_t, std::size_t> uniqueVectorIndex(const std::size_t bit,
const std::vector<uint16_t>& vec) {
const auto isBitSet = [&] (uint16_t num) {return num & (1 << bit);};
const auto count = std::count_if(vec.begin(), vec.end(), isBitSet);
const auto indexPos = std::find_if(vec.begin(), vec.end(), isBitSet);
return std::make_pair(count, std::distance(vec.begin(), indexPos));
}
This is the most succinct version I can think of. If you want to make more guarantees on the result and increase efficiency, you could move the find_if()
call inside an if(count == 1)
block and return a sentinel value (like vec.size()
in Nobody's answer) otherwise.
Writing the function this way and returning a std::pair
allows the loop over the bit index to be greatly simplified:
for (auto bitIdx = 0; bitIdx < 16; ++bitIdx) {
std::size_t counter, ind;
std::tie(counter, ind) = uniqueVectorIndex(bitIdx, vec);
// Do we only have a single value?
if (counter == 1) {
printf("%i @ %lu\n", bitIdx, ind);
}
}
You will have to #include
<utility>
to get std::pair
, and <tuple>
for std::tie
.
-
\$\begingroup\$ Thank you for your comments. My
includes
are often a mess. Is there a way to detect unused ones? I normally simply comment them out and see if it still compiles, which is a bad practise I guess... I normal use-Wall -Wextra -Wpedantic
. The point withconstexpr
i didn't really understand. Why should I use it instead ofconst
? \$\endgroup\$magu_– magu_2014年07月23日 05:57:35 +00:00Commented Jul 23, 2014 at 5:57 -
@Nobody covers almost everything.
One alternative to using numberOfBitsPerEntry
would be to make the bit field the loop variable.
// loop over the values
for (uint16_t bitField = 1; bitField > 0; bitField = bitField << 1)
{
// etc.
} // value loop
This uses the fact that once we bit shift past the high bit, 32768 in this case, we get zero.
-
\$\begingroup\$ A very elegant way I didn't think about this. \$\endgroup\$magu_– magu_2014年07月23日 05:59:01 +00:00Commented Jul 23, 2014 at 5:59