Linked Questions

1029 votes
66 answers
675k views

8 bits representing the number 7 look like this: 00000111 Three bits are set. What are the algorithms to determine the number of set bits in a 32-bit integer?
959 votes
11 answers
188k views

I wrote these two solutions for Project Euler Q14, in assembly and in C++. They implement identical brute force approach for testing the Collatz conjecture. The assembly solution was assembled with: ...
177 votes
36 answers
238k views

If I have some integer n, and I want to know the position of the most significant bit (that is, if the least significant bit is on the right, I want to know the position of the farthest left bit that ...
146 votes
23 answers
117k views

I am looking for an efficient way to determine the position of the least significant bit that is set in an integer, e.g. for 0x0FF0 it would be 4. A trivial implementation is this: unsigned ...
350 votes
4 answers
51k views

I've been racking my brain for a week trying to complete this assignment and I'm hoping someone here can lead me toward the right path. Let me start with the instructor's instructions: Your ...
160 votes
6 answers
117k views

xor eax, eax will always set eax to zero, right? So, why does MSVC++ sometimes put it in my executable's code? Is it more efficient that mov eax, 0? 012B1002 in al,dx 012B1003 push ...
37 votes
5 answers
10k views

Given std::bitset<64> bits with any number of bits set and a bit position X (0-63) What is the most efficient way to count bits at position X or lower or return 0 if the bit at X is not set ...
34 votes
4 answers
4k views

For min(ctz(x), ctz(y)), we can use ctz(x | y) to gain better performance. But what about max(ctz(x), ctz(y))? ctz represents "count trailing zeros". C++ version (Compiler Explorer) #include ...
42 votes
4 answers
9k views

I need to find the max element in the vector so I'm using std::max_element, but I've found that it's a very slow function, so I wrote my own version and manage to get x3 better performance, here is ...
MoonBun's user avatar
  • 4,412
67 votes
1 answer
8k views

I'm a newbie at instruction optimization. I did a simple analysis on a simple function dotp which is used to get the dot product of two float arrays. The C code is as follows: float dotp( ...
42 votes
2 answers
5k views

While benchmarking something I measured a much lower throughput than I had calculated, which I narrowed down to the LZCNT instruction (it also happens with TZCNT), as demonstrated in the following ...
24 votes
2 answers
5k views

I am trying to write very efficient Hamming-distance code. Inspired by Wojciech Muła's extremely clever SSE3 popcount implementation, I coded an AVX2 equivalent solution, this time using 256 bit ...
BlueStrat's user avatar
  • 2,324
23 votes
2 answers
10k views

According to http://www.agner.org/optimize/instruction_tables.pdf, the POPCNT instruction (which returns the number of set bits in a 32-bit or 64-bit register) has a throughput of 1 instruction per ...
Siqi Lin's user avatar
  • 1,277
13 votes
6 answers
3k views

I have four 2-bit bitfields stored in a single byte. Each bitfield can thus represent 0, 1, 2, or 3. For example, here are the 4 possible values where the first 3 bitfields are zero: 00 00 00 00 = 0 ...
12 votes
5 answers
5k views

The C standard is quite unclear about the uint_fast*_t family of types. On a gcc-4.4.4 linux x86_64 system, the types uint_fast16_t and uint_fast32_t are both 8 bytes in size. However, multiplication ...

15 30 50 per page
1
2 3