23 questions
10
votes
5
answers
617
views
How to perform addition of two vectors of 8-bit integers with a single addition in C/C++
From my reading of this answer, it is possible to perform addition of two pairs of 4-bit integers, stored in 8-bit integers, with just one addition and some bitwise operations. Also, the author of ...
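For illustration, the general idea scaled up to eight 8-bit lanes in a 64-bit word can be sketched as below. This is a minimal sketch and not the code from the answer the question refers to; the function name `swar_add_u8x8` and the 64-bit lane layout are assumptions made for the example.

```c
#include <stdint.h>

/* Add eight unsigned 8-bit lanes packed into 64-bit words.
   Each lane wraps modulo 256; no carry leaks into the next lane.
   (Hypothetical helper name, chosen for this sketch.) */
static inline uint64_t swar_add_u8x8(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8080808080808080ULL;  /* top bit of every lane */
    /* Add only the low 7 bits of each lane: any carry stops at bit 7. */
    uint64_t low = (a & ~H) + (b & ~H);
    /* Fold the top bits in with XOR, which discards the carry out. */
    return low ^ ((a ^ b) & H);
}
```

The single `+` does the work for all lanes; the masks and the XOR only keep carries from crossing lane boundaries.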
5
votes
4
answers
344
views
What is the fastest way to add 16 small numbers
I have two arrays, a and b. Each contains 16 bytes and I would like to add each b[i] to its corresponding a[i]. The arrays do not overlap and also I know that the resulting sums always fit in a byte ...
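Because every sum is stated to fit in a byte, no carry can cross a byte boundary, so two plain 64-bit additions are one possible approach. The sketch below assumes exactly that precondition; `add16_no_overflow` is a hypothetical name, and whether this beats what a compiler auto-vectorizes is not something the sketch claims.

```c
#include <stdint.h>
#include <string.h>

/* Add b[i] to a[i] for 16 bytes using two 64-bit additions.
   Assumption: a[i] + b[i] <= 255 for every i, so no carry crosses lanes.
   Byte order does not matter, since each byte is added to its counterpart. */
static inline void add16_no_overflow(uint8_t a[16], const uint8_t b[16])
{
    uint64_t wa[2], wb[2];
    memcpy(wa, a, 16);   /* memcpy avoids alignment and aliasing issues */
    memcpy(wb, b, 16);
    wa[0] += wb[0];
    wa[1] += wb[1];
    memcpy(a, wa, 16);
}
```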
-2
votes
1
answer
133
views
How can I populate an array with multiple copies of a small value with the fewest operations possible?
Suppose I have an object comprising many small integer types. For example:
uint16_t values[8];
Or as part of a union:
union Data {
uint16_t values[8];
// Other members
};
I would like to ...
1
vote
1
answer
134
views
Can packing variables or parameters into structures/unions introduce unforeseen performance penalties?
This is not asking about structure padding/packing, which refers to any unnamed bytes inserted into structures for alignment purposes.
I have this function:
#include <stdint.h>
uint8_t get_index(...
1
vote
1
answer
493
views
Speed up strlen using SWAR in x86-64 assembly
The asm function strlen receives a pointer to a string as a char array. To do so, the function may use SWAR on general-purpose registers, but without using xmm registers or SSE instructions.
The ...
0
votes
1
answer
675
views
How to check if a register contains a zero byte without SIMD instructions
Given a 64-bit general-purpose register (not an xmm register) on the x64 architecture, filled with one-byte unsigned values, how can I check all of them for a zero value simultaneously without using SSE ...
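One widely known way to do this in C is the "has zero byte" test listed in Bit Twiddling Hacks, widened here to 64 bits. This is a sketch of that technique, not necessarily the accepted answer; `has_zero_byte` is a name chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if any of the eight bytes of x is zero.
   Subtracting 1 from every byte sets bit 7 of a byte that was 0x00
   (via borrow); "& ~x" rejects bytes whose bit 7 was already set. */
static inline bool has_zero_byte(uint64_t x)
{
    const uint64_t L = 0x0101010101010101ULL;  /* 0x01 in every byte */
    const uint64_t H = 0x8080808080808080ULL;  /* 0x80 in every byte */
    return ((x - L) & ~x & H) != 0;
}
```

The yes/no answer is exact. The position of the flagged bit is not always exact above the lowest zero byte, because a borrow can propagate upward, so locating the zero byte needs extra care.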
-1
votes
4
answers
606
views
Add two vectors (uint64_t type) with saturation for each int8_t element
I was recently faced with a given problem:
There are 8 elements in the vector, each is represented by int8_t.
Implement an algorithm in x86_64 that will add two vectors (uint64_t type).
Adding ...
14
votes
1
answer
923
views
SIMD-within-a-register version of min/max
Suppose I have two uint16_t[4] arrays, a and b. Each integer in these arrays is in the range [0, 16383], so bits 14 and 15 aren't set. Then I have some code to find the minimum and maximum among a[i] ...
1
vote
3
answers
381
views
Fastest way to find 16-bit match in a 4 element short array?
I may confirm by using nanobench. Today I don't feel clever and can't think of an easy way.
I have an array, short arr[]={0x1234, 0x5432, 0x9090, 0xFEED};. I know I can use SIMD to compare all elements ...
5
votes
4
answers
612
views
Multiplication of two packed signed integers in one
The Stockfish chess engine needs to store, for its evaluation, both an endgame score and a middlegame score.
Instead of storing them separately, it packs them into one int. The middlegame score is ...
3
votes
1
answer
243
views
Performantly reverse the order of 16-bit quantities within a 64-bit word
I need to do a lexicographic comparison of a small number of small unsigned integers. If there are (for example) 8 8-bit integers, the obvious approach is to byteswap them and do an ordinary integer ...
0
votes
2
answers
328
views
Bit twiddling to right-pack bits
I have the following code which right packs every 4 bits of a 64-bit int. This is the naive way of doing it; I am using a lookup table and a loop. I am wondering if there is a faster bit twiddling, ...
7
votes
4
answers
906
views
How to implement SWAR unsigned less-than?
I'm trying to use uint64_t as if it was 8 lanes of uint8_ts; my goal is to implement a lane-by-lane less-than. This operation, given x and y, should produce a result with 0xFF in a lane if the value ...
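A minimal sketch of one possible lane-wise less-than follows, assuming the intended result is 0xFF in each lane where x < y and 0x00 elsewhere (the excerpt is cut off before it says so). It uses the lane-isolated subtraction and borrow formulas from Hacker's Delight; `swar_lt_u8x8` is a hypothetical name.

```c
#include <stdint.h>

/* For each of the eight unsigned 8-bit lanes, produce 0xFF if the lane
   of x is less than the lane of y, otherwise 0x00. */
static inline uint64_t swar_lt_u8x8(uint64_t x, uint64_t y)
{
    const uint64_t H = 0x8080808080808080ULL;  /* top bit of every lane */
    /* Lane-wise x - y: forcing bit 7 on in x and off in y means the
       subtraction never borrows across lanes; the XOR restores bit 7. */
    uint64_t d = ((x | H) - (y & ~H)) ^ ((x ^ ~y) & H);
    /* Borrow out of each lane's top bit, i.e. bit 7 set where x < y. */
    uint64_t borrow = ((~x & y) | ((~x | y) & d)) & H;
    /* Move the flag to bit 0 of each lane and widen 0x01 to 0xFF. */
    return (borrow >> 7) * 0xFF;
}
```

The final multiply cannot overflow a lane, because each lane holds either 0 or 1 before it is scaled by 0xFF.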
1
vote
2
answers
517
views
How to write a SWAR comparison which puts 0xFF in a lane on matches?
I'm trying to write a SWAR compare-for-equality operation, working on uint64_t pretending to be 8 'lanes' of uint8_t. The closest I've managed to achieve, based on techniques in Hacker's Delight and ...
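One way such a comparison could look is sketched below. It is not the asker's attempt, which the excerpt does not show. It XORs the inputs and then applies an exact per-lane nonzero test; `swar_eq_u8x8` is a hypothetical name.

```c
#include <stdint.h>

/* For each of the eight 8-bit lanes, produce 0xFF if the lanes of x and y
   are equal, otherwise 0x00. */
static inline uint64_t swar_eq_u8x8(uint64_t x, uint64_t y)
{
    const uint64_t H = 0x8080808080808080ULL;  /* top bit of every lane */
    uint64_t t = x ^ y;                        /* a lane is zero iff it matched */
    /* Adding 0x7F to the low 7 bits carries into bit 7 iff they are nonzero;
       OR-ing t back in covers lanes where only bit 7 differed. */
    uint64_t differs = (((t & ~H) + ~H) | t) & H;
    /* Invert the flag, move it to bit 0 of each lane, widen 0x01 to 0xFF. */
    return ((differs ^ H) >> 7) * 0xFF;
}
```

Unlike the borrow-based zero-byte test, the per-lane addition here never carries past bit 7, so each lane's result is independent of its neighbours.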
2
votes
2
answers
677
views
SWAR byte counting methods from 'Bit Twiddling Hacks' - why do they work?
Bit Twiddling Hacks contains the following macros, which count the number of bytes in a word x that are less than, or greater than, n:
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*...