Newest 'swar' Questions

23 questions

10 votes

5 answers

617 views

How to perform addition of two vectors of 8-bit integers with a single addition in C/C++

From my reading of this answer, it is possible to perform addition of two pairs of 4-bit integers, stored in 8-bit integers, with just one addition and some bitwise operations. Also, the author of ...

fantasie

asked Feb 24, 2025 at 17:47
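One common way to get the effect described in this question, sketched here for eight unsigned 8-bit lanes packed in a uint64_t: add the low 7 bits of every lane with an ordinary addition, then restore the top bits with XOR so no carry leaks into the neighbouring lane. The function name swar_add_u8x8 and the lane layout are assumptions for illustration, not taken from the question or its answers.

```c
#include <stdint.h>

/* Add eight unsigned 8-bit lanes packed in a uint64_t; each lane wraps
 * modulo 256 and never carries into its neighbour.
 * Hypothetical helper name, chosen for this sketch. */
static inline uint64_t swar_add_u8x8(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8080808080808080ULL;  /* top bit of every lane */
    uint64_t low = (a & ~H) + (b & ~H);        /* 7-bit sums; carries stay inside the lane */
    return low ^ ((a ^ b) & H);                /* add the top bits without carry-out */
}
```

For example, the lane 0xFF plus 0x01 yields 0x00 here, while the lane to its left is left undisturbed.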

5 votes

4 answers

344 views

What is the fastest way to add 16 small numbers

I have two arrays, a and b. Each contains 16 bytes and I would like to add each b[i] to its corresponding a[i]. The arrays do not overlap and also I know that the resulting sums always fit in a byte ...

adrianton3

2,378

asked May 22, 2024 at 2:09

-2 votes

1 answer

133 views

How can I populate an array with multiple copies of a small value with the fewest operations possible?

Suppose I have an object comprising many small integer types. For example: uint16_t values[8]; Or as part of a union: union Data { uint16_t values[8]; // Other members }; I would like to ...

CPlus

5,138

asked Apr 4, 2024 at 1:34

1 vote

1 answer

134 views

Can packing variables or parameters into structures/unions introduce unforeseen performance penalties?

This is not asking about structure padding/packing, which refers to any unnamed bytes inserted into structures for alignment purposes. I have this function: #include <stdint.h> uint8_t get_index(...

CPlus

5,138

asked Dec 9, 2023 at 0:04

1 vote

1 answer

493 views

Speed up strlen using SWAR in x86-64 assembly

The asm function strlen receives a pointer to a string as a char array. To do so, the function may use SWAR on general-purpose registers, but without using xmm registers or SSE instructions. The ...

HeapUnderStop

asked Jun 4, 2023 at 16:53

0 votes

1 answer

675 views

How to check if a register contains a zero byte without SIMD instructions

Given a 64-bit general-purpose register (not an xmm register) on the x64 architecture, filled with one-byte unsigned values, how can I check all of them for a zero value simultaneously without using SSE ...

HeapUnderStop

asked Jun 1, 2023 at 12:56
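A widely used test for this, shown as a minimal C sketch rather than the assembly the question asks about: subtracting 1 from every byte borrows into the top bit only for bytes that were zero (or had the top bit set already), and masking with ~x discards the latter. The function name has_zero_byte is an assumption made for this example.

```c
#include <stdint.h>

/* Returns 1 if any of the eight bytes of x is zero, otherwise 0.
 * Hypothetical helper name, chosen for this sketch. */
static inline int has_zero_byte(uint64_t x)
{
    const uint64_t ONES = 0x0101010101010101ULL;  /* 0x01 in every byte */
    const uint64_t H    = 0x8080808080808080ULL;  /* 0x80 in every byte */
    /* A byte's top bit survives only if the subtraction set it and the
     * original byte did not already have it set. */
    return ((x - ONES) & ~x & H) != 0;
}
```

The expression answers "is there a zero byte anywhere"; locating the first one would need an extra step such as a count-trailing-zeros on the masked value.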

-1 votes

4 answers

606 views

Add two vectors (uint64_t type) with saturation for each int8_t element

I was recently faced with the following problem: There are 8 elements in the vector, each is represented by int8_t. Implement an algorithm in x86_64 that will add two vectors (uint64_t type). Adding ...

Umbra

asked Apr 21, 2023 at 22:14
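A possible shape for such a saturating add, written in C rather than x86-64 assembly and offered only as a sketch: compute the wrapping per-lane sum, detect signed overflow where both inputs share a sign that the sum does not, and substitute 0x7F or 0x80 in those lanes. The name swar_adds_i8x8 and the choice to saturate to INT8_MAX / INT8_MIN are assumptions, since the excerpt is cut off before it states the exact rule.

```c
#include <stdint.h>

/* Saturating add of eight signed 8-bit lanes packed in a uint64_t.
 * Hypothetical helper name; saturation bounds 0x7F / 0x80 are assumed. */
static inline uint64_t swar_adds_i8x8(uint64_t a, uint64_t b)
{
    const uint64_t H    = 0x8080808080808080ULL;  /* sign bit of every lane */
    const uint64_t ONES = 0x0101010101010101ULL;  /* 0x01 in every lane */

    /* Wrapping per-lane sum: add low 7 bits, then XOR in the sign bits. */
    uint64_t sum = ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);

    /* Overflow: inputs have equal signs, but the sum's sign differs. */
    uint64_t ovf  = ~(a ^ b) & (a ^ sum) & H;
    uint64_t mask = (ovf >> 7) * 0xFF;            /* 0xFF in overflowing lanes */

    /* Saturated value: 0x7F for non-negative a, 0x80 for negative a. */
    uint64_t sat = ((a >> 7) & ONES) + ONES * 0x7F;

    return (sum & ~mask) | (sat & mask);
}
```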

14 votes

1 answer

923 views

SIMD-within-a-register version of min/max

Suppose I have two uint16_t[4] arrays, a and b. Each integer in these arrays is in the range [0, 16383], so bits 14 and 15 aren't set. Then I have some code to find the minimum and maximum among a[i] ...

swineone

3,010

asked Jan 18, 2023 at 0:42
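Because bits 14 and 15 are clear in every element, the spare top bit can serve as a per-lane comparison flag. The sketch below assumes the four uint16_t values have already been packed into a uint64_t (for instance with memcpy); the helper names are invented for this example and do not come from the question.

```c
#include <stdint.h>

/* 0xFFFF in every 16-bit lane where a >= b.
 * Precondition (from the question): every lane is at most 16383, so bit 15 is free. */
static inline uint64_t ge_mask_u16x4(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8000800080008000ULL;  /* bit 15 of every lane */
    uint64_t d = (a | H) - b;                  /* per lane: 0x8000 + a - b, never borrows */
    return ((d & H) >> 15) * 0xFFFF;           /* widen bit 15 into a full-lane mask */
}

/* Lane-wise minimum of four packed 16-bit values. */
static inline uint64_t swar_min_u16x4(uint64_t a, uint64_t b)
{
    uint64_t m = ge_mask_u16x4(a, b);
    return (b & m) | (a & ~m);                 /* pick b where a >= b, else a */
}

/* Lane-wise maximum of four packed 16-bit values. */
static inline uint64_t swar_max_u16x4(uint64_t a, uint64_t b)
{
    uint64_t m = ge_mask_u16x4(a, b);
    return (a & m) | (b & ~m);                 /* pick a where a >= b, else b */
}
```

The subtraction is safe across lanes only because of the stated value range; with full-range 16-bit inputs a borrow could cross a lane boundary.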

1 vote

3 answers

381 views

Fastest way to find 16bit match in a 4 element short array?

I may confirm by using nanobench. Today I don't feel clever and can't think of an easy way. I have an array, short arr[]={0x1234, 0x5432, 0x9090, 0xFEED};. I know I can use SIMD to compare all elements ...

user20746246

asked Dec 14, 2022 at 19:11

5 votes

4 answers

612 views

Multiplication of two packed signed integers in one

The Stockfish chess engine needs to store, for its evaluation, both an endgame score and a middlegame score. Instead of storing them separately, it packs them into one int. The middlegame score is ...

Chayim Friedman

76.5k

asked Nov 22, 2022 at 2:27

3 votes

1 answer

243 views

Performantly reverse the order of 16-bit quantities within a 64-bit word

I need to do a lexicographic comparison of a small number of small unsigned integers. If there are (for example) 8 8-bit integers, the obvious approach is to byteswap them and do an ordinary integer ...

Moonchild

asked May 15, 2022 at 6:48
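For the operation named in the title, one portable approach is a two-step swap: exchange the 32-bit halves, then exchange the 16-bit halves inside each 32-bit half. This is a plain C sketch with an assumed function name, and it makes no claim about being the fastest option on any particular CPU.

```c
#include <stdint.h>

/* Reverse the order of the four 16-bit quantities in a 64-bit word.
 * Hypothetical helper name, chosen for this sketch. */
static inline uint64_t reverse_u16x4(uint64_t x)
{
    const uint64_t LO16 = 0x0000FFFF0000FFFFULL;  /* low 16 bits of each 32-bit half */
    x = (x << 32) | (x >> 32);                    /* swap the two 32-bit halves */
    return ((x & LO16) << 16) | ((x >> 16) & LO16); /* swap 16-bit pairs in each half */
}
```

With this helper, 0x1111222233334444 becomes 0x4444333322221111.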

0 votes

2 answers

328 views

bit twiddling to right pack bits

I have the following code which right packs every 4 bits of a 64 bit int. This is the naive way of doing it, I am using a lookup table and a loop. I am wondering if there is a faster bit twiddling, ...

Vic C

asked Feb 20, 2022 at 2:07

7 votes

4 answers

906 views

How to implement SWAR unsigned less-than?

I'm trying to use uint64_t as if it was 8 lanes of uint8_ts; my goal is to implement a lane-by-lane less-than. This operation, given x and y, should produce a result with 0xFF in a lane if the value ...

Koz Ross

3,190

asked Aug 8, 2021 at 2:55
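One way to build the lane-by-lane less-than described here, given as a sketch under the assumption of eight uint8_t lanes in a uint64_t: compare the low 7 bits of each lane with a borrow-free subtraction, then combine that with the top bits. The function name swar_lt_u8x8 is invented for illustration.

```c
#include <stdint.h>

/* 0xFF in every 8-bit lane where x < y (unsigned), 0x00 elsewhere.
 * Hypothetical helper name, chosen for this sketch. */
static inline uint64_t swar_lt_u8x8(uint64_t x, uint64_t y)
{
    const uint64_t H = 0x8080808080808080ULL;  /* top bit of every lane */

    /* Per lane: 0x80 + low7(x) - low7(y). The top bit of the result is
     * set exactly when low7(x) >= low7(y); no borrow leaves the lane. */
    uint64_t low_ge = (x | H) - (y & ~H);

    /* x < y when x's top bit is 0 and y's is 1, or when the top bits
     * agree and the low 7 bits of x are smaller. */
    uint64_t lt = ((~x & y) | (~(x ^ y) & ~low_ge)) & H;

    return (lt >> 7) * 0xFF;                   /* widen each flag to a full lane */
}
```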

1 vote

2 answers

517 views

How to write a SWAR comparison which puts 0xFF in a lane on matches?

I'm trying to write a SWAR compare-for-equality operation, working on uint64_t pretending to be 8 'lanes' of uint8_t. The closest I've managed to achieve, based on techniques in Hacker's Delight and ...

Koz Ross

3,190

asked Aug 7, 2021 at 20:33
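A compare-for-equality of this kind can be sketched by XORing the operands, so matching lanes become zero, and then detecting zero lanes exactly. The version below is one possible formulation with an assumed function name; it is not the attempt the question goes on to show.

```c
#include <stdint.h>

/* 0xFF in every 8-bit lane where x and y match, 0x00 elsewhere.
 * Hypothetical helper name, chosen for this sketch. */
static inline uint64_t swar_eq_u8x8(uint64_t x, uint64_t y)
{
    const uint64_t H = 0x8080808080808080ULL;  /* top bit of every lane */
    uint64_t z = x ^ y;                        /* zero lane <=> lanes are equal */

    /* Adding 0x7F to the low 7 bits sets the top bit iff they are nonzero;
     * OR-ing z covers lanes whose only set bit is the top bit. */
    uint64_t nonzero = ((z & ~H) + ~H) | z;

    uint64_t eq = ~nonzero & H;                /* 0x80 in matching lanes */
    return (eq >> 7) * 0xFF;                   /* widen each flag to a full lane */
}
```

Unlike the shorter subtract-and-mask zero-byte test, this form gives the right answer in every lane independently, which matters when a per-lane mask is wanted rather than a single yes/no.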

2 votes

2 answers

677 views

SWAR byte counting methods from 'Bit Twiddling Hacks' - why do they work?

Bit Twiddling Hacks contains the following macros, which count the number of bytes in a word x that are less than, or greater than, n: #define countless(x,n) \ (((~0UL/255*(127+(n))-((x)&~0UL/255*...

Koz Ross

3,190

asked Jul 8, 2021 at 3:37
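The idea behind a "count bytes less than n" trick can be restated with named constants, which may make it easier to follow. The sketch below is a rewording for a 64-bit word under the assumption that n is in the range 0 to 128; it is not a completion of the truncated macro quoted in the excerpt, and the function name is invented.

```c
#include <stdint.h>

/* Count how many of the eight bytes of x are less than n.
 * Assumed precondition: 0 <= n <= 128. Hypothetical helper name. */
static inline unsigned count_bytes_less(uint64_t x, unsigned n)
{
    const uint64_t ONES = 0x0101010101010101ULL;  /* 0x01 in every byte */

    /* Per byte: (127 + n) - low7(x). The top bit ends up set exactly
     * when low7(x) < n, and no borrow crosses a byte boundary. */
    uint64_t t = ONES * (127 + n) - (x & (ONES * 127));

    /* Bytes with their own top bit set are >= 128 >= n, so drop them. */
    t &= ~x & (ONES * 128);

    /* t / 128 leaves 0x01 in each qualifying byte; since 256 leaves
     * remainder 1 when divided by 255, taking % 255 sums those bytes. */
    return (unsigned)((t / 128) % 255);
}
```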
