Questions tagged [sse]
Streaming SIMD Extensions (SSE) is the first generation of Intel's SIMD instruction sets available on modern x86-compatible CPUs. SSE offers single-precision floating-point arithmetic, integer arithmetic (excluding division), and logical operations on packed or scalar operands of sizes from 8 to 64 bits.
29 questions
4
votes
0
answers
77
views
16x16 integer matrix transpose using SSE2 intrinsics in C
I was inspired by this and this to make a C function that would take an array of 16 __m128i, treat it as a matrix of 16x16 ...
7
votes
1
answer
328
views
AVX2 8x8 Float Matrix Multiply in Rust
I'm interested in a fast 8x8 32-bit float matrix multiply in Rust, assuming availability of AVX2. After learning about the AVX2 intrinsics, here is what I came up with:
...
1
vote
2
answers
347
views
Count the number of mismatches between two arrays
This function may compute the number of unequal elements of two char arrays of length n:
...
1
vote
1
answer
241
views
Insert an array[4] to an array[8] (C++, SSE)
I have this code to get audio output levels in dB to an array (peak_dB[8]) to be used in real time peakmeter:
...
5
votes
2
answers
526
views
SSE Assembly vs GCC Compiler - Dot Product
I am currently taking an introductory course in computer architecture.
Our goal was to write a dot-product function in x86 Assembly which would use SSE and SIMD (without AVX).
I am not to that ...
6
votes
0
answers
119
views
4×4 cofactor in SSE
The cofactor of a 4×4 matrix can be used to convert a "regular geometry" matrix into the matrix that transforms the normals. It's an alternative to the common inverse-transpose pattern. In this post I ...
3
votes
1
answer
107
views
Generic pixel class to seamlessly alpha-blend and convert between different pixel structure layouts
Does what it says in the title. I just finished this and wanted to share with someone.
Looking for possible optimizations, bugs (most of it is tested to work) or any constructive criticism.
...
5
votes
1
answer
213
views
Fast Hardy-Weinberg equilibrium simulation
I was very bored over one of my breaks this year, so I built a Hardy-Weinberg equilibrium simulator for two unrelated alleles of the same gene. Hardy-Weinberg equilibrium is when there is no evolution,...
7
votes
1
answer
2k
views
SIMD memcpy assembler implementation
I am fairly rusty with assembler, let alone the AT&T syntax. I would appreciate it if someone with more experience could please review the following memcpy implementation. Note that this will only ...
1
vote
1
answer
612
views
Fast affine transformations of many 3D points by one 3×4 matrix
I wrote a function to batch-transform 3D vectors by a single 3x4 matrix using SSE2:
...
7
votes
3
answers
3k
views
Converting Array of Floats to UINT8 (`char`) or UINT16 (`unsigned short`) Using SSE4
The problem is: given an image in 32-bit floating-point format (float), how to convert it to UINT8 (char) or UINT16 (...
1
vote
1
answer
924
views
Finding the Minimum and Maximum Value in an Image
Given an image which is padded to support aligning (SSE) I need to find its minimum and maximum value as fast as possible.
Mind you the padded values are not defined and can't be assumed to have ...
10
votes
2
answers
17k
views
AVX SIMD in matrix multiplication
I have coded the following C function for multiplying two NxN matrices and using AVX vectors to speed up the calculation. It works but the speedup is not what is to be expected (some scalar code is ...
10
votes
2
answers
2k
views
Vectorized and Multi Threaded Image Convolution
I created code for Image Convolution. The code is in my Image Convolution GitHub Repository.
It includes the case for arbitrary Image Convolution and for Separable Kernel Convolution.
The code is a ...
11
votes
3
answers
614
views
SSE loop to walk likely primes
This is a continuation of a discussion that was started here. While there are some interesting points there about instruction timing and latency, it is not necessary to read that Question to ...