Code Review

Questions tagged [simd]

Single Instruction, Multiple Data (SIMD) describes CPU instructions that apply one operation to multiple data elements in parallel.

4 votes
0 answers
92 views

16x16 integer matrix transpose using SSE2 intrinsics in C

I was inspired by this and this to make a C function that would take an array of 16 __m128i, treat it as a matrix of 16x16 ...
8 votes
1 answer
399 views

SIMD Softmax implementation

I am learning SIMD and looking for feedback. This is column-wise Softmax for matrices stored in row-major format. Note that matrices come from outside so padding or dimensions being power of 2 can't ...
Eugene
  • 183
4 votes
2 answers
165 views

C - SIMD Code to invert a transformation matrix

I am writing a maths library for a raytracer project, and so I'm trying to make my heavy operations (like matrix inverse) more optimised. After doing some research, I discovered this trick to invert a ...
2 votes
3 answers
237 views

Optimizing a for loop for changing pixels values using lookup table

I tried to parallelize the loop, and I got a good result but still not enough. This post is a follow-up to a recent one where I optimized other parts of the code using a lookup table and spatial and ...
Ja_cpp
  • 433
5 votes
1 answer
500 views

High Performance Matrix Multiplication is not very high speed, why?

I would appreciate a review of the following Rust implementation of high performance matrix multiplication. After reviewing available literature, including Anatomy of High Performance Matrix ...
Ana
  • 129
7 votes
1 answer
334 views

AVX2 8x8 Float Matrix Multiply in Rust

I'm interested in a fast 8x8 32-bit float matrix multiply in Rust, assuming availability of AVX2. After learning about the AVX2 intrinsics, here is what I came up with: ...
Ana
  • 129
2 votes
1 answer
96 views

Finding the kth smallest number where all (hexadecimal) digits are different

I'm mostly trying to understand why the simpler char array mask below (to track which digits have been already used) is much ...
1 vote
2 answers
433 views

Count the number of mismatches between two arrays

This function computes the number of unequal elements in two char arrays of length n: ...
5 votes
1 answer
950 views

Speed up strlen using SWAR in x86-64 assembly

The asm function strlen receives a pointer to a string as a char array. To do so, the function may use SWAR on general-purpose registers, but without using ...
2 votes
1 answer
150 views

SIMD Vectorizing C Function Generating Floating-point Range

I have a C function that generates a range from the given start, step_size and end values. I ...
4 votes
2 answers
506 views

Search function using SIMD

I wrote a search function, similar to std::find, that uses SIMD instructions. Since I am new to SIMD, I would appreciate comments on other SIMD instructions I have ...
1 vote
1 answer
356 views

Implementing a 1D Convolution SIMD Friendly in Julia

I want to implement a 1D convolution in Julia using the direct calculation since the conv() function in DSP.jl uses DFT (fft) ...
Royi
  • 582
1 vote
2 answers
538 views

Bilinear interpolation optimized using intrinsics

I have found that a bottleneck of the OpenCV application I use is the bilinear interpolation, so I have tried to optimize it. The bilinear interpolation is in 8D space, so each "color" is an ...
rafoo
  • 335
4 votes
1 answer
1k views

C++ Binary search using SIMD

Recently I found that the binary search (std::ranges::lower_bound and std::ranges::upper_bound) is the main bottleneck in my ...
3 votes
1 answer
1k views

Sum two vectors in x86 assembly

I recently made a program with C++ and ASM. Can anyone help me make this code more efficient, in the ASM part or both? I would really appreciate it because I don't know every ASM instruction and ...

