C# System.Numerics.Vector<T> Clamp

Question 1

The software that I develop uses large floating-point arrays, up to the maximum size that can be allocated in C#. I have a large number of algorithms, such as convolutions and filters, that get executed over those large arrays. I am currently updating as many algorithms as possible to be fully threaded and vectorized.

By utilizing the System.Numerics.Vector<T> methods, I typically see a 300%+ performance improvement in many of the algorithms on computers equipped with AVX (where Vector<float>.Count returns 4) and a 600%+ performance improvement on computers equipped with AVX2 (where Vector<float>.Count returns 8).
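
To see what a given machine and runtime actually report, a minimal sketch along these lines could be used. The class name VectorInfo and the helper FloatLanes are made up for illustration; Vector.IsHardwareAccelerated and Vector<float>.Count are existing System.Numerics members. The printed values depend on the CPU and the JIT, so none are shown here.

```csharp
using System;
using System.Numerics;

public static class VectorInfo
{
    // Number of float lanes that fit in one Vector<float> on this machine/runtime.
    public static int FloatLanes() => Vector<float>.Count;

    public static void Main()
    {
        // False means Vector<T> operations fall back to non-SIMD code.
        Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");
        Console.WriteLine($"Vector<float>.Count:  {FloatLanes()}");
    }
}
```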

The .NET Standard 2.1 System.Numerics.Vector<T> documentation is here:

https://docs.microsoft.com/en-us/dotnet/api/system.numerics.vector-1?view=netstandard-2.1

One of the functions that I require in a number of the algorithms is a Clamp of each array element to a minimum and/or maximum bound after performing some mathematical operation on it. That is of course really easy to do in the single-threaded and multi-threaded algorithms that use standard arithmetic operations.

The issue that I am having is that System.Numerics.Vector<T> doesn't include any kind of Clamp method (Vector2, Vector3, and Vector4 do). So, for example, if I loop over a large array, modifying the array in Vector<float>.Count chunks, I need to clamp each vector result to a min and/or max bound prior to writing that vector-sized chunk back to the array.

I tried doing the Clamp in a loop on the array chunk data after the Vector operations, but the performance is abysmal. It is as slow or slower than simply doing the algorithm without vectorization.

Is there any way that I can conceivably improve the performance of this Clamp method?

This is some typical code showing how I tried clamping. I fill the vector with a chunk of the array, perform some vector math, and write the chunk back to the array. This is all nice and speedy, but clamping the array chunk in a loop afterwards just kills the vectorization performance advantage.

int length = array.Length;
int floatcount = System.Numerics.Vector<float>.Count;
for (int i = 0; i < length; i += floatcount)
{
    System.Numerics.Vector<float> arrayvector = new System.Numerics.Vector<float>(array, i);
    arrayvector = System.Numerics.Vector.Multiply<float>(arrayvector, 2.0f);
    // There may be different or multiple vector operations in here.
    arrayvector.CopyTo(array, i);
    // This is how I tried clamping the array data after the vector operation:
    for (int j = 0; j < floatcount; j++)
    {
        if (array[i + j] > maximimum) { array[i + j] = maximimum; }
    }
}

I'm probably being myopic and missing something really simple. That's what months of 16-hour programming days gets you. ;) Thanks for any insight.

Question 2

This sounds like an interesting question, but edited code, such as generalized code showing "how I tried clamping", usually results in poor reviews: it's very likely that you'll reply "I'm already doing this, but I removed that part", and so on, so it backfires in most cases. It's always best to post the original code; however, you can add this edited version as an explanation of the algorithm you are using if you think it helps.

Question 3

Thanks. What I mean by "generalized code" is that this specific example of the code is doing a simple Vector.Multiply by 2.0f. I actually do a wide range of Add, Subtract, Multiply, and in many cases multiple vector methods, etc. The method of clamping in this code, the "for loop" at the end, is EXACTLY what I tried and it results in poor performance.

Question 4

ok, then I believe it's fine. Do you think you could provide something like a simple console-example where you are calling this from the Main function? In questions about performance it's good when we could actually run and test it ourselves with a profiler or something - it'd be easier to compare the results and see whether the suggested improvement makes it really better ;-]

Question 5

Sure. It won't be until tomorrow though, it's after midnight here. It won't be a trivial task since the array length is vector aligned etc. And you would probably want both non-vector and vector code to compare.

Question 6

Coool, no hurry, we'll wait ;-)

user73941 · Accepted Answer · 2019-05-10 08:41:45Z

Have you tried something like this?

Create the following vector:

    System.Numerics.Vector<float> maxima = new System.Numerics.Vector<float>(maximimum);

Then, after the multiplication, call:

    arrayvector = System.Numerics.Vector.Min(arrayvector, maxima);

You may have to create a new vector here instead of reassigning to arrayvector?

So all in all it ends up like:

    int length = array.Length;
    int floatcount = System.Numerics.Vector<float>.Count;
    System.Numerics.Vector<float> maxima = new System.Numerics.Vector<float>(maximimum);
    for (int i = 0; i < length; i += floatcount)
    {
        System.Numerics.Vector<float> arrayvector = new System.Numerics.Vector<float>(array, i);
        arrayvector = System.Numerics.Vector.Multiply(arrayvector, 2.0f);
        // There may be different or multiple vector operations in here.
        // Clamp last, so that no later operation pushes values past the bound again.
        arrayvector = System.Numerics.Vector.Min(arrayvector, maxima);
        arrayvector.CopyTo(array, i);
    }

Disclaimer: I haven't tested the above, so don't hang me if it's not an improvement :-)
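
The loop above assumes that the array length is a multiple of Vector<float>.Count, as it is in the question. For other lengths, one possible variation is sketched below: full chunks go through Vector.Min and the leftover elements are handled by a scalar tail loop. The names ClampMaxSketch and ScaleAndClampMax are hypothetical and this is an illustration only, not the answerer's code.

```csharp
using System;
using System.Numerics;

public static class ClampMaxSketch
{
    // Doubles every element, then clamps the result to at most `maximum`.
    public static void ScaleAndClampMax(float[] array, float maximum)
    {
        int lanes = Vector<float>.Count;
        var two = new Vector<float>(2.0f);        // broadcast once, outside the loop
        var maxima = new Vector<float>(maximum);  // broadcast once, outside the loop

        int i = 0;
        // Vectorized part: only full chunks of `lanes` elements.
        for (; i <= array.Length - lanes; i += lanes)
        {
            var v = new Vector<float>(array, i);
            v = Vector.Min(v * two, maxima);      // lane-wise minimum acts as the upper clamp
            v.CopyTo(array, i);
        }
        // Scalar tail: the remaining (fewer than `lanes`) elements.
        for (; i < array.Length; i++)
        {
            array[i] = Math.Min(array[i] * 2.0f, maximum);
        }
    }

    public static void Main()
    {
        float[] data = { 1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f, 10f, 11f };
        ScaleAndClampMax(data, 10f);
        Console.WriteLine(string.Join(", ", data));
    }
}
```

The array in Main deliberately has 11 elements so that the tail loop runs regardless of whether the vector holds 4 or 8 floats.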

edited May 10, 2019 at 10:42

answered May 10, 2019 at 8:41

In addition to this, if you want to implement a double-ended clamp, you can use the min-max definition of a clamp: clamp(a, b, x) = max(a, min(b, x)), where a is the lower bound and b is the upper bound.

– EvilTak · 2019-05-10 13:06:52Z
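
A minimal sketch of that min-max definition with Vector<float> might look like the following. Vector.Min and Vector.Max are existing System.Numerics methods; the class ClampBothSketch and its methods Clamp and ClampInPlace are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Numerics;

public static class ClampBothSketch
{
    // clamp(a, b, x) = max(a, min(b, x)), applied lane by lane.
    public static Vector<float> Clamp(Vector<float> x, Vector<float> a, Vector<float> b)
        => Vector.Max(a, Vector.Min(b, x));

    // Clamps every element of `array` into [lower, upper].
    public static void ClampInPlace(float[] array, float lower, float upper)
    {
        int lanes = Vector<float>.Count;
        var a = new Vector<float>(lower);   // lower bound in every lane
        var b = new Vector<float>(upper);   // upper bound in every lane

        int i = 0;
        for (; i <= array.Length - lanes; i += lanes)
        {
            Clamp(new Vector<float>(array, i), a, b).CopyTo(array, i);
        }
        // Scalar tail using the same definition.
        for (; i < array.Length; i++)
        {
            array[i] = Math.Max(lower, Math.Min(upper, array[i]));
        }
    }

    public static void Main()
    {
        float[] data = { -5f, 0f, 0.5f, 1f, 7f, -1f, 2f, 0.25f, 3f, -0.5f };
        ClampInPlace(data, 0f, 1f);
        Console.WriteLine(string.Join(", ", data));
    }
}
```

The sketch assumes lower <= upper; with the bounds swapped, max(a, min(b, x)) always returns the lower argument a.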

YES! You are awesome! :) It works perfectly. I can't believe that I didn't think of that. I have only done one set of benchmark tests on one system, and I will do a lot more profiling before I settle on the code, but initial tests look like this method will work fine.

– deegee · 2019-05-10 21:49:47Z

On my first quick profiling test, I am iterating over a floating-point array of 67,108,864 items, 268MB, and only performing a few math functions in the loop. On an AVX-equipped system, release build, the standard single-threaded method takes 231310 ticks, while the vectorized method takes 104573 ticks. That is better than a doubling of performance. I will test it against the multi-threaded code and also on my AVX2 systems. AVX2 will hopefully be even better. I am running into memory bandwidth and cache performance issues at these speeds. :)

– deegee · 2019-05-10 21:54:39Z
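
A small console harness for this kind of comparison could be sketched as follows. It is not the code that produced the numbers above: the class ClampBenchmarkSketch, its methods, the test data, and the array size (smaller than the 67,108,864 items mentioned) are all assumptions, and a single Stopwatch run is only a rough measurement.

```csharp
using System;
using System.Diagnostics;
using System.Numerics;

public static class ClampBenchmarkSketch
{
    // Plain scalar version: multiply by 2, then clamp to `max`.
    public static void Scalar(float[] a, float max)
    {
        for (int i = 0; i < a.Length; i++)
        {
            float v = a[i] * 2.0f;
            a[i] = v > max ? max : v;
        }
    }

    // Vector<float> version with a scalar tail for leftover elements.
    public static void Vectorized(float[] a, float max)
    {
        int lanes = Vector<float>.Count;
        var two = new Vector<float>(2.0f);
        var maxima = new Vector<float>(max);
        int i = 0;
        for (; i <= a.Length - lanes; i += lanes)
        {
            Vector.Min(new Vector<float>(a, i) * two, maxima).CopyTo(a, i);
        }
        for (; i < a.Length; i++)
        {
            float v = a[i] * 2.0f;
            a[i] = v > max ? max : v;
        }
    }

    // Arbitrary repeating test data in the range 0..99.
    public static float[] MakeData(int n)
    {
        var data = new float[n];
        for (int i = 0; i < n; i++) data[i] = i % 100;
        return data;
    }

    // Runs `method` once and returns the elapsed Stopwatch ticks.
    public static long TimeTicks(Action<float[], float> method, float[] data, float max)
    {
        var sw = Stopwatch.StartNew();
        method(data, max);
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        // Warm-up calls so that first-call JIT compilation is not timed.
        Scalar(MakeData(1024), 50f);
        Vectorized(MakeData(1024), 50f);

        const int n = 1 << 24; // 16,777,216 floats, about 64 MB per array
        Console.WriteLine($"Scalar:     {TimeTicks(Scalar, MakeData(n), 50f)} ticks");
        Console.WriteLine($"Vectorized: {TimeTicks(Vectorized, MakeData(n), 50f)} ticks");
    }
}
```

Timings should be taken from a release build run outside the debugger, and repeated several times, before drawing conclusions.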

There is one major issue that I have found in my work with the System.Numerics.Vectors methods: never use Multiply<T>(Vector<T>, T), it is horribly slow. I don't know what they are doing in the code, but it is the worst-performing method I have tried. It is multiple times slower than a non-vectorized multiply for each float. So I always use Multiply<T>(Vector<T>, Vector<T>) instead.

– deegee · 2019-05-10 21:59:22Z
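
The workaround described in this comment amounts to broadcasting the scalar into a Vector<float> once, before the loop, and then calling the vector-by-vector overload. A minimal sketch follows, with MultiplySketch and ScaleWithVectorOperand as hypothetical names. Whether the scalar overload is really slower depends on the runtime and JIT version, which this sketch does not measure.

```csharp
using System;
using System.Numerics;

public static class MultiplySketch
{
    // Multiplies every element by `factor` using Multiply<T>(Vector<T>, Vector<T>).
    public static void ScaleWithVectorOperand(float[] array, float factor)
    {
        int lanes = Vector<float>.Count;
        var factors = new Vector<float>(factor); // every lane holds `factor`, built once

        int i = 0;
        for (; i <= array.Length - lanes; i += lanes)
        {
            var v = new Vector<float>(array, i);
            Vector.Multiply(v, factors).CopyTo(array, i);
        }
        // Scalar tail for the leftover elements.
        for (; i < array.Length; i++)
        {
            array[i] *= factor;
        }
    }

    public static void Main()
    {
        float[] data = { 1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f, 10f };
        ScaleWithVectorOperand(data, 3f);
        Console.WriteLine(string.Join(", ", data));
    }
}
```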

@deegee it probably creates a new vector every time, but I can't seem to find the source code for these things...

– VisualMelon · 2019-05-11 10:05:32Z
