6
\$\begingroup\$

I have to implement a simple program in SSE, and I don't know if I have done it in the right way (it's my first program in SSE).

This is the C++ program:

double distance (int x1, int x2, int y1, int y2)
{
 double x = abs(x2 - x1);
 double y = abs(y2 - y1);
 double dist = sqrt(x*x + y*y);
 return dist;
}

And this is the program in SSE instructions:

__m128d vx1 _mm_load_pd(x1); // load x1 and x2
__m128d vx2 = _mm_load_pd(x2);
__m128d vdiff = _mm_sub_pd(vx2, vx1); // x2 - x1
__m128d vnegdiff = mm_sub_pd(_mm_set1_pd(0.0), vdiff); // 0.0 - diff
__m128d vabsX = _mm_max_pd(vdiff, vnegdiff); // abs = max(diff, - diff)
__m128d vy1 = _mm_load_pd(y1); 
__m128d vy2 = _mm_load_pd(y2);
__m128d vdiff2 = _mm_sub_pd(vy2, vy1); 
__m128d vnegdiff2 = mm_sub_pd(_mm_set1_pd(0.0), vdiff2); 
__m128d vabsY = _mm_max_pd(vdiff2, vnegdiff2);
__m128d absX = _mm_mul_pd(vabsX,vabsX); //x*x
__m128d absY = _mm_mul_pd(vabsY,vabsY); //y*y
__m128d suma = _mm_add_pd(absX, absY); // x^2 + y^2
__m128d square = _mm_sqrt_pd(suma); // square root x^2 + y^2
return square;
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Mar 19, 2014 at 21:31
\$\endgroup\$

2 Answers 2

3
\$\begingroup\$

Your SSE code looks like a reasonably accurate translation of your original C, but that C code doesn't immediately jump out as the best way to do things. In particular, I don't see where you gain anything by taking the absolute value before you do your multiplication. Given that you're squaring the value immediately afterwards anyway, it would appear simpler to just leave it negative if that's what it happens to be. Lacking a specific reason to do otherwise, I'd start with something simpler, like this:

double x = x1 - x2;
double y = y1 - y2;
double dist = sqrt(x * x + y * y);

Then I'd convert that to SSE instructions.

Especially if you're concerned with speed, you might want to consider using an alternative to the Pythagorean theorem to compute the hypotenuse though. I posted some C for one possibility in a previous answer.

Sep Roland
4,78317 silver badges28 bronze badges
answered Mar 23, 2014 at 4:27
\$\endgroup\$
3
\$\begingroup\$

Without SSE

  • There is no need to take the absolute value. If X is real, then X^2 is always positive.
  • Consider using hypot(x,y) rather than sqrt(xx+yy). It is simpler and more accurate.

My suggestions:

double distanceSqrt(int x1, int x2, int y1, int y2)
{
 double dx = x2 - x1;
 double dy = y2 - y1;
 return sqrt(dx*dx + dy*dy);
}
double distanceHypot(int x1, int x2, int y1, int y2)
{
 return hypot(x2 - x1, y2 - y1);
}

SSE

  • What version of SSE are you targeting?

Rather than trying to correct your SSE version, I will build a new version. First I want to point out a few errors to try and clear up some misconceptions. Let's look at your first line of code:

__m128d vx1 = _mm_load_pd(x1);

A double is 64-bits. A __m128d is 128-bits. A __m128d holds two doubles. _mm_load_pd() takes a pointer to an array of doubles and loads the first two into a __m128d. You defined x1 to be of type int in your distance() function. The pointer passed to _mm_load_pd() must be 16 byte aligned or your program will crash. Another snippet:

__m128d absX = _mm_mul_pd(vabsX,vabsX);
__m128d absY = _mm_mul_pd(vabsY,vabsY);

You only need to call _mm_mul_pd() once. A __m128 holds two doubles so _mm_mul_pd() actually performs two multiplications.

//a={a1,a2}
//b={b1,b2}
//{a1*b1,a2*b2}=_mm_mul_pd(a,b)
double distance(int x1, int x2, int y1, int y2)
{
 __declspec(align(16)) double diff[2] = {x2 - x1, y2 - y1};
 __m128d temp = _mm_load_pd(diff);
 temp = _mm_mul_pd(temp, temp);
 //...
}

In this example the variable temp holds diff[0]*diff[0] in the low 8 bytes and diff[1]*diff[1] in the high 8 bytes.

Let's build a working SSE version of the distance() function.

double distanceSSE(int x1, int x2, int y1, int y2)
{

The first thing we need to do load the integers into SSE variables. A _m128i can hold four 32-bit integers, but we only need two.

 __m128i point1 = _mm_set_epi32(0,0,x1,y1);
 __m128i point2 = _mm_set_epi32(0,0,x2,y2);

We eventually need to convert these integers to doubles. We will wait because integer math is easier than double math. Also, the next operation is subtraction and there is no loss of information by sticking with integers. Find the distance on each plane.

 __m128i diff = _mm_sub_epi32(point2,point1);

Next is multiplication which we can do with _mm_mul_epi32() without loss of information, but that creates two 64-bit integers. Converting 64-bit integers to doubles is harder than 32-bit integers to doubles so we will convert them now.

 __m128d temp = _mm_cvtepi32_pd(diff);

The variable temp now contains two doubles {y2-y1,x2-x1}. Let's square them.

 temp = _mm_mul_pd(temp,temp);

Now we need to add the two doubles in temp together before we take the square root. This time we use a horizontal add. We need a second argument so we just add a dummy.

 temp = _mm_hadd_pd(temp,_mm_setzero_pd());

The square root.

 temp = _mm_sqrt_pd(temp);

Finally, we copy the low double from temp into a double.

 double ret;
 _mm_storel_pd(&ret,temp);
 return ret;
}

Put it all together:

double distanceSSE(int x1, int x2, int y1, int y2)
{
 __m128i point1 = _mm_set_epi32(0, 0, x1, y1);
 __m128i point2 = _mm_set_epi32(0, 0, x2, y2);
 __m128i diff = _mm_sub_epi32(point2, point1);
 __m128d temp = _mm_cvtepi32_pd(diff);
 temp = _mm_mul_pd(temp, temp);
 temp = _mm_hadd_pd(temp, _mm_setzero_pd());
 temp = _mm_sqrt_pd(temp);
 double ret;
 _mm_storel_pd(&ret, temp);
 return ret;
}

This is a simple method to find the distance but not necessarily the best. For more information on the drawbacks of this method see Hypot.

Sep Roland
4,78317 silver badges28 bronze badges
answered Mar 23, 2014 at 11:19
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.