Using OpenMP, is it correct to parallelize a for loop inside a function "func" as follows?
void func(REAL coeff, DATAMPOT *dmp, int a, int la, int b, int lb, REAL L)
{
    int i,j,k;
    REAL dx,dy,dz;
    REAL dx2,dy2,dz2;
    REAL r;
    #pragma omp parallel for default(shared) private(k,i,j,dx,dy,dz,dx2,dy2,dz2,r) reduction(+:deltaE)
    for(k=0; k<la*lb; ++k){
        j=k/la+b;
        i=k%la+a;
        dx=fabs(part[i].x-part[j].x);
        dy=fabs(part[i].y-part[j].y);
        dz=fabs(part[i].z-part[j].z);
        dx2=(dx<0.5?dx*dx:(1-dx)*(1-dx));
        dy2=(dy<0.5?dy*dy:(1-dy)*(1-dy));
        dz2=(dz<0.5?dz*dz:(1-dz)*(1-dz));
        r=L*sqrt(dx2+dy2+dz2);
        deltaE += coeff*((dmp+NSPES*part[i].s+part[j].s)->npot>1?
            mpot(r,dmp+NSPES*part[i].s+part[j].s,((REAL)rand())/RAND_MAX):
            (dmp+NSPES*part[i].s+part[j].s)->pot[0](r,(dmp+NSPES*part[i].s+part[j].s)->dp ) );
    }
}
Where:

- REAL is double (#define REAL double)
- DATAMPOT *dmp is a pointer to a struct containing (among others) some pointers to functions, such as pot[0]
- part is a global array of struct
- deltaE (the variable for the summation reduction) is a REAL global variable
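For reference, the relevant global declarations look roughly like this. This is only a simplified sketch: the struct name particle, the array sizes, the pot[] array length, and the void * parameter type are placeholders, not the real code; only the fields actually used above (npot, pot, dp, x, y, z, s) are taken from the question.

#define REAL double

typedef struct {
    int   npot;                     /* number of potentials for this species pair */
    REAL (*pot[4])(REAL, void *);   /* pointers to potential functions            */
    void *dp;                       /* parameters handed to pot[...]              */
    /* ... other members ... */
} DATAMPOT;

struct particle { REAL x, y, z; int s; };   /* s = species index used with NSPES */

struct particle part[NPART];        /* global particle array (NPART is a placeholder) */
REAL deltaE;                        /* global variable used in the reduction          */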
I know that, for correctness, special treatment of the rand() function is also required; but apart from that, are there any other important (conceptual) corrections to make to the above parallelization, which is limited to a single directive line?
2 Answers
There is nothing wrong with your code, but it can be improved somewhat.

First, automatic variables defined in a scope outside the parallel region are shared by default, so the default(shared) clause is redundant.

Second, the loop counter k is predetermined to be private, so you can safely omit it from the private clause. Also, you should declare variables in the scope where they are used: in your case, every variable except k can be declared inside the parallel region, and such variables are then predetermined private as well.
If you follow both of the above points, your OpenMP directive will be greatly simplified:
void func(REAL coeff, DATAMPOT *dmp, int a, int la, int b, int lb, REAL L)
{
    int k;
    #pragma omp parallel for reduction(+:deltaE)
    for (k = 0; k < la*lb; ++k) {
        int j = k/la+b;
        int i = k%la+a;
        REAL dx = fabs(part[i].x-part[j].x);
        REAL dy = fabs(part[i].y-part[j].y);
        REAL dz = fabs(part[i].z-part[j].z);
        REAL dx2 = (dx<0.5?dx*dx:(1-dx)*(1-dx));
        REAL dy2 = (dy<0.5?dy*dy:(1-dy)*(1-dy));
        REAL dz2 = (dz<0.5?dz*dz:(1-dz)*(1-dz));
        REAL r = L*sqrt(dx2+dy2+dz2);
        DATAMPOT *ptr = dmp + NSPES*part[i].s+part[j].s;
        deltaE += coeff*(ptr->npot>1 ?
                         mpot(r,ptr,((REAL)rand())/RAND_MAX) :
                         ptr->pot[0](r,ptr->dp));
    }
}
If you can use C99 constructs in your code, then you can even move the declaration of k inside the for loop, i.e.:

#pragma omp parallel for reduction(+:deltaE)
for (int k = 0; k < la*lb; k++) {
    ...
}
Also make sure that none of the functions called inside the loop have visible side effects, i.e. they don't modify some shared global state in an unexpected and unsynchronised way.
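For example, if one of those functions had to update a shared global counter (the variable hit_count below is invented purely for this sketch and does not exist in the reviewed code), the update would need explicit synchronisation, e.g. with an atomic construct:

/* Hypothetical sketch: updating a shared global from inside the parallel loop.
   `hit_count` only illustrates the point. */
long hit_count = 0;

#pragma omp parallel for reduction(+:deltaE)
for (int k = 0; k < la*lb; ++k) {
    /* ... same distance/energy computation as above ... */
    #pragma omp atomic
    hit_count++;   /* without the atomic, concurrent increments would be a data race */
}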
- micheletuttafesta (May 23, 2013 at 16:23): Many thanks, Hristo Iliev. What about the use of the rand() function inside the parallel region? Would it be more correct to use something like this: #pragma omp parallel { srand(int(time(NULL)) ^ omp_get_thread_num()); #pragma omp for /* for loop ... */ }
- Hristo Iliev (May 23, 2013 at 16:57): I assumed that you knew how to treat it. rand() is not thread-safe, as it modifies a global RNG state. You should either use a separate PRNG for each thread (e.g. the re-entrant version, where the state is supplied explicitly) or serialise access to the PRNG in a critical section. Proper initialisation and decorrelation of multiple PRNGs is a completely separate (research) topic; you could use the current time plus some large prime number times the thread ID.
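A rough sketch of the re-entrant approach described above, assuming the POSIX rand_r() function is available; the seeding recipe simply follows the comment and is illustrative only, not a statistically validated decorrelation scheme:

#include <omp.h>
#include <stdlib.h>
#include <time.h>

#pragma omp parallel
{
    /* One private PRNG state per thread: the seed mixes the current time with
       a large prime (104729) times the thread id, as suggested above. */
    unsigned int seed = (unsigned int)time(NULL)
                      + 104729u * (unsigned int)omp_get_thread_num();

    #pragma omp for reduction(+:deltaE)
    for (int k = 0; k < la*lb; ++k) {
        REAL u = (REAL)rand_r(&seed) / RAND_MAX;  /* thread-safe: touches only `seed` */
        /* ... use u in place of ((REAL)rand())/RAND_MAX in the loop body ... */
    }
}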
- Try to come up with a better function name than func(). You are the one who understands this code the most, so you should know how to give it a relevant name.

- Try not to use single-character names, unless they're for a simple for-loop. It's hard to tell what they're for, especially without comments. If this ends up making the #pragma line even longer, then you can just split it into separate lines with a \ at the end of each line (see the sketch after this list).

- It's a little hard to read lines lacking whitespace:

      dx2=(dx<0.5?dx*dx:(1-dx)*(1-dx));

  It's difficult to see the entire ternary statement, so add more whitespace:

      dx2 = (dx < 0.5 ? dx*dx : (1-dx)*(1-dx));

  This concept should be applied everywhere, especially to each statement in the loop.

- Going back to the ternary, consider using an if/else here instead. Sure, this is a short ternary, but that doesn't mean it should always be used, especially if it's harder to read carefully.

      if (dx < 0.5) {
          dx2 = dx * dx;
      } else {
          dx2 = (1-dx) * (1-dx);
      }

- This formatting is a bit hard to read:

      deltaE += coeff*((dmp+NSPES*part[i].s+part[j].s)->npot>1? mpot(r,dmp+NSPES*part[i].s+part[j].s,((REAL)rand())/RAND_MAX): (dmp+NSPES*part[i].s+part[j].s)->pot[0](r,(dmp+NSPES*part[i].s+part[j].s)->dp ) );

  At least reformat it to something like this:

      deltaE += coeff*((dmp+NSPES*part[i].s+part[j].s)->npot>1 ?
                       mpot(r,dmp+NSPES*part[i].s+part[j].s,((REAL)rand())/RAND_MAX) :
                       (dmp+NSPES*part[i].s+part[j].s)->pot[0](r,(dmp+NSPES*part[i].s+part[j].s)->dp));

  However, this may still be too lengthy for a ternary. You could still use a plain if/else if it would help more with readability.
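As a small sketch of the line-continuation suggestion from the list above (the longer variable names are invented placeholders, not taken from the reviewed code), a long directive can be split with a backslash at the end of each line:

/* Line splicing happens before the preprocessor parses the directive, so a
   backslash-newline keeps a long pragma readable. Names here are invented. */
#pragma omp parallel for default(shared)                      \
        private(pair_index, dist_x, dist_y, dist_z, radius)   \
        reduction(+:deltaE)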
- janos (Nov 15, 2014 at 7:18): Nice, but I would firmly recommend an if-else instead of that ternary, and put spaces around operators.

- Jamal (Nov 15, 2014 at 7:20): @janos: Yeah, I could add that. I was mostly using that as an example of lack of whitespace, but there were other instances of it.