Using OpenMP, is it correct to parallelize a for loop inside a function "func" as follows?
void func(REAL coeff, DATAMPOT *dmp, int a, int la, int b, int lb, REAL L)
{
    int i,j,k;
    REAL dx,dy,dz;
    REAL dx2,dy2,dz2;
    REAL r;
    #pragma omp parallel for default(shared) private(k,i,j,dx,dy,dz,dx2,dy2,dz2,r) reduction(+:deltaE)
    for(k=0; k<la*lb; ++k){
        j=k/la+b;
        i=k%la+a;
        dx=fabs(part[i].x-part[j].x);
        dy=fabs(part[i].y-part[j].y);
        dz=fabs(part[i].z-part[j].z);
        dx2=(dx<0.5?dx*dx:(1-dx)*(1-dx));
        dy2=(dy<0.5?dy*dy:(1-dy)*(1-dy));
        dz2=(dz<0.5?dz*dz:(1-dz)*(1-dz));
        r=L*sqrt(dx2+dy2+dz2);
        deltaE += coeff*((dmp+NSPES*part[i].s+part[j].s)->npot>1?
            mpot(r,dmp+NSPES*part[i].s+part[j].s,((REAL)rand())/RAND_MAX):
            (dmp+NSPES*part[i].s+part[j].s)->pot[0](r,(dmp+NSPES*part[i].s+part[j].s)->dp ) );
    }
}
Where:

- REAL is double (#define REAL double)
- DATAMPOT *dmp is a pointer to a struct containing (among others) some pointers to functions, such as pot[0]
- part is a global array of struct
- deltaE (the variable for the summation reduction) is a REAL global variable
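For reference, the relevant global declarations look roughly like this. This is only a simplified sketch: the struct name particle, the array sizes, the pot[] array length, and the void * parameter type are placeholders, not the real code; only the fields actually used above (npot, pot, dp, x, y, z, s) are taken from the question.

#define REAL double

typedef struct {
    int   npot;                     /* number of potentials for this species pair */
    REAL (*pot[4])(REAL, void *);   /* pointers to potential functions            */
    void *dp;                       /* parameters handed to pot[...]              */
    /* ... other members ... */
} DATAMPOT;

struct particle { REAL x, y, z; int s; };   /* s = species index used with NSPES */

struct particle part[NPART];        /* global particle array (NPART is a placeholder) */
REAL deltaE;                        /* global variable used in the reduction          */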
I know that, for correctness, special treatment of the rand() function is also required; but apart from that, are there any other important (conceptual) corrections to make to the above parallelization, which is limited to a single directive line?
2 Answers
There is nothing wrong with your code, but it can be improved somewhat.

First, automatic variables defined in a scope outside the parallel region are shared by default, so the default(shared) clause is redundant.

Second, the loop counter k is predetermined to be private, so you can safely omit it from the private clause. Also, you should declare variables in the scope where they are used: in your case, every variable except k can be declared inside the parallel region, and such variables are then predetermined private as well.
If you follow both of the above points, your OpenMP directive will be greatly simplified:
void func(REAL coeff, DATAMPOT *dmp, int a, int la, int b, int lb, REAL L)
{
    int k;
    #pragma omp parallel for reduction(+:deltaE)
    for (k = 0; k < la*lb; ++k) {
        int j = k/la+b;
        int i = k%la+a;
        REAL dx = fabs(part[i].x-part[j].x);
        REAL dy = fabs(part[i].y-part[j].y);
        REAL dz = fabs(part[i].z-part[j].z);
        REAL dx2 = (dx<0.5?dx*dx:(1-dx)*(1-dx));
        REAL dy2 = (dy<0.5?dy*dy:(1-dy)*(1-dy));
        REAL dz2 = (dz<0.5?dz*dz:(1-dz)*(1-dz));
        REAL r = L*sqrt(dx2+dy2+dz2);
        DATAMPOT *ptr = dmp + NSPES*part[i].s+part[j].s;
        deltaE += coeff*(ptr->npot>1 ?
                         mpot(r,ptr,((REAL)rand())/RAND_MAX) :
                         ptr->pot[0](r,ptr->dp));
    }
}
If you can use C99 constructs in your code, then you can even move the declaration of k inside the for loop, i.e.:

#pragma omp parallel for reduction(+:deltaE)
for (int k = 0; k < la*lb; k++) {
    ...
}
Also make sure that none of the functions called inside the loop have visible side effects, i.e. they don't modify some shared global state in an unexpected and unsynchronised way.
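For example, if one of those functions had to update a shared global counter (the variable hit_count below is invented purely for this sketch and does not exist in the reviewed code), the update would need explicit synchronisation, e.g. with an atomic construct:

/* Hypothetical sketch: updating a shared global from inside the parallel loop.
   `hit_count` only illustrates the point. */
long hit_count = 0;

#pragma omp parallel for reduction(+:deltaE)
for (int k = 0; k < la*lb; ++k) {
    /* ... same distance/energy computation as above ... */
    #pragma omp atomic
    hit_count++;   /* without the atomic, concurrent increments would be a data race */
}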
- micheletuttafesta (May 23, 2013 at 16:23): Many thanks, Hristo Iliev. What about the use of the rand() function inside the parallel region? Would it be more correct to use something like this: #pragma omp parallel { srand(int(time(NULL)) ^ omp_get_thread_num()); #pragma omp for /* for loop ... */ }
- Hristo Iliev (May 23, 2013 at 16:57): I assumed that you knew how to treat it. rand() is not thread-safe, as it modifies a global RNG state. You should either use a separate PRNG for each thread (e.g. the re-entrant version, where the state is supplied explicitly) or serialise access to the PRNG in a critical section. Proper initialisation and decorrelation of multiple PRNGs is a completely separate (research) topic; you could use the current time plus some large prime number times the thread ID.
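A rough sketch of the re-entrant approach described above, assuming the POSIX rand_r() function is available; the seeding recipe simply follows the comment and is illustrative only, not a statistically validated decorrelation scheme:

#include <omp.h>
#include <stdlib.h>
#include <time.h>

#pragma omp parallel
{
    /* One private PRNG state per thread: the seed mixes the current time with
       a large prime (104729) times the thread id, as suggested above. */
    unsigned int seed = (unsigned int)time(NULL)
                      + 104729u * (unsigned int)omp_get_thread_num();

    #pragma omp for reduction(+:deltaE)
    for (int k = 0; k < la*lb; ++k) {
        REAL u = (REAL)rand_r(&seed) / RAND_MAX;  /* thread-safe: touches only `seed` */
        /* ... use u in place of ((REAL)rand())/RAND_MAX in the loop body ... */
    }
}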
- Try to come up with a better function name than func(). You are the one who understands this code the most, so you should know how to give it a relevant name.

- Try not to use single-character names, unless they're for a simple for-loop. It's hard to tell what they're for, especially without comments. If this ends up making the #pragma line even longer, then you can just split it into separate lines with a \ at the end of each line (see the sketch after this list).

- It's a little hard to read lines lacking whitespace:

      dx2=(dx<0.5?dx*dx:(1-dx)*(1-dx));

  It's difficult to see the entire ternary statement, so add more whitespace:

      dx2 = (dx < 0.5 ? dx*dx : (1-dx)*(1-dx));

  This concept should be applied everywhere, especially to each statement in the loop.

- Going back to the ternary, consider using an if/else here instead. Sure, this is a short ternary, but that doesn't mean it should always be used, especially if it's harder to read carefully.

      if (dx < 0.5) {
          dx2 = dx * dx;
      } else {
          dx2 = (1-dx) * (1-dx);
      }

- This formatting is a bit hard to read:

      deltaE += coeff*((dmp+NSPES*part[i].s+part[j].s)->npot>1? mpot(r,dmp+NSPES*part[i].s+part[j].s,((REAL)rand())/RAND_MAX): (dmp+NSPES*part[i].s+part[j].s)->pot[0](r,(dmp+NSPES*part[i].s+part[j].s)->dp ) );

  At least reformat it to something like this:

      deltaE += coeff*((dmp+NSPES*part[i].s+part[j].s)->npot>1 ?
                       mpot(r,dmp+NSPES*part[i].s+part[j].s,((REAL)rand())/RAND_MAX) :
                       (dmp+NSPES*part[i].s+part[j].s)->pot[0](r,(dmp+NSPES*part[i].s+part[j].s)->dp));

  However, this may still be too lengthy for a ternary. You could still use a plain if/else if it would help more with readability.
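As a small sketch of the line-continuation suggestion from the list above (the longer variable names are invented placeholders, not taken from the reviewed code), a long directive can be split with a backslash at the end of each line:

/* Line splicing happens before the preprocessor parses the directive, so a
   backslash-newline keeps a long pragma readable. Names here are invented. */
#pragma omp parallel for default(shared)                      \
        private(pair_index, dist_x, dist_y, dist_z, radius)   \
        reduction(+:deltaE)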
- janos (Nov 15, 2014 at 7:18): Nice, but I would firmly recommend an if-else instead of that ternary, and put spaces around operators.

- Jamal (Nov 15, 2014 at 7:20): @janos: Yeah, I could add that. I was mostly using that as an example of lack of whitespace, but there were other instances of it.