My purpose is to calculate the energy spectrum of three-dimensional turbulent flow from three velocity components. The calculation is too expensive. Therefore, I hope the calculation time can be reduced. The above code is a sample of calculating the energy spectrum. I expect to receive some suggestions on parallel computing using OpenMP for the energy E[k]
. The size of i
and j
loops are 1000 000 while that of k
is 100. Some parameters, x, y, z
are velocity components and d1
is velocity magnitude, are created for code structure. I desire that the calculation is designed well on a pc with 40 cores.
This code is compiled as g++ test.cpp -o run -fopenmp
How can i parallelize this following code using openmp:
#include <iostream>
#include <math.h>
#include <vector>
#include <math.h>
#include <cmath>
#include <omp.h>
using namespace std;
int main()
{
int i,j,k;
int np = 1000000; // number of fluid particle
int kmax = 100; // wavenumber
double E[kmax+1] = {0.0}; //energy
// need to be parallelized
for (i = 0; i < np-1; i++){
for (j = i+1; j < np; j++){
double x = sin(double(2*i+j)/(3.0*np));//x,y,z,d1 for test.
double y = sin(double(i+j)/(2.0*np));//x,y,z are velocity
double z = sin(double(i*j)/np/np); //d1 velocity magnitude
double d1 = sqrt(x*x + y*y + z*z);
for (k = 1; k <= kmax; k++){
double d2 = k * d1;
E[k] += x + y + sin(d2) / d2;
}
}
}
return 0;
}
3 Answers 3
In parallel programming, it's often useful to try to minimize the degree to which the threads work with the same data. In this case, you have all the threads doing updates to E
almost constantly throughout execution.
To minimize collisions, I'd allocate a separate instance of E
for each iteration of the loop you decide to execute in parallel, then after you've done the main loops of the simulation, add an extra loop to sum up the values in the per-iteration storage for E
.
For example:
double E[kmax+1];
static double Es[np+1][kmax+1];
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < np - 1; i++) {
for (int j = i + 1; j < np; j++) {
double x = sin(double(2 * i + j) / (3.0 * np)); //x,y,z,d1 for test.
double y = sin(double(i + j) / (2.0 * np)); //x,y,z are velocity
double z = sin(double(i * j) / np / np); //d1 velocity magnitude
double d1 = sqrt(x * x + y * y + z * z);
for (int k = 1; k <= kmax; k++) {
double d2 = k * d1;
// E[k] += x + y + sin(d2) / d2;
Es[i][k] += x + y + sin(d2) / d2;
}
}
}
for (int i = 0; i < np; i++)
for (int j = 0; j < kmax; j++)
E[j] += Es[i][j];
To run a quick test, I reduced np to 10'000. For that number of particles, your code (with the #pragma omp
line added) ran in about 14.7 seconds. Changing it as outlined above reduced that to about 4.1 seconds.
vector
Rather than allocating arrays, I'd use std::vector
to store the data:
std::vector<double> E(kmax + 1);
std::vector<std::vector<double>> Es(np + 1, E);
-
\$\begingroup\$ Oh, it is a very nice solution. I will apply this approach to other codes. Thank you very much. \$\endgroup\$Luc Nguyen– Luc Nguyen2020年10月15日 09:00:32 +00:00Commented Oct 15, 2020 at 9:00
Avoid using namespace std;
If you are coding professionally you probably should get out of the habit of using the using namespace std;
statement. The code will more clearly define where cout
and other identifiers are coming from (std::cin
, std::cout
). As you start using namespaces in your code it is better to identify where each function comes from because there may be function name collisions from different namespaces. The identifiercout
you may override within your own classes, and you may override the operator <<
in your own classes as well. This stack overflow question discusses this in more detail.
Include Files
You are currently including math.h
3 times, it is only necessary to do this once. Since you are using C++, stick with #include <cmath>
.
Declare the Variables as Needed
The code contains this line:
int i,j,k;
These are the variables you are using as loop control variables, they can therefore be declared in the loops themselves:
for (int i = 0; i < np-1; i++){
for (int j = i+1; j < np; j++){
...
}
}
Break the Code Into Functions to Make it Easier to Implement and Debug
Currently the code is all in main()
, as programs grow and features get added the code would be to complex to easily update if it is all in main()
. Also to achieve the ultimate goal of multi-threading for this program you will need to break the code up into multiple functions that can run on a separate core.
void FUNCTION_NAME(int i, int j, int np, double E[])
{
double x = sin(double(2*i+j)/(3.0*np));//x,y,z,d1 for test.
double y = sin(double(i+j)/(2.0*np));//x,y,z are velocity
double z = sin(double(i*j)/np/np); //d1 velocity magnitude
double d1 = sqrt(x*x + y*y + z*z);
for (k = 1; k <= kmax; k++){
double d2 = k * d1;
E[k] += x + y + sin(d2) / d2;
}
}
int main()
{
int np = 1000000; // number of fluid particle
int kmax = 100; // wavenumber
double E[kmax+1] = {0.0}; //energy
// need to be parallelized
for (i = 0; i < np-1; i++){
for (j = i+1; j < np; j++){
FUNCTION_NAME(i, j, np, E[]);
}
}
}
-
\$\begingroup\$ @pacmainbw Thank you very much for your suggestions. Based on your comments, I will optimize the code. \$\endgroup\$Luc Nguyen– Luc Nguyen2020年10月15日 00:51:07 +00:00Commented Oct 15, 2020 at 0:51
So you need to parallelize it and wonder where you should write
#pragma omp parallel for
There are many permutations of this that might be more effective, see manual for more, but try to place it just before the first loop.
i*j
overflows sometimes given thatnp = 1000000
\$\endgroup\$