Calculation of Energy Spectrum using C++

Question 1

My purpose is to calculate the energy spectrum of three-dimensional turbulent flow from three velocity components. The calculation is too expensive. Therefore, I hope the calculation time can be reduced. The above code is a sample of calculating the energy spectrum. I expect to receive some suggestions on parallel computing using OpenMP for the energy E[k]. The size of i and j loops are 1000 000 while that of k is 100. Some parameters, x, y, z are velocity components and d1 is velocity magnitude, are created for code structure. I desire that the calculation is designed well on a pc with 40 cores.

This code is compiled as g++ test.cpp -o run -fopenmp

How can i parallelize this following code using openmp:

#include <iostream>
#include <math.h>
#include <vector>
#include <math.h> 
#include <cmath> 
#include <omp.h>
using namespace std;
int main()
{
 int i,j,k;
 int np = 1000000; // number of fluid particle
 int kmax = 100; // wavenumber
 double E[kmax+1] = {0.0}; //energy
 // need to be parallelized
 for (i = 0; i < np-1; i++){
 for (j = i+1; j < np; j++){
 double x = sin(double(2*i+j)/(3.0*np));//x,y,z,d1 for test.
 double y = sin(double(i+j)/(2.0*np));//x,y,z are velocity
 double z = sin(double(i*j)/np/np); //d1 velocity magnitude
 double d1 = sqrt(x*x + y*y + z*z);
 for (k = 1; k <= kmax; k++){
 double d2 = k * d1;
 E[k] += x + y + sin(d2) / d2;
 }
 }
 }
 
 
 return 0;
}

Question 2

Welcome to the Code Review site where we review working code and provide suggestions on how to improve that code, such as optimization. What we can't do is help you write new code or implement feature requests such as changing the code to use multi-threading. In the guide on asking questions that @AryanParekh suggested you read it explains that code that is not working as expected is off-topic and feature requests are code that that is not working as expected. Other than that the question is very well written.

Question 3

I don't think this code works as intended, because i*j overflows sometimes given that np = 1000000

Question 4

There's a fine line between What about coding X thus / How to best code X vs. How to achieve X. The question (starting with no OMP pragma) makes me think the latter intended, & Surt's answer helpful, if off CR@SE's topic. (Accepting pacmaninbw's answer instils doubt.)

Question 5

I don't understand your algorithm as a FFT is the typical method for computing spectrums. Also this calculation looks incorrect because sin is only calculated from [0 .. 1] radians instead of 0 - 2*pi radians.

Question 6

The FFT algorithm reduces the number of operations to N Log N from N^2. When you have millions of samples this is huge. A FFT will provide far greater speedup than anything else. Google 3D FFT.

Question 7

In parallel programming, it's often useful to try to minimize the degree to which the threads work with the same data. In this case, you have all the threads doing updates to E almost constantly throughout execution.

To minimize collisions, I'd allocate a separate instance of E for each iteration of the loop you decide to execute in parallel, then after you've done the main loops of the simulation, add an extra loop to sum up the values in the per-iteration storage for E.

For example:

double E[kmax+1];
static double Es[np+1][kmax+1];
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < np - 1; i++) {
 for (int j = i + 1; j < np; j++) {
 double x = sin(double(2 * i + j) / (3.0 * np)); //x,y,z,d1 for test.
 double y = sin(double(i + j) / (2.0 * np)); //x,y,z are velocity
 double z = sin(double(i * j) / np / np); //d1 velocity magnitude
 double d1 = sqrt(x * x + y * y + z * z);
 for (int k = 1; k <= kmax; k++) {
 double d2 = k * d1;
 // E[k] += x + y + sin(d2) / d2;
 Es[i][k] += x + y + sin(d2) / d2;
 }
 }
}
for (int i = 0; i < np; i++)
 for (int j = 0; j < kmax; j++)
 E[j] += Es[i][j];

To run a quick test, I reduced np to 10'000. For that number of particles, your code (with the #pragma omp line added) ran in about 14.7 seconds. Changing it as outlined above reduced that to about 4.1 seconds.

vector

Rather than allocating arrays, I'd use std::vector to store the data:

std::vector<double> E(kmax + 1);
std::vector<std::vector<double>> Es(np + 1, E);

Question 8

Oh, it is a very nice solution. I will apply this approach to other codes. Thank you very much.

Question 9

Avoid `using namespace std;`

If you are coding professionally you probably should get out of the habit of using the using namespace std; statement. The code will more clearly define where cout and other identifiers are coming from (std::cin, std::cout). As you start using namespaces in your code it is better to identify where each function comes from because there may be function name collisions from different namespaces. The identifiercout you may override within your own classes, and you may override the operator << in your own classes as well. This stack overflow question discusses this in more detail.

Include Files

You are currently including math.h 3 times, it is only necessary to do this once. Since you are using C++, stick with #include <cmath>.

Declare the Variables as Needed

The code contains this line:

 int i,j,k;

These are the variables you are using as loop control variables, they can therefore be declared in the loops themselves:

 for (int i = 0; i < np-1; i++){
 for (int j = i+1; j < np; j++){
 ...
 }
 }

Break the Code Into Functions to Make it Easier to Implement and Debug

Currently the code is all in main(), as programs grow and features get added the code would be to complex to easily update if it is all in main(). Also to achieve the ultimate goal of multi-threading for this program you will need to break the code up into multiple functions that can run on a separate core.

void FUNCTION_NAME(int i, int j, int np, double E[])
{
 double x = sin(double(2*i+j)/(3.0*np));//x,y,z,d1 for test.
 double y = sin(double(i+j)/(2.0*np));//x,y,z are velocity
 double z = sin(double(i*j)/np/np); //d1 velocity magnitude
 double d1 = sqrt(x*x + y*y + z*z);
 for (k = 1; k <= kmax; k++){
 double d2 = k * d1;
 E[k] += x + y + sin(d2) / d2;
 }
}
int main()
{
 int np = 1000000; // number of fluid particle
 int kmax = 100; // wavenumber
 double E[kmax+1] = {0.0}; //energy
 // need to be parallelized
 for (i = 0; i < np-1; i++){
 for (j = i+1; j < np; j++){
 FUNCTION_NAME(i, j, np, E[]);
 }
 }
}

Question 10

@pacmainbw Thank you very much for your suggestions. Based on your comments, I will optimize the code.

Question 11

So you need to parallelize it and wonder where you should write

#pragma omp parallel for

There are many permutations of this that might be more effective, see manual for more, but try to place it just before the first loop.

Jerry Coffin Jerry Coffin 34.1k4 gold badges77 silver badges144 bronze badges · Accepted Answer · 2020-10-15 08:29:50Z

In parallel programming, it's often useful to try to minimize the degree to which the threads work with the same data. In this case, you have all the threads doing updates to E almost constantly throughout execution.

To minimize collisions, I'd allocate a separate instance of E for each iteration of the loop you decide to execute in parallel, then after you've done the main loops of the simulation, add an extra loop to sum up the values in the per-iteration storage for E.

For example:

double E[kmax+1];
static double Es[np+1][kmax+1];
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < np - 1; i++) {
 for (int j = i + 1; j < np; j++) {
 double x = sin(double(2 * i + j) / (3.0 * np)); //x,y,z,d1 for test.
 double y = sin(double(i + j) / (2.0 * np)); //x,y,z are velocity
 double z = sin(double(i * j) / np / np); //d1 velocity magnitude
 double d1 = sqrt(x * x + y * y + z * z);
 for (int k = 1; k <= kmax; k++) {
 double d2 = k * d1;
 // E[k] += x + y + sin(d2) / d2;
 Es[i][k] += x + y + sin(d2) / d2;
 }
 }
}
for (int i = 0; i < np; i++)
 for (int j = 0; j < kmax; j++)
 E[j] += Es[i][j];

To run a quick test, I reduced np to 10'000. For that number of particles, your code (with the #pragma omp line added) ran in about 14.7 seconds. Changing it as outlined above reduced that to about 4.1 seconds.

vector

Rather than allocating arrays, I'd use std::vector to store the data:

std::vector<double> E(kmax + 1);
std::vector<std::vector<double>> Es(np + 1, E);

Oh, it is a very nice solution. I will apply this approach to other codes. Thank you very much.

Stack Exchange Network

Calculation of Energy Spectrum using C++

3 Answers 3

vector

Avoid `using namespace std;`

Include Files

Declare the Variables as Needed

Break the Code Into Functions to Make it Easier to Implement and Debug

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Calculation of Energy Spectrum using C++

3 Answers 3

vector

Avoid using namespace std;

Include Files

Declare the Variables as Needed

Break the Code Into Functions to Make it Easier to Implement and Debug

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Avoid `using namespace std;`