I implemented matrix multiplication with a parallel for loop in OpenMP as follows. It runs correctly, but I want to make sure I'm not missing anything. How does OpenMP determine the number of threads to run?
Matrix is a class for square matrices.
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>
# include <omp.h>
Matrix parallel_mat_mul(Matrix a, Matrix b)
{
    int n =a.getSize();
    Matrix c(n);
    clock_t begin_time = clock();
    # pragma omp parallel shared ( a, b, c, n ) // private ( i, j, k )
    {
        # pragma omp for
        for ( int i = 0; i < n; i++ )
        {
            for (int j = 0; j < n; j++ )
            {
                double local_sum=0;
                for ( int k = 0; k < n; k++ )
                {
                    local_sum+= (a(i,k)*b(k,j));
                }
                c(i,j)=local_sum;
            }
        }
    }
    cout << "Parallel time: "<<float( clock () - begin_time ) / CLOCKS_PER_SEC <<"\n";
    return c;
}
3 Answers
There is a real problem with your code: you pass the matrices by value, so both are copied on every call.
Matrix parallel_mat_mul(Matrix a, Matrix b)
The function should really either take its arguments by const reference:
Matrix parallel_mat_mul(const Matrix& a, const Matrix& b)
or be implemented as a member operator of the Matrix class:
Matrix operator*(const Matrix& other) const
Nevertheless, to judge this function properly, one would need to see the implementation of Matrix.
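The cost of passing by value can be made visible with a small sketch. Since the real Matrix class is not shown, the Matrix below is a hypothetical stand-in (row-major storage, a copy constructor that counts its calls), and the names mul_by_ref and mul_by_value are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the poster's Matrix class, whose real
// implementation is not shown: a square matrix in row-major storage.
class Matrix {
public:
    explicit Matrix(int n)
        : n_(n), data_(static_cast<std::size_t>(n) * n, 0.0) {}

    // The copy constructor counts its calls so that copies become visible.
    Matrix(const Matrix& other) : n_(other.n_), data_(other.data_) { ++copies; }
    Matrix(Matrix&&) = default;

    int getSize() const { return n_; }
    double& operator()(int i, int j) { return data_[index(i, j)]; }
    double operator()(int i, int j) const { return data_[index(i, j)]; }

    static int copies;  // number of copy constructions so far

private:
    std::size_t index(int i, int j) const {
        return static_cast<std::size_t>(i) * n_ + j;
    }
    int n_;
    std::vector<double> data_;
};

int Matrix::copies = 0;

// Const references: the caller's matrices are read in place, nothing is copied.
Matrix mul_by_ref(const Matrix& a, const Matrix& b)
{
    assert(a.getSize() == b.getSize());
    const int n = a.getSize();
    Matrix c(n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) {
                sum += a(i, k) * b(k, j);
            }
            c(i, j) = sum;
        }
    }
    return c;
}

// By value, as in the question: both arguments are copied on every call.
Matrix mul_by_value(Matrix a, Matrix b)
{
    return mul_by_ref(a, b);
}
```

Calling mul_by_value with two existing matrices increments Matrix::copies by two, while mul_by_ref leaves it unchanged. For an n x n matrix each copy duplicates n * n doubles before any multiplication starts.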
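Assuming you store matrices in row-major order, this loop order thrashes the cache, because the inner loop walks down a column of b and so jumps n elements on every iteration: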
for ( int i = 0; i < n; i++ )
{
    for (int j = 0; j < n; j++ )
    {
        double local_sum=0;
        for ( int k = 0; k < n; k++ )
        {
            local_sum+= (a(i,k)*b(k,j));
        }
        c(i,j)=local_sum;
    }
}
Rewrite it as follows, so that the inner loop reads b and writes c along a row, and you should see a big performance improvement:
// Clear matrix c here.
...
for ( int i = 0; i < n; i++)
{
    for ( int k = 0; k < n; k++)
    {
        for (int j = 0; j < n; j++)
        {
            c(i,j) += (a(i,k)*b(k,j));
        }
    }
}
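To check that the reordering only changes the memory access pattern and not the result, here is a minimal self-contained sketch of both loop orders. It assumes a flat row-major std::vector<double> in place of the Matrix class, which is not shown; the names Mat, mul_ijk and mul_ikj are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: n x n matrix, row-major, element (i, j) stored at [i * n + j].
using Mat = std::vector<double>;

// Original order (i, j, k): the inner loop reads b with a stride of n.
Mat mul_ijk(const Mat& a, const Mat& b, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    Mat c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double local_sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                local_sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = local_sum;
        }
    }
    return c;
}

// Reordered (i, k, j): the inner loop walks b and c contiguously.
Mat mul_ikj(const Mat& a, const Mat& b, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    Mat c(n * n, 0.0);  // c must start at zero because it is accumulated into
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double a_ik = a[i * n + k];  // constant across the inner loop
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ik * b[k * n + j];
            }
        }
    }
    return c;
}
```

Both functions add the products for each c(i,j) in the same order of k, so they return the same matrix. How much faster the second version runs depends on the matrix size and the machine, so it is worth timing both on realistic inputs.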
You can write
# pragma omp parallel shared ( a, b, c, n ) // private ( i, j, k )
{
# pragma omp for
for ( int i = 0; i < n; i++ )
{
as
# pragma omp parallel for shared ( a, b, c, n ) // private ( i, j, k )
for ( int i = 0; i < n; i++ )
{
This saves one level of braces and indentation. It is a convenience syntax for the case where a single loop spans the whole parallel region.
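As a complete function, the combined directive could look like the sketch below. It assumes a flat row-major std::vector<double> instead of the Matrix class, which is not shown, and it leaves out the shared clause because variables declared outside the parallel region are shared by default.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: n x n matrices stored row-major in flat vectors.
std::vector<double> parallel_mat_mul(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     int n)
{
    assert(n >= 0);
    assert(a.size() == static_cast<std::size_t>(n) * n);
    assert(b.size() == a.size());
    std::vector<double> c(a.size(), 0.0);

    // Combined directive: opens the parallel region and distributes the
    // iterations of the i loop over the threads in one line. Each thread
    // writes a disjoint set of rows of c, so no synchronization is needed.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double local_sum = 0.0;  // declared inside the loop, so private
            for (int k = 0; k < n; ++k) {
                local_sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = local_sum;
        }
    }
    return c;
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result. With GCC or Clang, OpenMP is enabled with the -fopenmp flag.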
I would suggest you take care to be consistent with spaces around operators and braces. It makes the code more readable. The disorganized look caused by inconsistent spacing can distract the reader from the code logic.
Prefer ++i over i++. For an integer there is no difference, but in case of an iterator or other more complex object, i++ typically causes a copy of the object to be made.
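The difference can be observed with a type that counts its own copies. CountingIter below is a hypothetical iterator-like class written only for this illustration; it follows the conventional shape of the two increment operators.

```cpp
#include <cassert>

// Hypothetical iterator-like type that counts how often it is copied.
struct CountingIter {
    int value = 0;
    static int copies;

    CountingIter() = default;
    CountingIter(const CountingIter& other) : value(other.value) { ++copies; }

    // Pre-increment: modifies in place and returns a reference, no copy.
    CountingIter& operator++() { ++value; return *this; }

    // Post-increment: must return the old state, so it copies first.
    CountingIter operator++(int) {
        CountingIter old(*this);  // the copy that ++i avoids
        ++value;
        return old;
    }
};

int CountingIter::copies = 0;

// Returns how many copies a single post-increment made.
int copies_from_post_increment()
{
    CountingIter it;
    CountingIter::copies = 0;
    ++it;                              // pre-increment: no copy
    assert(CountingIter::copies == 0);
    it++;                              // post-increment: at least one copy
    return CountingIter::copies;
}
```

For a plain int loop counter, as in the question, compilers generate the same code for both forms, so this is a matter of habit rather than a measurable gain here.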