Matrix multiplication with OpenMP parallel for loop

Question 1

I tried implementing matrix multiplication with a parallel for loop in OpenMP as follows. It runs correctly, but I want to make sure I'm not missing anything. How does OpenMP determine the number of threads to run?

Matrix is a class for square matrices.

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>
# include <omp.h>
Matrix parallel_mat_mul(Matrix a, Matrix b)
{
    int n =a.getSize();
    Matrix c(n);
    clock_t begin_time = clock();
    # pragma omp parallel shared ( a, b, c, n ) // private ( i, j, k )
    {
        # pragma omp for
        for ( int i = 0; i < n; i++ )
        {
            for (int j = 0; j < n; j++ )
            {
                double local_sum=0;
                for ( int k = 0; k < n; k++ )
                {
                    local_sum+= (a(i,k)*b(k,j));
                }
                c(i,j)=local_sum;
            }
        }
    }
    cout << "Parallel time: "<<float( clock () - begin_time ) / CLOCKS_PER_SEC <<"\n";
    return c;
}

Question 2

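Unless you say otherwise, most OpenMP implementations default to one thread per logical core of your machine. The exact default is implementation-defined, and you can override it with the OMP_NUM_THREADS environment variable, omp_set_num_threads(), or a num_threads clause.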
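A minimal sketch of how one might inspect the thread count at run time. The helper names default_thread_count and team_size are made up for illustration; the _OPENMP guards are only there so the snippet also builds when OpenMP is not enabled, in which case everything runs on one thread.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Upper bound on the team size that a parallel region without a
// num_threads clause would get; 1 when compiled without OpenMP.
int default_thread_count()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Request a specific team size and report how many threads actually ran.
// The runtime is allowed to grant fewer threads than requested.
int team_size(int requested)
{
    assert(requested >= 1);
    int actual = 1;
#ifdef _OPENMP
    #pragma omp parallel num_threads(requested)
    {
        // One thread records the team size; the others wait at the
        // implicit barrier at the end of the single construct.
        #pragma omp single
        actual = omp_get_num_threads();
    }
#else
    (void)requested;
#endif
    return actual;
}
```

Printing default_thread_count() before and after setting OMP_NUM_THREADS in the shell is a quick way to see which setting is in effect.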

Question 3

Are you missing a definition of Matrix somewhere?

Question 4

There is a real problem with your code in that you pass the matrices by copy.

Matrix parallel_mat_mul(Matrix a, Matrix b)

The matrices should instead be passed by const reference:

Matrix parallel_mat_mul(const Matrix& a, const Matrix& b)

or the multiplication could be implemented as an operator of the Matrix class:

Matrix operator*(const Matrix& other) const

Nevertheless, to judge this function properly one would need to see the implementation of Matrix.
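Since the question does not show Matrix, here is a rough sketch with a hypothetical stand-in class (square, row-major, backed by std::vector) whose copy constructor bumps a counter, so the cost of passing by value becomes visible. The names copy_count and mul_by_value are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static int copy_count = 0;  // incremented by every Matrix copy construction

// Hypothetical stand-in for the question's Matrix class.
class Matrix {
public:
    explicit Matrix(int n) : n_(n), data_(static_cast<std::size_t>(n) * n, 0.0) {}
    Matrix(const Matrix& other) : n_(other.n_), data_(other.data_) { ++copy_count; }
    Matrix(Matrix&&) = default;  // returning a local Matrix moves, it does not copy

    int getSize() const { return n_; }
    double& operator()(int i, int j) { return data_[index(i, j)]; }
    double operator()(int i, int j) const { return data_[index(i, j)]; }

    // Both operands are read through const references: no copies.
    Matrix operator*(const Matrix& other) const
    {
        assert(n_ == other.n_);
        Matrix c(n_);
        for (int i = 0; i < n_; ++i)
            for (int j = 0; j < n_; ++j) {
                double sum = 0.0;
                for (int k = 0; k < n_; ++k)
                    sum += (*this)(i, k) * other(k, j);
                c(i, j) = sum;
            }
        return c;
    }

private:
    std::size_t index(int i, int j) const { return static_cast<std::size_t>(i) * n_ + j; }
    int n_;
    std::vector<double> data_;
};

// Same parameter style as in the question: each call copies both arguments.
Matrix mul_by_value(Matrix a, Matrix b) { return a * b; }
```

With this stand-in, a * b leaves copy_count unchanged, while mul_by_value(a, b) raises it by two, one full n x n copy per argument.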

Question 5

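Assuming you store matrices in row-major order, this thrashes the cache, because the innermost loop steps through b(k,j) down a column, so consecutive accesses are n elements apart in memory: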

for ( int i = 0; i < n; i++ )
{
    for (int j = 0; j < n; j++ )
    {
        double local_sum=0;
        for ( int k = 0; k < n; k++ )
        {
            local_sum+= (a(i,k)*b(k,j));
        }
        c(i,j)=local_sum;
    }
}

Rewrite it as follows, so that the innermost loop walks along rows of both b and c, and you will likely see a big performance improvement:

// Clear matrix c here.
...
for ( int i = 0; i < n; i++)
{
    for ( int k = 0; k < n; k++)
    {
        for (int j = 0; j < n; j++)
        {
            c(i,j) += (a(i,k)*b(k,j));
        }
    }
}
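To make the comparison concrete, here is a sketch of both loop orders on flat, row-major std::vector<double> storage. That layout, the Flat alias, and the function names mul_ijk and mul_ikj are assumptions for illustration, since the question's Matrix class is not shown. Parallelizing the outer i loop stays safe in both versions because each thread writes only to its own rows of c.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n matrix in a flat vector: element (i, j) is at i * n + j.
using Flat = std::vector<double>;

// Original order: the inner k loop reads b with a stride of n elements.
Flat mul_ijk(const Flat& a, const Flat& b, int n)
{
    const std::size_t N = static_cast<std::size_t>(n);
    assert(a.size() == N * N && b.size() == N * N);
    Flat c(N * N, 0.0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
    return c;
}

// Reordered: the inner j loop reads b and writes c contiguously.
Flat mul_ikj(const Flat& a, const Flat& b, int n)
{
    const std::size_t N = static_cast<std::size_t>(n);
    assert(a.size() == N * N && b.size() == N * N);
    Flat c(N * N, 0.0);  // must start at zero because we accumulate into it
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            const double aik = a[i * N + k];  // reused across the whole j loop
            for (int j = 0; j < n; ++j)
                c[i * N + j] += aik * b[k * N + j];
        }
    return c;
}
```

Both functions add the products for a given (i, j) in the same k order, so they should produce identical results; only the memory access pattern differs. How large the speedup is depends on n and on the cache sizes of the machine, so it is worth timing both.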

Question 6

You can write

# pragma omp parallel shared ( a, b, c, n ) // private ( i, j, k )
{
    # pragma omp for
    for ( int i = 0; i < n; i++ )
    {

as

# pragma omp parallel for shared ( a, b, c, n ) // private ( i, j, k )
for ( int i = 0; i < n; i++ )
{

This saves one level of braces and indentation. It is a convenience syntax for the case where a single loop spans the entire parallel region.
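As a small self-contained sketch of that equivalence, the two hypothetical functions below (squares_split and squares_combined, names invented here) fill a vector using each form. The loop variable is declared in the for statement, which makes it private to each thread automatically, so no private clause is needed.

```cpp
#include <cassert>
#include <vector>

// Split form: an explicit parallel region containing one work-shared loop.
std::vector<int> squares_split(int n)
{
    assert(n >= 0);
    std::vector<int> out(n);
    #pragma omp parallel shared(out, n)
    {
        #pragma omp for
        for (int i = 0; i < n; ++i)
            out[i] = i * i;
    }
    return out;
}

// Combined form: same behaviour with one directive and one less brace level.
std::vector<int> squares_combined(int n)
{
    assert(n >= 0);
    std::vector<int> out(n);
    #pragma omp parallel for shared(out, n)
    for (int i = 0; i < n; ++i)
        out[i] = i * i;
    return out;
}
```

The split form is still the one to use when the parallel region contains more than a single loop, for example per-thread setup before the loop.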

I would suggest you take care to be consistent with spaces around operators and braces. It makes the code more readable. The disorganized look caused by inconsistent spacing can distract the reader from the code logic.

Prefer ++i over i++. For an integer there is no difference, but in case of an iterator or other more complex object, i++ typically causes a copy of the object to be made.
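To see where that copy comes from, here is a sketch with a hypothetical iterator-like type, Cursor, whose copy constructor increments a counter. Both the type and the counter cursor_copies are invented for this illustration.

```cpp
#include <cassert>

static int cursor_copies = 0;  // counts Cursor copy constructions

// Hypothetical iterator-like type, used only to make copies visible.
struct Cursor {
    int pos = 0;

    Cursor() = default;
    Cursor(const Cursor& other) : pos(other.pos) { ++cursor_copies; }

    // Pre-increment: modifies in place and returns a reference; no copy.
    Cursor& operator++()
    {
        ++pos;
        return *this;
    }

    // Post-increment: has to return the old value, so it copies first.
    Cursor operator++(int)
    {
        Cursor old = *this;
        ++pos;
        return old;
    }
};
```

With this type, ++c leaves cursor_copies untouched, while c++ increases it by at least one even when the returned value is discarded.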

miscco · Answer 1 · 2017-06-16 11:11:30Z

score 4 · Answer 2 · 2022-08-27 22:59:32Z

Cris Luengo · Answer 3 · 2022-08-28 01:40:44Z
