4
\$\begingroup\$

I tried implementing matrix multiplication with parallel for loop in OpenMP as follows. It runs correctly but I want to make sure if I'm missing anything. How does this determine the number of threads to run?

Matrix is a class for square matrices.

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>
# include <omp.h>
Matrix parallel_mat_mul(Matrix a, Matrix b)
{
 int n =a.getSize();
 Matrix c(n); 
 clock_t begin_time = clock();
 # pragma omp parallel shared ( a, b, c, n ) // private ( i, j, k )
 {
 # pragma omp for
 for ( int i = 0; i < n; i++ )
 {
 for (int j = 0; j < n; j++ )
 {
 double local_sum=0;
 for ( int k = 0; k < n; k++ )
 {
 local_sum+= (a(i,k)*b(k,j));
 }
 c(i,j)=local_sum;
 }
 }
 }
 cout << "Parallel time: "<<float( clock () - begin_time ) / CLOCKS_PER_SEC <<"\n";
 return c;
}
200_success
145k22 gold badges190 silver badges478 bronze badges
asked Jun 16, 2017 at 7:06
\$\endgroup\$
2
  • \$\begingroup\$ The default number of threads is the number of cores in your machine! \$\endgroup\$ Commented May 28, 2019 at 13:23
  • \$\begingroup\$ Are you missing a definition of Matrix somewhere? \$\endgroup\$ Commented Aug 29, 2022 at 8:32

3 Answers 3

4
\$\begingroup\$

There is a real problem with your code in that you pass the matrices by copy.

Matrix parallel_mat_mul(Matrix a, Matrix b)

This should really be either passed by reference

Matrix parallel_mat_mul(const Matrix& a, const Matrix& b)

Or implemented through an operator of the Matrix class

operator+(const Matrix& other) const

Nevertheless to better judge this function one would need the implementation of Matrix

answered Jun 16, 2017 at 11:11
\$\endgroup\$
4
\$\begingroup\$

Assuming you store matrices in row-major order, this trashes cache memory:

for ( int i = 0; i < n; i++ )
{
 for (int j = 0; j < n; j++ )
 {
 double local_sum=0;
 for ( int k = 0; k < n; k++ )
 {
 local_sum+= (a(i,k)*b(k,j));
 }
 c(i,j)=local_sum;
 }
}

Rewrite it as follows and you will see a big performance improvement:

// Clear matrix c here.
...
for ( int i = 0; i < n; i++)
{
 for ( int k = 0; k < n; k++)
 {
 for (int j = 0; j < n; j++)
 {
 c(i,j) += (a(i,k)*b(k,j));
 }
 }
}
answered Aug 27, 2022 at 22:59
\$\endgroup\$
1
\$\begingroup\$

You can write

 # pragma omp parallel shared ( a, b, c, n ) // private ( i, j, k )
 {
 # pragma omp for
 for ( int i = 0; i < n; i++ )
 {

as

 # pragma omp parallel for shared ( a, b, c, n ) // private ( i, j, k )
 for ( int i = 0; i < n; i++ )
 {

This saves one level of braces and indentation. It is a convenience syntax for the case where one loop spans the full parallel section.


I would suggest you take care to be consistent with spaces around operators and braces. It makes the code more readable. The disorganized look caused by inconsistent spacing can distract the reader from the code logic.


Prefer ++i over i++. For an integer there is no difference, but in case of an iterator or other more complex object, i++ typically causes a copy of the object to be made.

answered Aug 28, 2022 at 1:40
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.