Compute lp-norm of a vector in parallel with MPI

Question 1

I'm trying to solve the following exercise:

Compute in parallel with MPI the lp norm of a vector of a predefined size N allocated on the heap. Your implementation must also work when the number of processes p does not divide the number of elements N.

Here's my code. p is the number of ranks, while norm_p describes the mathematical norm I wish to compute.

I have a question about the allocation on the heap. Even though the array vec is communicated to every process with an MPI_Bcast, I do not need to call free(vec) on every rank, right?

Indeed, if I do so I get runtime errors of the type (pointer being freed was not allocated), but I'd also like to understand the best way to handle this situation.

#include <math.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
// Compute the lp-norm of a vector, p != infinity
#define N 12
int main(int argc, char *argv[]) {
    int rank, p, global_sum, norm_p;
    int local_size, local_a, local_b;
    int local_sum = 0;
    int *vec;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    if (rank == 0) {
        printf("Choose the norm you want: p=0,... \n");
        scanf("%d", &norm_p);
        vec = (int *)malloc(sizeof(int) * N);
        for (int i = 0; i < N; i++)
            vec[i] = i * i;
    }
    MPI_Bcast(vec, N, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&norm_p, 1, MPI_INT, 0, MPI_COMM_WORLD);
    local_size = N / p;
    int rem = N % p;
    if (rank < rem) {
        local_a = rank * (local_size + 1);
        local_b = local_a + local_size;
    } else {
        local_a = rank * local_size + rem;
        local_b = local_a + (local_size - 1);
    }
    printf("On rank %d I'm computing the sum from %d to %d \n", rank, local_a,
           local_b);
    for (int i = local_a; i <= local_b; ++i) {
        local_sum += fabs(pow(vec[i], norm_p));
    }
    printf("On rank %d the computed local sum is %d \n", rank, local_sum);
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("On rank 0 I'm showing the norm of the vector: %lf \n",
               pow(global_sum, 1. / norm_p));
        free(vec);
    }
    MPI_Finalize();
    return 0;
}

Question 2

Please do not edit the question, especially the code, after an answer has been posted. Changing the question may invalidate the answer. Everyone needs to be able to see what the reviewer was referring to. See "What to do after the question has been answered". If you want to present updated code for another review, ask a follow-up question with a link back to this question.

Question 3

Okay, will do. Sorry for that @pacmaninbw

score 4 · Accepted Answer · 2023-01-08 17:34:25Z

Your assignment is ambiguous. You can indeed create the vector on one process and broadcast it, but that is bad MPI practice for two reasons: it's a sequential bottleneck, and it means that process zero needs to have as much memory as all other processes combined. I have a cluster with more than 100 thousand cores. Do you think that's realistic?

More importantly, you are thinking in shared memory terms. You only allocate vec on process zero. MPI is distributed memory: each process has its own memory. So each process needs to allocate its own instance of vec. There is no memory that they all share.

You have set N very small. Set it to something realistic, like at least a few million. You will then see that your code bombs because of the allocation problem described above. Also: for small N the communication cost of MPI completely overwhelms the computation of the norm. Also also: if you insist on a small N, it would be fun to test your code with more processes than the value of N.
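For the "more processes than N" experiment, one possible way to write the index arithmetic is sketched below. The helper name block_range and the half-open range convention are my own assumptions, not something from the question. It uses the same rule as the posted code (the first N % p ranks get one extra element), but returns a start and a count, so a rank with nothing to do simply gets count 0.

```c
#include <assert.h>

/* Hypothetical helper: block distribution of n elements over nprocs ranks.
 * Writes the half-open range [*start, *start + *count) owned by rank.
 * The first (n % nprocs) ranks receive one extra element. */
void block_range(long n, int nprocs, int rank, long *start, long *count) {
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    long base = n / nprocs; /* minimum number of elements per rank */
    long rem = n % nprocs;  /* number of ranks that get one extra */
    if (rank < rem) {
        *count = base + 1;
        *start = rank * (base + 1);
    } else {
        *count = base;
        *start = rank * base + rem;
    }
}
```

With n = 3 and 5 processes, ranks 0 to 2 each own one element and ranks 3 and 4 own none. Those two ranks still have to take part in the reduction, contributing a partial sum of zero.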

Thanks a lot for reviewing this. I see what you mean, and I realised that I misunderstood the assignment. I have now edited my question with the updated code, where each process allocates and frees only a local_size portion of the array. Does it seem reasonable now? @VictorEijkhout

I had to post the follow-up question here: codereview.stackexchange.com/questions/282484/… @VictorEijkhout

Your description sounds like you're doing (what is in my book) the right thing.
