1
$\begingroup$

I have come across what seems to be a collapsed loop.

parallel-for i, j = 1...n

What would be the work and span/depth/critical path length be of loops like this? Would it be the same as that of the following:

parallel-for i = 1...n
 parallel-for j = 1...n

EDIT:

Work = the asymptotic running time of the algorithm on a single core (total computational steps)

Span/depth/critical path length = the asymptotic running time of the algorithm on an infinite number of cores (the point beyond which more parallel processors won't give any performance gains).

asked Jun 15, 2023 at 1:34
$\endgroup$
4
  • $\begingroup$ Without knowing what language/formalism this is, my guess is that it's emphasising that every execution of the body is independent. Can you give a little more detail about where this comes from? $\endgroup$ Commented Jun 15, 2023 at 1:46
  • $\begingroup$ It's a step in a matrix algorithm parallel-for i, j, k = 1 to n do C[i, j, k] = A[i, k] * B[k, j] $\endgroup$ Commented Jun 15, 2023 at 2:26
  • $\begingroup$ Please define the terms work, span, depth, critical path length in the frame of your question. $\endgroup$ Commented Jun 15, 2023 at 16:37
  • $\begingroup$ Added. I took those as standard terms because I've seen them used without definition in parallel programming resources. $\endgroup$ Commented Jun 17, 2023 at 1:58

1 Answer 1

1
$\begingroup$

The work $W(n)$ is the total number of nodes in your computation graph and the span $D(n)$ is the number of nodes on the longest path of that graph. $T_P(n)$ is the runtime of the algorithm using $P$ processors; $T_1(n) = W(n)$ and $T_\inf(n) = D(n)$.

Suppose your code has the following shape

parfor i in range(n):
 parfor j in range(n):
 f()

where $f() \in O(1)$ is some function (with side-effects). The keyword "parfor" means that every loop iteration is independent. There are $n^2$ independent iterations so in theory $W(n)\in O(n^2)$ and $T_\inf(n) = D(n)\in O(1)$.

However, the work of starting each $f()$ thread and waiting for its result must be accounted for. That is, we must implement the "parfor" keyword. Here is one way:

parfor_impl(f, lo, hi):
 if hi - lo <= 1:
 f()
 else:
 mid = (lo + hi) // 2
 spawn parfor_impl(f, lo, mid)
 spawn parfor_impl(f, mid, hi)
 wait

where the keywords "spawn" and "wait" launches threads and waits for previously launched threads' completion respectively. This is a classical recursive scheme with $W_{parfor} \in O(n)$ and $D_{parfor} \in O(\log n)$. To get the total runtime (and span) we just add the runtime of the two nested loops $O(2\log n) = O(\log n)$. We do not multiply the runtime since the loops run in parallel.

I can highly recommend the "Intro to the Work-Span Model" chapter of Udacity's High Performance Computing course which delves deep into this topic.

answered Nov 12, 2023 at 18:48
$\endgroup$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.