Vector addition parallelizes naturally: each output element is the sum of the corresponding input elements and does not depend on any other element, so separate processors can compute different elements at the same time. Example:
[1 2 3] + [4 5 6] = [5 7 9]
With one processor per element, this requires only 1 iteration.
How can the "cumulative sum" operation be made to work in parallel on a vector? Consider the following sequential cumulative sum problem. Given the vector
a_vector := [1 2 3]
empty_vector := []
Step 1 -> [1]
Step 2 -> [1 3]
Step 3 -> [1 3 6]
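Each step appends one running total to the empty vector, and each step needs the result of the previous one. So it takes 3 iterations, which is longer than the single iteration for addition.
Still, both operations produce an output with the same number of elements as their operand, so in that sense both can be called elementwise operations.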
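To make the dependency concrete, here is a minimal Python sketch of the sequential version described above. The function name `sequential_cumsum` is only illustrative and is not taken from any library.

```python
def sequential_cumsum(values):
    """Build the running totals one step at a time (illustrative helper)."""
    result = []
    running_total = 0
    for v in values:
        # Each iteration needs the total from the previous iteration,
        # which is what prevents a naive element-by-element split.
        running_total += v
        result.append(running_total)
    return result


print(sequential_cumsum([1, 2, 3]))  # [1, 3, 6]
```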
1 Answer
Use a parallel prefix sum (also called scan) algorithm. See Wikipedia for some examples.
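As a rough illustration, here is a sketch of one such algorithm, the Hillis-Steele inclusive scan. It is written as an ordinary Python loop that only simulates the parallel rounds: within a round every element is computed from the previous round's values alone, so on real hardware those updates could be done by separate processors. The function name and the returned round count are assumptions made for this example.

```python
def hillis_steele_scan(values):
    """Simulate a Hillis-Steele inclusive prefix sum (illustrative sketch).

    Returns the scanned list and the number of rounds used.
    """
    current = list(values)
    n = len(current)
    offset = 1
    rounds = 0
    while offset < n:
        previous = current
        # Every entry below reads only from `previous`, so all n updates
        # in this round are independent and could run in parallel.
        current = [
            previous[i] + previous[i - offset] if i >= offset else previous[i]
            for i in range(n)
        ]
        offset *= 2
        rounds += 1
    return current, rounds


print(hillis_steele_scan([1, 2, 3]))  # ([1, 3, 6], 2)
```

For n elements this needs about log2(n) rounds instead of n sequential steps, at the cost of more additions in total (on the order of n log n). Work-efficient variants exist that reduce the total number of additions.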