I have a function that takes a four-dimensional NumPy array M and, for each value i of its second index, takes all of M without the i-th "column", evaluates the product over all other columns, and stacks the results so that the array returned has the same shape as M.
In other words, what I want to evaluate is:
$$R_{ijkl} = \prod_{j^l \ne j} M_{ij^{l}kl}$$
My main concern is performance: M's shape is usually something similar to (16,8,1000,255)
and this function call takes up the great majority of my program's execution time.
import numpy as np
def sliceAndMultiply(M):
# create masks
masks = [ range(i) + range(i+1, M.shape[1]) for i in range(M.shape[1]) ]
# evaluate products over masks and stack them
return np.stack([ np.prod(M[:,m,:,:], axis=1) for m in masks ], axis=1)
M = np.random.rand(16,8,1000,255)
R = sliceAndMultiply(M)
Another variant I tried is:
def sliceAndMultiply(M):
return np.stack([ np.prod(np.delete(M, j, axis=1), axis=1) \
for j in range(M.shape[1]) ], axis=1)
but the performance of these two functions is basically the same.
1 Answer 1
A one liner for your problem:
def slice_and_multiply(arr):
return np.product(arr, axis=1, keepdims=True) / M
This multiplies all of the "columns" first, then divides by each "column" to get the result of multiplying all but that column, which may be problematic depending on your data due to potential overflows or precision losses. And it breaks completely if there are any zeros in the input array.
If you can get rid of the division (this is a typical interview question, by the way), then all of those problems dissappear. And you can do it with relative ease, by cleverly storing accumulated products from both ends of the array. then multiplying them together:
def slice_and_multiply_bis(arr):
ret = np.ones_like(arr)
np.cumprod(arr[:, :-1], axis=1, out=ret[:, 1:])
ret[:, :-1] *= np.cumprod(arr[:, :0:-1], axis=1)[:, ::-1]
return ret
And of course:
>>> M = np.random.rand(16, 8, 1000, 255)
>>> np.allclose(slice_and_multiply(M), slice_and_multiply_bis(M))
True
That last code is a little too clever for its own good, so it should it should be heavily commented.
By my timings, for your target size, slice_and_multiply
is 5x faster than your implementation, and slice_and_multiply_bis
about 2-3x faster.
-
1\$\begingroup\$ for posterity: the three lines in
slice_and_multiply_bis
work by multiplying cumulative products ofarr
in both directions (from beginning to end of each row and from end to beginning) in a smart way. Ifarr = array([a,b,c])
, the contents ofret
would be, for each of the lines,ret == [1,1,1]
, thenret == [1, a, ab]
, and finallyret == [1, a, ab]*[cb, c, 1] == [cb, ac, ab]
\$\endgroup\$blue– blue2016年05月06日 10:47:32 +00:00Commented May 6, 2016 at 10:47