Fast brute force numpy array combination

Question 1

I want to extract the combinations of numpy arrays in this way:

from itertools import combinations
import numpy as np
X = np.array(np.random.randn(4, 6))
combination = np.array([X[:, [i, j]] for i, j in combinations(range(X.shape[1]), 2)])

Is there a way to speed this up? My array has a shape of 200x50000.

Question 2

Asymptotically the answer is no. If you generate all $k$-combinations of an $n$-element universe, then any algorithm will require at least $n \choose k$ steps because there's exactly as many elements in the output. If you want speed, your best bet might be to first consider whether you can avoid enumerating everything (i.e., is there a smarter algorithm for what you are actually doing), but if not, maybe you get small speedups that are faster than combinations.

Question 3

I decided to use an example of shape 20x5000. For which I could reduce the time from ~58 sec to ~3 sec i.e. almost a factor of 20.

def combinations(arr):
 n = arr.shape[0]
 a = np.broadcast_to(arr, (n, n))
 b = np.broadcast_to(arr.reshape(-1,1), (n, n))
 upper = np.tri(n, n, -1, dtype='bool').T
 return b[upper].reshape(-1), a[upper].reshape(-1)
X = np.array(np.random.randn(20,5000))
%timeit X[:, [*combinations(np.arange(X.shape[1]))]]
%timeit np.array([X[:, [i, j]] for i, j in itertools.combinations(range(X.shape[1]), 2)])

is giving me

3.2 s ± 29.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
57.8 s ± 2.35 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

And

combination = np.array([X[:, [i, j]] for i, j in itertools.combinations(range(X.shape[1]), 2)])
np.allclose(combination, np.moveaxis(X[:, [*combinations(np.arange(X.shape[1]))]], -1, 0))

confirms I am calculating the right thing. Just the axes are ordered differently.

Question 4

Can you explain why your code is better than the original code?

Question 5

@pacmaninbw Sure. It's the standard numpy approach. You work with whole arrays at once rather than looping over individual entries. So rather than having a slow python interpreter doing the work you can hand it all to highly optimized/pre-compiled c or fortran code written by the numpy community.

Lukas S Lukas S 1212 bronze badges · Answer 1 · 2021-07-21 17:39:44Z

I decided to use an example of shape 20x5000. For which I could reduce the time from ~58 sec to ~3 sec i.e. almost a factor of 20.

def combinations(arr):
 n = arr.shape[0]
 a = np.broadcast_to(arr, (n, n))
 b = np.broadcast_to(arr.reshape(-1,1), (n, n))
 upper = np.tri(n, n, -1, dtype='bool').T
 return b[upper].reshape(-1), a[upper].reshape(-1)
X = np.array(np.random.randn(20,5000))
%timeit X[:, [*combinations(np.arange(X.shape[1]))]]
%timeit np.array([X[:, [i, j]] for i, j in itertools.combinations(range(X.shape[1]), 2)])

is giving me

3.2 s ± 29.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
57.8 s ± 2.35 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

And

combination = np.array([X[:, [i, j]] for i, j in itertools.combinations(range(X.shape[1]), 2)])
np.allclose(combination, np.moveaxis(X[:, [*combinations(np.arange(X.shape[1]))]], -1, 0))

confirms I am calculating the right thing. Just the axes are ordered differently.

Can you explain why your code is better than the original code?
@pacmaninbw Sure. It's the standard numpy approach. You work with whole arrays at once rather than looping over individual entries. So rather than having a slow python interpreter doing the work you can hand it all to highly optimized/pre-compiled c or fortran code written by the numpy community.

Stack Exchange Network

Fast brute force numpy array combination

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Fast brute force numpy array combination

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions