2
\$\begingroup\$

I want to extract the combinations of numpy arrays in this way:

from itertools import combinations
import numpy as np
X = np.array(np.random.randn(4, 6))
combination = np.array([X[:, [i, j]] for i, j in combinations(range(X.shape[1]), 2)])

Is there a way to speed this up? My array has a shape of 200x50000.

Caridorc
28k7 gold badges54 silver badges137 bronze badges
asked Mar 5, 2021 at 13:01
\$\endgroup\$
1
  • 1
    \$\begingroup\$ Asymptotically the answer is no. If you generate all $k$-combinations of an $n$-element universe, then any algorithm will require at least $n \choose k$ steps because there's exactly as many elements in the output. If you want speed, your best bet might be to first consider whether you can avoid enumerating everything (i.e., is there a smarter algorithm for what you are actually doing), but if not, maybe you get small speedups that are faster than combinations. \$\endgroup\$ Commented Mar 6, 2021 at 9:08

1 Answer 1

2
\$\begingroup\$

I decided to use an example of shape 20x5000. For which I could reduce the time from ~58 sec to ~3 sec i.e. almost a factor of 20.

def combinations(arr):
 n = arr.shape[0]
 a = np.broadcast_to(arr, (n, n))
 b = np.broadcast_to(arr.reshape(-1,1), (n, n))
 upper = np.tri(n, n, -1, dtype='bool').T
 return b[upper].reshape(-1), a[upper].reshape(-1)
X = np.array(np.random.randn(20,5000))
%timeit X[:, [*combinations(np.arange(X.shape[1]))]]
%timeit np.array([X[:, [i, j]] for i, j in itertools.combinations(range(X.shape[1]), 2)])

is giving me

3.2 s ± 29.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
57.8 s ± 2.35 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

And

combination = np.array([X[:, [i, j]] for i, j in itertools.combinations(range(X.shape[1]), 2)])
np.allclose(combination, np.moveaxis(X[:, [*combinations(np.arange(X.shape[1]))]], -1, 0))

confirms I am calculating the right thing. Just the axes are ordered differently.

answered Jul 21, 2021 at 17:39
\$\endgroup\$
2
  • \$\begingroup\$ Can you explain why your code is better than the original code? \$\endgroup\$ Commented Jul 21, 2021 at 22:26
  • 2
    \$\begingroup\$ @pacmaninbw Sure. It's the standard numpy approach. You work with whole arrays at once rather than looping over individual entries. So rather than having a slow python interpreter doing the work you can hand it all to highly optimized/pre-compiled c or fortran code written by the numpy community. \$\endgroup\$ Commented Jul 21, 2021 at 22:45

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.