To avoid the XY problem, here is an example of what I need. Given the sorted integer input array
[1 1 1 2 2 3]
, I would like to produce the following slices, together with their index:
0: slice(0,3) 1: slice(3,5) 2: slice(5,6)
(Both the index and the slice will be used to index/slice into some other arrays.)
I think this could be solved with the itertools.groupby(). Partly because I am learning NumPy, and partly because evidence suggests that groupby()
may be inefficient, I would like to do this in NumPy. The code below seems to do what I want.
from __future__ import print_function
import numpy as np
if __name__=='__main__':
arr = np.array([1, 1, 1, 2, 2, 3])
print(arr)
mask = np.ediff1d(arr, to_begin=1, to_end=1)
indices = np.where(mask)[0]
for i in np.arange(indices.size-1):
print('%d: slice(%d,%d)' % (i, indices[i], indices[i+1]))
Is there a cleaner / more efficient way of doing this?
1 Answer 1
The function scipy.ndimage.find_objects
returns exactly the slices that you are looking for:
>>> a = np.array([1, 1, 1, 2, 2, 3])
>>> scipy.ndimage.find_objects(a)
[(slice(0, 3, None),), (slice(3, 5, None),), (slice(5, 6, None),)]
It's hard to make a fair timing comparison without knowing exactly how you are going to use these slices, but a quick test shows that this is about twice as fast as your strategy of using np.ediff2d
followed by np.where
.
A couple of general points on your code:
You use
numpy.where
with a single argument. This is the same asnumpy.nonzero
and it would be clearer in this case to use the latter.But having made that change, you could use
numpy.flatnonzero
and so avoid the[0]
.It's pointless to create an array if you are only going to iterate over it. Better to use an iterator. So instead of:
for i in np.arange(indices.size-1): print('%d: slice(%d,%d)' % (i, indices[i], indices[i+1]))
you can write something like:
for i, j in enumerate(zip(indices, indices[1:])): print('{}: {}'.format(i, slice(*j)))
In response to comment: If you find the
zip
idiom unclear then you could encapsulate it:from future_builtins import zip # Python 2 only def pairs(a): """Return an iterator over pairs of adjacent elements in a. >>> list(pairs([1, 2, 3, 4])) [(1, 2), (2, 3), (3, 4)] """ return zip(a[:-1], a[1:])
and then the loop would become:
for i, j in enumerate(pairs(indices)): print('{}: {}'.format(i, slice(*j)))
which ought to be clear.
A fancier alternative is to use the (undocumented)
numpy.lib.stride_tricks.as_strided
to make an array of pairs of adjacent elements without allocating a new array:from numpy.lib.stride_tricks import as_strided def pairs(a): """Return array of pairs of adjacent elements in a. >>> pairs([1, 2, 3, 4]) array([[1, 2], [2, 3], [3, 4]]) """ a = np.asarray(a) return as_strided(a, shape=(a.size - 1, 2), strides=a.strides * 2)
The documented way to make such an array is
np.c_[indices[:-1], indices[1:]]
but that allocates a new array.