Numpy: Find the values of rows given an array of indexes

Question 1

I have a 2D array of values and a 1D array of indexes. I want to pull the values from the index of each row using an array of indexes. The following code would do this successfully:

from pprint import pprint
import numpy as np

_2Darray = np.arange(100, dtype=np.float16)
_2Darray = _2Darray.reshape((10, 10))
array_indexes = [5, 5, 5, 4, 4, 4, 6, 6, 6, 8]
index_values = []
# Pick one element per row: the column given by array_indexes[row]
for row, index in enumerate(array_indexes):
    index_values.append(_2Darray[row, index])
pprint(_2Darray)
print(index_values)

Output:

array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
 [ 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
 [ 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.],
 [ 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
 [ 40., 41., 42., 43., 44., 45., 46., 47., 48., 49.],
 [ 50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
 [ 60., 61., 62., 63., 64., 65., 66., 67., 68., 69.],
 [ 70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
 [ 80., 81., 82., 83., 84., 85., 86., 87., 88., 89.],
 [ 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]], dtype=float16)
[5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]

But I want to do it using only NumPy functions. I have tried a whole bunch of NumPy functions, but none of them seem to do this fairly simple task.

Thanks in advance!

Edit: I managed to figure out what my implementation would be:

index_values = np.fromiter((_2Darray[ind[0], ind[1]] for ind in enumerate(array_indexes)),
                           dtype=_2Darray.dtype,
                           count=len(_2Darray))
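For reference, here is a self-contained, runnable sketch of that `np.fromiter` approach on the 10x10 example array, with a check that it matches the integer-array indexing suggested in the answers below. The unpacking into `row, col` is just a readability choice; it does the same as `ind[0], ind[1]`.

```python
import numpy as np

_2Darray = np.arange(100, dtype=np.float16).reshape((10, 10))
array_indexes = [5, 5, 5, 4, 4, 4, 6, 6, 6, 8]

# fromiter consumes the generator one row at a time; passing count lets
# NumPy allocate the 1-D output once instead of growing it.
index_values = np.fromiter(
    (_2Darray[row, col] for row, col in enumerate(array_indexes)),
    dtype=_2Darray.dtype,
    count=len(array_indexes),
)

# Same values as indexing with one row index per column index
assert np.array_equal(index_values, _2Darray[np.arange(len(_2Darray)), array_indexes])
print(index_values.tolist())  # [5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]
```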

Thanks to root, I've got both implementations working. Now for some profiling. My implementation, run through cProfile:

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      2    0.274    0.137    0.622    0.311  {numpy.core.multiarray.fromiter}
  20274    0.259    0.000    0.259    0.000  lazer_np.py:86(<genexpr>)

And root's:

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      4    0.000    0.000    0.000    0.000  {numpy.core.multiarray.array}
      1    0.000    0.000    0.000    0.000  {numpy.core.multiarray.arange}

I can't believe it, but cProfile doesn't register root's method as taking any time at all. I think this must be some kind of bug, but it is definitely noticeably faster: in an earlier test, root's version was about 3 times faster.

Note: these tests were done on an array of shape (20273, 200) with np.float16 values. Additionally, each indexing operation was run twice per test.
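cProfile reports time per profiled function call, and an indexing expression like `data[rows, cols]` is not itself one of the calls it lists, which is likely why root's version shows up as 0.000. A wall-clock comparison with `timeit` sidesteps that. Below is a minimal sketch on an array with the same shape and dtype as in the note; the random values and column indexes are assumptions, since the original data isn't shown, and the function names are illustrative.

```python
import timeit
import numpy as np

# Same shape/dtype as the profiling note; the contents are made up
rng = np.random.default_rng(0)
data = rng.random((20273, 200)).astype(np.float16)
cols = rng.integers(0, data.shape[1], size=data.shape[0])

def with_fromiter():
    # One Python-level lookup per row, collected by np.fromiter
    return np.fromiter(
        (data[row, col] for row, col in enumerate(cols)),
        dtype=data.dtype,
        count=len(data),
    )

def with_fancy_indexing():
    # Single vectorized lookup: row i paired with cols[i]
    return data[np.arange(len(data)), cols]

assert np.array_equal(with_fromiter(), with_fancy_indexing())

for func in (with_fromiter, with_fancy_indexing):
    # number=2 mirrors "each indexing had to be run twice"; best of 3 repeats
    best = min(timeit.repeat(func, number=2, repeat=3))
    print(f"{func.__name__}: {best:.4f} s for 2 calls")
```

The absolute numbers will depend on the machine and NumPy version, so only the ratio between the two is worth comparing.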

Answer 1 (accepted)

This should do it:

import numpy
row = numpy.arange(_2Darray.shape[0])
index_values = _2Darray[row, array_indexes]

NumPy allows you to index 2-D arrays (or n-D arrays, really) with two integer index arrays, such that the following are equivalent:

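# row and col are integer index arrays of the same length
result1 = numpy.empty(len(row), dtype=array.dtype)
for i in range(len(row)):
    result1[i] = array[row[i], col[i]]
result2 = array[row, col]
numpy.all(result1 == result2)  # True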
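To make the equivalence concrete, here is a runnable sketch using the question's 10x10 array and index list: the loop from the question and the single indexing expression from this answer produce the same values. The name `loop_values` is only for illustration.

```python
import numpy as np

# Same 10x10 float16 array and per-row column indexes as in the question
_2Darray = np.arange(100, dtype=np.float16).reshape((10, 10))
array_indexes = [5, 5, 5, 4, 4, 4, 6, 6, 6, 8]

# Loop version from the question: one Python-level lookup per row
loop_values = np.array(
    [_2Darray[row, index] for row, index in enumerate(array_indexes)],
    dtype=_2Darray.dtype,
)

# Integer-array indexing: element i is _2Darray[row[i], array_indexes[i]]
row = np.arange(_2Darray.shape[0])
index_values = _2Darray[row, array_indexes]

print(index_values)
print(np.array_equal(loop_values, index_values))  # True
```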

Answer 2

In [15]: _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
Out[15]: array([ 5., 15., 25., 34., 44., 54., 66., 76., 86., 98.],
 dtype=float16)

BUT, I think something based on your solution may actually be faster on smaller arrays. If the arrays are bigger than 100*100, use NumPy indexing.

In [22]: def f(array, indices):
   ...:     return [array[row, index] for row, index in enumerate(indices)]
In [23]: f(_2Darray, [5,5,5,4,4,4,6,6,6,8])
Out[23]: [5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]

In [27]: %timeit f(_2Darray,[5,5,5,4,4,4,6,6,6,8])
100000 loops, best of 3: 7.48 us per loop
In [28]: %timeit _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
10000 loops, best of 3: 24.2 us per loop
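Where the crossover between the two approaches falls (around 100*100 in these timings) will vary with hardware and NumPy version, so a sketch like the following measures it directly across a few sizes. The sizes, the random data, and the helper name `g` are arbitrary choices for illustration; `f` is the list comprehension from the session above.

```python
import timeit
import numpy as np

def f(array, indices):
    # root's list comprehension: one Python-level lookup per row
    return [array[row, index] for row, index in enumerate(indices)]

def g(array, indices):
    # Integer-array indexing: a single vectorized lookup
    return array[np.arange(len(array)), indices]

rng = np.random.default_rng(0)
results = {}
for n in (10, 100, 1000):
    a = rng.random((n, n))
    idx = rng.integers(0, n, size=n).tolist()  # a list, as in the session above
    assert np.array_equal(f(a, idx), g(a, idx))
    t_f = min(timeit.repeat(lambda: f(a, idx), number=100, repeat=3)) / 100
    t_g = min(timeit.repeat(lambda: g(a, idx), number=100, repeat=3)) / 100
    results[n] = (t_f, t_g)
    print(f"n={n}: list comprehension {t_f * 1e6:.1f} us, indexing {t_g * 1e6:.1f} us")
```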

Comment

These timings seem premature, and without knowing more specifics about the OP's use case they are not very useful. In general, using NumPy's indexing and passing arrays (note that you used lists above) is going to give the best performance.
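A minimal sketch of the "pass arrays, not lists" point: when the column indexes are a Python list, NumPy has to convert them to an integer array on every indexing call, whereas converting once up front avoids that repeated work. The 1000x1000 array and the constant column index are assumptions chosen only to make the comparison easy to run.

```python
import timeit
import numpy as np

a = np.random.default_rng(0).random((1000, 1000))
idx_list = [5] * len(a)                          # plain Python list of column indexes
idx_array = np.asarray(idx_list, dtype=np.intp)  # converted to an index array once
rows = np.arange(len(a))                         # row indexes, also built once

def with_list():
    # idx_list must be converted to an integer array on every call
    return a[rows, idx_list]

def with_array():
    # Both index arguments are already integer arrays
    return a[rows, idx_array]

assert np.array_equal(with_list(), with_array())
for func in (with_list, with_array):
    best = min(timeit.repeat(func, number=1000, repeat=3))
    print(f"{func.__name__}: {best / 1000 * 1e6:.1f} us per call")
```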

Comment

@BiRico: I redid the timings, and it seems that the list comprehension is faster for arrays smaller than 100*100, so I've changed the wording accordingly. Also note that len(array) is faster than array.shape[0].

Answer 3

You have to be careful to use NumPy functions designed specifically for arrays, not for matrices. The two are easy to confuse, and calling a method of one on the other does not raise an error, yet the output can be pretty much unpredictable.
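As an illustration of how `ndarray` and `np.matrix` can silently diverge, here is a short sketch: the same integer-array indexing returns a 1-D result for an `ndarray` but a 2-D (1, N) result for an `np.matrix`, and `*` means elementwise multiplication for one and matrix multiplication for the other. The example arrays are made up; recent NumPy versions emit a `PendingDeprecationWarning` when an `np.matrix` is created, which is silenced here only to keep the output clean.

```python
import warnings
import numpy as np

a = np.arange(100).reshape(10, 10)
idx = [5, 5, 5, 4, 4, 4, 6, 6, 6, 8]
rows = np.arange(len(a))
sq = np.arange(4).reshape(2, 2)

with warnings.catch_warnings():
    # np.matrix creation warns in recent NumPy versions
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    m = np.matrix(a)
    msq = np.matrix(sq)

    from_array = a[rows, idx]    # ndarray, shape (10,)
    from_matrix = m[rows, idx]   # np.matrix, shape (1, 10): results stay 2-D

    # `*` is elementwise for ndarray but a matrix product for np.matrix
    elementwise = sq * sq        # [[0, 1], [4, 9]]
    matrix_product = msq * msq   # [[2, 3], [6, 11]]

print(from_array.shape, from_matrix.shape)  # (10,) (1, 10)
print(elementwise.tolist(), matrix_product.tolist())
```

No error is raised in either case; the difference only shows up in the shape or values of the result.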

Accepted answer (the "This should do it:" answer above): Bi Rico · 2013-03-08 05:24:43Z
