0

I have a 2D array of values and a 1D array of indexes, one index per row. For each row, I want to pull out the value at that row's index. The following code does this successfully:

from pprint import pprint
import numpy as np
_2Darray = np.arange(100, dtype=np.float16)
_2Darray = _2Darray.reshape((10, 10))
array_indexes = [5, 5, 5, 4, 4, 4, 6, 6, 6, 8]
index_values = []
for row, index in enumerate(array_indexes):
    index_values.append(_2Darray[row, index])
pprint(_2Darray)
print(index_values)

Returns

array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
 [ 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
 [ 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.],
 [ 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
 [ 40., 41., 42., 43., 44., 45., 46., 47., 48., 49.],
 [ 50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
 [ 60., 61., 62., 63., 64., 65., 66., 67., 68., 69.],
 [ 70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
 [ 80., 81., 82., 83., 84., 85., 86., 87., 88., 89.],
 [ 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]], dtype=float16)
[5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]

But I want to do it using only numpy functions. I have tried a whole bunch of numpy functions, but none of them seem to do this fairly simple task.

Thanks in advance!


Edit: I managed to figure out what my implementation would be:

index_values = np.fromiter((_2Darray[ind[0], ind[1]] for ind in
                            enumerate(array_indexes)),
                           dtype=_2Darray.dtype,
                           count=len(_2Darray))

Thanks to root, I've got both implementations worked out. Now for some profiling. My implementation, run through cProfile:

 ncalls tottime percall cumtime percall filename:lineno(function)
 2 0.274 0.137 0.622 0.311 {numpy.core.multiarray.fromiter}
20274 0.259 0.000 0.259 0.000 lazer_np.py:86(<genexpr>)

And root's:

 4 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
 1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.arange}

I can't believe it, but cProfile does not show root's method taking any time at all. I think this must be some kind of bug, but root's method is definitely noticeably faster. In an earlier test, I measured it to be about 3 times faster.

Note: these tests were done on a shape = (20273, 200) array of np.float16 values. Additionally, each indexing had to be run twice for each test.
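
A likely reason the profile shows nothing is that cProfile records function calls, and the indexing expression is not itself a call it tracks. A wall-clock timer is one way to compare the two approaches directly. The sketch below assumes random data of the shape given in the note above; the helper names via_fromiter and via_indexing are made up for illustration, and the absolute timings will vary by machine.

```python
import timeit
import numpy as np

# Assumed test data: random values with the shape mentioned in the note.
rng = np.random.default_rng(0)
data = rng.random((20273, 200)).astype(np.float16)
cols = rng.integers(0, data.shape[1], size=data.shape[0])

def via_fromiter(a, idx):
    # Generator-based version, one Python-level lookup per row.
    return np.fromiter((a[r, c] for r, c in enumerate(idx)),
                       dtype=a.dtype, count=len(a))

def via_indexing(a, idx):
    # Integer-array indexing, one vectorized lookup for all rows.
    return a[np.arange(len(a)), idx]

t_iter = timeit.timeit(lambda: via_fromiter(data, cols), number=3)
t_index = timeit.timeit(lambda: via_indexing(data, cols), number=3)
print("fromiter: %.4f s, indexing: %.4f s" % (t_iter, t_index))
```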

asked Mar 8, 2013 at 5:04

3 Answers

5

This should do it:

row = np.arange(_2Darray.shape[0])
index_values = _2Darray[row, array_indexes]

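NumPy lets you index 2D arrays (or n-D arrays, really) with two index arrays, so that the loop and the single indexing expression below give the same result:

# array is any 2D array; row and col are equal-length integer arrays
result1 = np.empty(len(row), dtype=array.dtype)
for i in range(len(row)):
    result1[i] = array[row[i], col[i]]
result2 = array[row, col]
np.all(result1 == result2)  # True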
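
As a self-contained check of the equivalence above, here is a minimal sketch that rebuilds the 10x10 array from the question and compares the loop against integer-array indexing. It also shows np.take_along_axis as an alternative; that function is not mentioned anywhere in this thread and assumes NumPy 1.15 or later.

```python
import numpy as np

_2Darray = np.arange(100, dtype=np.float16).reshape((10, 10))
array_indexes = np.array([5, 5, 5, 4, 4, 4, 6, 6, 6, 8])

# Loop version from the question.
looped = np.array([_2Darray[r, c] for r, c in enumerate(array_indexes)])

# Integer-array indexing: row i is paired with column array_indexes[i].
rows = np.arange(_2Darray.shape[0])
indexed = _2Darray[rows, array_indexes]

# Alternative (assumes NumPy >= 1.15): pick one element per row along axis 1.
taken = np.take_along_axis(_2Darray, array_indexes[:, None], axis=1)[:, 0]

print(indexed)
```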
answered Mar 8, 2013 at 5:24

3
In [15]: _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
Out[15]: array([ 5., 15., 25., 34., 44., 54., 66., 76., 86., 98.],
 dtype=float16)

BUT, I think something based on your solution may actually be faster on smaller arrays. If the arrays are bigger than 100*100, use numpy indexing.

In [22]: def f(array, indices):
    ...:     return [array[row, index] for row, index in enumerate(indices)]
In [23]: f(_2Darray, [5,5,5,4,4,4,6,6,6,8])
Out[23]: [5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]

In [27]: %timeit f(_2Darray,[5,5,5,4,4,4,6,6,6,8])
100000 loops, best of 3: 7.48 us per loop
In [28]: %timeit _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
10000 loops, best of 3: 24.2 us per loop
answered Mar 8, 2013 at 5:26

2 Comments

These timings seem premature, and without knowing more specifics about the OP's use case they are not very useful. In general, using numpy's indexing and passing arrays (note you used lists above) is going to give the best performance.
@BiRico: I did the timings again, and it seems that the list comprehension is faster with arrays smaller than 100*100. I changed the wording. Also note that len(array) is faster than array.shape[0].
0

You have to take care to use the numpy functions designed for arrays, not the ones designed for matrices. The two are easy to confuse and do not raise an error when methods of one are called on the other, yet the output is pretty much unpredictable.
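
One concrete difference that matters for this question, shown as a small sketch: the same index expression applied to an np.matrix returns a 2D result instead of a 1D array. The 3x3 data and the variable names here are made up for illustration.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
mat = np.asmatrix(arr)          # same data, viewed as an np.matrix
rows = np.arange(3)
cols = np.array([2, 0, 1])

from_array = arr[rows, cols]    # 1D ndarray, shape (3,)
from_matrix = mat[rows, cols]   # matrix results stay 2D, shape (1, 3)

print(from_array.shape)
print(from_matrix.shape)

# Converting back to a flat ndarray recovers the same values.
print(np.asarray(from_matrix).ravel())
```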

answered Mar 8, 2013 at 5:12
