I was playing around with benchmarking numpy arrays because I was getting slower than expected results when I tried to replace Python lists with numpy arrays in a script.

I know I'm missing something, and I was hoping someone could clear up my ignorance.

I created two functions and timed them:

import numpy as np

NUM_ITERATIONS = 1000

def np_array_addition():
    np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] += x
        np_array[1] += x

def py_array_addition():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] += x
        py_array[1] += x

Results:

np_array_addition: 2.556 seconds
py_array_addition: 0.204 seconds
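
(A harness along these lines would produce numbers on that scale; the number=1000 value is illustrative, since the post doesn't show how the timing was run:)

from timeit import timeit

# Hypothetical harness; number=1000 is a guess, not from the original post
print(timeit(np_array_addition, number=1000))
print(timeit(py_array_addition, number=1000))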

What gives? What's causing the massive slowdown? I figured that if I was using statically sized arrays numpy would be at least the same speed.

Thanks!

Update:

It kept bothering me that numpy array access was slow, and I figured "Hey, they're just arrays in memory, right? Cython should solve this!"

And it did. Here's my revised benchmark:

import numpy as np
cimport numpy as np

ctypedef np.int_t DTYPE_t

NUM_ITERATIONS = 200000

def np_array_assignment():
    cdef np.ndarray[DTYPE_t, ndim=1] np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] += 1
        np_array[1] += 1

def py_array_assignment():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] += 1
        py_array[1] += 1

I redeclared np_array as cdef np.ndarray[DTYPE_t, ndim=1].

print(timeit(py_array_assignment, number=3))
# 0.03459
print(timeit(np_array_assignment, number=3))
# 0.00755

That's with the Python function also being compiled by Cython. The timing for the Python function in pure Python is:

print(timeit(py_array_assignment, number=3))
# 0.12510

A roughly 17x speedup. Sure, it's a silly example, but I thought it was educational.
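
To reproduce the Cython result, a minimal build script might look like this (the filename np_bench.pyx is illustrative, not from the original setup):

# setup.py -- sketch of a standard Cython + numpy build
from distutils.core import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("np_bench.pyx"),
    include_dirs=[np.get_include()],  # headers needed for 'cimport numpy'
)

Built with python setup.py build_ext --inplace, after which the module imports like any other.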

asked Mar 7, 2014 at 0:38
  • I'm guessing here, but I'd say it's the extra work of switching context between C and Python back and forth that is killing the performance here. Commented Mar 7, 2014 at 0:46

2 Answers


It's not (just) the addition that is slow; it's the per-element access overhead. See for example:

def np_array_assignment():
    np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] = 1
        np_array[1] = 1

def py_array_assignment():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] = 1
        py_array[1] = 1
timeit np_array_assignment()
10000 loops, best of 3: 178 us per loop
timeit py_array_assignment()
10000 loops, best of 3: 72.5 us per loop
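
A quick illustration of where that overhead comes from: indexing a numpy array boxes the element into a fresh numpy scalar object on every access, whereas a list just hands back a reference to the stored Python int (a small sketch, not from the timings above):

import numpy as np

np_array = np.array([1, 2])
py_array = [1, 2]

print(type(np_array[0]))           # numpy.int64 on most 64-bit platforms
print(np_array[0] is np_array[0])  # False: each access allocates a new scalar object
print(py_array[0] is py_array[0])  # True: the list returns the same int object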

Numpy is fast when operating on whole vectors (matrices) at once; single element-by-element operations like these are slow.

Use numpy functions to avoid looping and operate on the whole array at once:

def np_array_addition_good():
    np_array = np.array([1, 2])
    np_array += np.sum(np.arange(NUM_ITERATIONS))

The results comparing your functions with the one above are pretty revealing:

timeit np_array_addition()
1000 loops, best of 3: 1.32 ms per loop
timeit py_array_addition()
10000 loops, best of 3: 101 us per loop
timeit np_array_addition_good()
100000 loops, best of 3: 11 us per loop

But actually, you can do just as well in pure Python if you collapse the loop:

def py_array_addition_good():
    py_array = [1, 2]
    rangesum = sum(range(NUM_ITERATIONS))
    py_array = [x + rangesum for x in py_array]
timeit py_array_addition_good()
100000 loops, best of 3: 11 us per loop

All in all, with such simple operations there is really no improvement from using numpy; optimized pure Python code works just as well.
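
As an aside, this benchmark collapses even further: sum(range(N)) has the closed form N*(N-1)/2, so no loop (and no array library) is needed at all. A sketch:

NUM_ITERATIONS = 1000

def py_array_addition_closed_form():
    # sum(range(N)) == N*(N-1)//2, so the whole benchmark is plain arithmetic
    rangesum = NUM_ITERATIONS * (NUM_ITERATIONS - 1) // 2
    return [x + rangesum for x in [1, 2]]

print(py_array_addition_closed_form())  # [499501, 499502]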


There have been a lot of similar questions, and I suggest looking at some of the good answers there:

How do I maximize efficiency with numpy arrays?

numpy float: 10x slower than builtin in arithmetic operations?

answered Mar 7, 2014 at 0:47

You're not actually using numpy's vectorized array addition if you do the loop in Python; there's also the access overhead mentioned by @shashkello.

I took the liberty of increasing the array size a tad, and also adding a vectorized version of the addition:

import numpy as np
from timeit import timeit

NUM_ITERATIONS = 1000

def np_array_addition():
    np_array = np.array(xrange(1000))
    for x in xrange(NUM_ITERATIONS):
        for i in xrange(len(np_array)):
            np_array[i] += x

def np_array_addition2():
    np_array = np.array(xrange(1000))
    for x in xrange(NUM_ITERATIONS):
        np_array += x

def py_array_addition():
    py_array = range(1000)
    for x in xrange(NUM_ITERATIONS):
        for i in xrange(len(py_array)):
            py_array[i] += x

print timeit(np_array_addition, number=3)   # 4.216162
print timeit(np_array_addition2, number=3)  # 0.117681
print timeit(py_array_addition, number=3)   # 0.439957

As you can see, the vectorized numpy version wins pretty handily. The gap will just get larger as array sizes and/or iterations increase.

answered Mar 7, 2014 at 0:55
  • Ah, that makes sense. So the vector addition shines through more with larger arrays. Commented Mar 7, 2014 at 1:14
  • You should remove the construction from the array loops, or use the numpy construction functions... Commented Mar 7, 2014 at 13:55
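
Following up on that last comment, a sketch of what the suggestion might look like: build the array once with a native numpy constructor (np.arange) and keep the construction cost out of the timed loop (names here are illustrative):

import numpy as np

NUM_ITERATIONS = 1000
base = np.arange(1000)  # built once with a native numpy constructor

def np_array_addition3():
    np_array = base.copy()  # construction stays out of the loop
    for x in xrange(NUM_ITERATIONS):
        np_array += x       # vectorized across the whole array
    return np_array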
