I was playing around with benchmarking numpy arrays because I was getting slower than expected results when I tried to replace Python lists with numpy arrays in a script.

I know I'm missing something, and I was hoping someone could clear up my ignorance.

I created two functions and timed them:

import numpy as np

NUM_ITERATIONS = 1000

def np_array_addition():
    np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] += x
        np_array[1] += x

def py_array_addition():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] += x
        py_array[1] += x

Results:

np_array_addition: 2.556 seconds
py_array_addition: 0.204 seconds
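
(A harness along these lines would produce numbers on that scale; the number=1000 value is illustrative, since the post doesn't show how the timing was run:)

from timeit import timeit

# Hypothetical harness; number=1000 is a guess, not from the original post
print(timeit(np_array_addition, number=1000))
print(timeit(py_array_addition, number=1000))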

What gives? What's causing the massive slowdown? I figured that if I was using statically sized arrays numpy would be at least the same speed.

Thanks!

Update:

It kept bothering me that numpy array access was slow, and I figured "Hey, they're just arrays in memory, right? Cython should solve this!"

And it did. Here's my revised benchmark:

import numpy as np
cimport numpy as np

ctypedef np.int_t DTYPE_t

NUM_ITERATIONS = 200000

def np_array_assignment():
    cdef np.ndarray[DTYPE_t, ndim=1] np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] += 1
        np_array[1] += 1

def py_array_assignment():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] += 1
        py_array[1] += 1

I redeclared np_array as cdef np.ndarray[DTYPE_t, ndim=1].

print(timeit(py_array_assignment, number=3))
# 0.03459
print(timeit(np_array_assignment, number=3))
# 0.00755

That's with the Python function also being compiled by Cython. The timing for the Python function in pure Python is:

print(timeit(py_array_assignment, number=3))
# 0.12510

A roughly 17x speedup. Sure, it's a silly example, but I thought it was educational.
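
To reproduce the Cython result, a minimal build script might look like this (the filename np_bench.pyx is illustrative, not from the original setup):

# setup.py -- sketch of a standard Cython + numpy build
from distutils.core import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("np_bench.pyx"),
    include_dirs=[np.get_include()],  # headers needed for 'cimport numpy'
)

Built with python setup.py build_ext --inplace, after which the module imports like any other.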

asked Mar 7, 2014 at 0:38
  • I'm guessing here, but I'd say it's the extra work of switching context between C and Python back and forth that is killing the performance here. Commented Mar 7, 2014 at 0:46

2 Answers


It's not (just) the addition that is slow; it's the per-element access overhead. See for example:

def np_array_assignment():
    np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] = 1
        np_array[1] = 1

def py_array_assignment():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] = 1
        py_array[1] = 1
timeit np_array_assignment()
10000 loops, best of 3: 178 us per loop
timeit py_array_assignment()
10000 loops, best of 3: 72.5 us per loop
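
A quick illustration of where that overhead comes from: indexing a numpy array boxes the element into a fresh numpy scalar object on every access, whereas a list just hands back a reference to the stored Python int (a small sketch, not from the timings above):

import numpy as np

np_array = np.array([1, 2])
py_array = [1, 2]

print(type(np_array[0]))           # numpy.int64 on most 64-bit platforms
print(np_array[0] is np_array[0])  # False: each access allocates a new scalar object
print(py_array[0] is py_array[0])  # True: the list returns the same int object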

Numpy is fast when operating on whole vectors (matrices) at once; single element-by-element operations like these are slow.

Use numpy functions to avoid looping and operate on the whole array at once:

def np_array_addition_good():
    np_array = np.array([1, 2])
    np_array += np.sum(np.arange(NUM_ITERATIONS))

The results comparing your functions with the one above are pretty revealing:

timeit np_array_addition()
1000 loops, best of 3: 1.32 ms per loop
timeit py_array_addition()
10000 loops, best of 3: 101 us per loop
timeit np_array_addition_good()
100000 loops, best of 3: 11 us per loop

But actually, you can do just as well in pure Python if you collapse the loop:

def py_array_addition_good():
    py_array = [1, 2]
    rangesum = sum(range(NUM_ITERATIONS))
    py_array = [x + rangesum for x in py_array]
timeit py_array_addition_good()
100000 loops, best of 3: 11 us per loop

All in all, with such simple operations there is really no improvement from using numpy; optimized pure Python code works just as well.
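
As an aside, this benchmark collapses even further: sum(range(N)) has the closed form N*(N-1)/2, so no loop (and no array library) is needed at all. A sketch:

NUM_ITERATIONS = 1000

def py_array_addition_closed_form():
    # sum(range(N)) == N*(N-1)//2, so the whole benchmark is plain arithmetic
    rangesum = NUM_ITERATIONS * (NUM_ITERATIONS - 1) // 2
    return [x + rangesum for x in [1, 2]]

print(py_array_addition_closed_form())  # [499501, 499502]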


There have been a lot of similar questions, and I suggest looking at some of the good answers there:

How do I maximize efficiency with numpy arrays?

numpy float: 10x slower than builtin in arithmetic operations?

answered Mar 7, 2014 at 0:47

You're not actually using numpy's vectorized array addition if you do the loop in Python; there's also the access overhead mentioned by @shashkello.

I took the liberty of increasing the array size a tad, and also adding a vectorized version of the addition:

import numpy as np
from timeit import timeit

NUM_ITERATIONS = 1000

def np_array_addition():
    np_array = np.array(xrange(1000))
    for x in xrange(NUM_ITERATIONS):
        for i in xrange(len(np_array)):
            np_array[i] += x

def np_array_addition2():
    np_array = np.array(xrange(1000))
    for x in xrange(NUM_ITERATIONS):
        np_array += x

def py_array_addition():
    py_array = range(1000)
    for x in xrange(NUM_ITERATIONS):
        for i in xrange(len(py_array)):
            py_array[i] += x

print timeit(np_array_addition, number=3)   # 4.216162
print timeit(np_array_addition2, number=3)  # 0.117681
print timeit(py_array_addition, number=3)   # 0.439957

As you can see, the vectorized numpy version wins pretty handily. The gap will just get larger as array sizes and/or iterations increase.

answered Mar 7, 2014 at 0:55
  • Ah, that makes sense. So the vector addition shines through more with larger arrays. Commented Mar 7, 2014 at 1:14
  • You should remove the construction from the array loops, or use the numpy construction functions... Commented Mar 7, 2014 at 13:55
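
Following up on that last comment, a sketch of what the suggestion might look like: build the array once with a native numpy constructor (np.arange) and keep the construction cost out of the timed loop (names here are illustrative):

import numpy as np

NUM_ITERATIONS = 1000
base = np.arange(1000)  # built once with a native numpy constructor

def np_array_addition3():
    np_array = base.copy()  # construction stays out of the loop
    for x in xrange(NUM_ITERATIONS):
        np_array += x       # vectorized across the whole array
    return np_array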
