
From what I understand, numpy arrays can handle operations more quickly than python lists because they're handled in a parallel rather than iterative fashion. I tried to test that out for fun, but I didn't see much of a difference.

Was there something wrong with my test? Does the difference only matter with arrays much bigger than the ones I used? I made sure to create a Python list and a NumPy array in each function to cancel out any difference that creating one vs. the other might make, but the time delta really seems negligible. My final outputs were numpy function: 6.534756324786595s, list function: 6.559365831783256s. Here's my code:

import timeit
import numpy as np

a_setup = 'import timeit; import numpy as np'
std_fx = '''
def operate_on_std_array():
    std_arr = list(range(0,1000000))
    np_arr = np.asarray(std_arr)
    for index, elem in enumerate(std_arr):
        std_arr[index] = (elem**20)*63134
    return std_arr
'''
parallel_fx = '''
def operate_on_np_arr():
    std_arr = list(range(0,1000000))
    np_arr = np.asarray(std_arr)
    np_arr = (np_arr**20)*63134
    return np_arr
'''

def operate_on_std_array():
    std_arr = list(range(0,1000000))
    np_arr = np.asarray(std_arr)
    for index, elem in enumerate(std_arr):
        std_arr[index] = (elem**20)*63134
    return std_arr

def operate_on_np_arr():
    std_arr = list(range(0,1000000))
    np_arr = np.asarray(std_arr)
    np_arr = (np_arr**20)*63134
    return np_arr

print('std', timeit.timeit(setup=a_setup, stmt=std_fx, number=80000000))
print('par', timeit.timeit(setup=a_setup, stmt=parallel_fx, number=80000000))
#operate_on_np_arr()
#operate_on_std_array()
asked Apr 10, 2018 at 16:24
  • My guess is that Python has become smart enough to optimize element-wise operations like those above well enough. Try a matrix-multiplication type of operation, and I'd bet the farm you'd see a big difference. Commented Apr 10, 2018 at 16:27
  • The difference is marginal and, in fact, I see numpy faster by a larger degree on my machine. Commented Apr 10, 2018 at 16:29
  • Why does std_fx create np_arr and not use it? Why does parallel_fx use range? Why not np_arr = np.arange(0,100...)? Your calculated numbers are too large for the array dtype (int32 or int64). Commented Apr 10, 2018 at 16:56
  • Math with numpy is usually faster because it performs the iteration(s) in compiled code. Operations that require iteration at the Python level are slower. Creating arrays from lists takes time, potentially cancelling out time savings. Sometimes, for very large cases, memory management can also chew into the time savings. Commented Apr 10, 2018 at 17:00
  • @hpaulj I created the np_arr in std_fx because I wanted to specifically measure the difference in applying the math function across the array. I wasn't interested in seeing whether creating lists/np arrays was faster. Commented Apr 10, 2018 at 17:24

2 Answers


The timeit docs show that the statement you pass in is supposed to execute something, but the statements you pass in only define functions; they are never called, so nothing is actually measured. Otherwise, 80000000 trials on a 1-million-element array would take far longer than a few seconds.
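
As a minimal sketch (mine, not the OP's code) of what the benchmark could look like: move the definition into setup, actually call the function in stmt, and cut number down to something feasible:

import timeit

setup_code = '''
import numpy as np

def operate_on_np_arr():
    np_arr = np.arange(1000000)
    return (np_arr ** 20) * 63134  # overflows the integer dtype, but still times the work
'''

# The definition lives in setup; stmt calls the function,
# so every trial now measures real work.
print(timeit.timeit(setup=setup_code, stmt='operate_on_np_arr()', number=10))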

Other issues you have in your test:

  • np_arr = (np_arr**20)*63134 creates a new array rather than modifying np_arr in place, but your Python list equivalent only mutates the existing list.
  • Numpy math is different from Python math. 100**20 in Python returns a huge number because Python has arbitrary-precision integers, but Numpy uses C-style fixed-width integers that overflow (demonstrated below). (In general, you have to imagine doing the operation in C when you use Numpy, because other unintuitive things may apply, like garbage in uninitialized arrays.)
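
To see the overflow point concretely, here is a small demonstration of my own (assuming the default integer dtype, typically int64 on 64-bit platforms):

import numpy as np

# Python integers are arbitrary precision, so this is exact:
print(100 ** 20)        # the full 41-digit integer

# NumPy integers are fixed width, so 10**40 cannot fit and wraps silently:
arr = np.array([100])
print(arr ** 20)        # overflowed garbage, not 10**40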

Here's a test where I modify both in place, multiplying then dividing by 31 each time so the values don't change over time or overflow:

import numpy as np
import time

std_arr = list(range(0, 100000))
np_arr = np.array(std_arr)
# elementwise (n * 31) / 31, applied through np.vectorize
np_arr_vec = np.vectorize(lambda n: (n * 31) / 31)

def operate_on_std_array():
    for index, elem in enumerate(std_arr):
        std_arr[index] = elem * 31
        std_arr[index] = std_arr[index] / 31  # undo the multiply so values stay put
    return std_arr

def operate_on_np_arr():
    np_arr_vec(np_arr)  # applies the lambda elementwise
    return np_arr

def test_time(f):
    count = 100
    start = time.time()
    for i in range(count):
        f()
    dur = time.time() - start
    return dur

print(test_time(operate_on_std_array))
print(test_time(operate_on_np_arr))

Results:

3.0798873901367188 # standard array time
2.221336841583252 # np array time

Edit: As @user2357112 pointed out, the proper Numpy way to do it is this:

def operate_on_np_arr():
    global np_arr
    np_arr *= 31
    np_arr //= 31  # integer division, not double
    return np_arr

Makes it much faster. I see 0.1248 seconds.
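
Equivalently (a sketch of mine, not part of the original answer), the same in-place updates can be written as explicit ufunc calls with out=, which likewise avoids allocating a new array:

import numpy as np

np_arr = np.arange(100000)
# ufuncs can write results back into an existing array via out=
np.multiply(np_arr, 31, out=np_arr)
np.floor_divide(np_arr, 31, out=np_arr)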

answered Apr 10, 2018 at 18:18
  • np.vectorize is basically just a wrapper around a for loop. It only exists for convenience, not efficiency. Commented Apr 10, 2018 at 18:21
  • If you want to avoid the copy, use np_arr *= 31 and np_arr //= 31 instead of working element by element. Commented Apr 10, 2018 at 18:23
  • I was getting weird errors trying to do that. TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced... Could be my version of Numpy. Commented Apr 10, 2018 at 18:24
  • Ohh, //= does integer division. I thought that was a typo. My mistake, retrying... Commented Apr 10, 2018 at 18:24
  • This is fantastic. Glad to see I was doing something wrong. Commented Apr 10, 2018 at 20:09

Here are some timings using the IPython %%timeit magic to initialize the lists and/or arrays in setup, so the results focus on the calculations:

In [103]: %%timeit alist = list(range(10000))
     ...: for i, e in enumerate(alist):
     ...:     alist[i] = (e*3)*20
     ...:
4.13 ms ± 146 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [104]: %%timeit arr = np.arange(10000)
     ...: z = (arr*3)*20
     ...:
20.6 μs ± 439 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [105]: %%timeit alist = list(range(10000))
     ...: z = [(e*3)*20 for e in alist]
     ...:
1.71 ms ± 2.69 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Looking at the effect of array creation times:

In [106]: %%timeit alist = list(range(10000))
     ...: arr = np.array(alist)
     ...: z = (arr*3)*20
     ...:
1.01 ms ± 43.6 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

OK, the calculation isn't the same as in the question. If I use **3 instead, all times are about 2x larger, but the relative relationships stay the same.
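
For anyone outside IPython, roughly the same comparison can be reproduced with the plain timeit module (my own sketch; the loop version writes into a separate output list so repeated runs don't inflate the values):

import timeit

# list loop, array math, and list comprehension, each averaged over 100 runs
loop_t = timeit.timeit('for i, e in enumerate(alist): out[i] = (e*3)*20',
                       setup='alist = list(range(10000)); out = [0]*10000',
                       number=100)
arr_t = timeit.timeit('z = (arr*3)*20',
                      setup='import numpy as np; arr = np.arange(10000)',
                      number=100)
comp_t = timeit.timeit('z = [(e*3)*20 for e in alist]',
                       setup='alist = list(range(10000))',
                       number=100)
print(loop_t / 100, arr_t / 100, comp_t / 100)  # seconds per loop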

answered Apr 10, 2018 at 18:21
