
From what I understand, numpy arrays can handle operations more quickly than python lists because they're handled in a parallel rather than iterative fashion. I tried to test that out for fun, but I didn't see much of a difference.

Was there something wrong with my test? Does the difference only matter with arrays much bigger than the ones I used? I made sure to create a Python list and a NumPy array in each function to cancel out any difference that creating one vs. the other might make, but the time delta really seems negligible. My final outputs were numpy function: 6.534756324786595s, list function: 6.559365831783256s. Here's my code:

import timeit
import numpy as np

a_setup = 'import timeit; import numpy as np'
std_fx = '''
def operate_on_std_array():
    std_arr = list(range(0,1000000))
    np_arr = np.asarray(std_arr)
    for index, elem in enumerate(std_arr):
        std_arr[index] = (elem**20)*63134
    return std_arr
'''
parallel_fx = '''
def operate_on_np_arr():
    std_arr = list(range(0,1000000))
    np_arr = np.asarray(std_arr)
    np_arr = (np_arr**20)*63134
    return np_arr
'''

def operate_on_std_array():
    std_arr = list(range(0,1000000))
    np_arr = np.asarray(std_arr)
    for index, elem in enumerate(std_arr):
        std_arr[index] = (elem**20)*63134
    return std_arr

def operate_on_np_arr():
    std_arr = list(range(0,1000000))
    np_arr = np.asarray(std_arr)
    np_arr = (np_arr**20)*63134
    return np_arr

print('std', timeit.timeit(setup=a_setup, stmt=std_fx, number=80000000))
print('par', timeit.timeit(setup=a_setup, stmt=parallel_fx, number=80000000))
#operate_on_np_arr()
#operate_on_std_array()
asked Apr 10, 2018 at 16:24
  • My guess is that Python has become smart enough to optimize element-wise operations like those above well enough. Try a matrix-multiplication type of operation, and I'd bet the farm you'd see a big difference. Commented Apr 10, 2018 at 16:27
  • The difference is marginal and, in fact, I see numpy faster by a larger degree on my machine. Commented Apr 10, 2018 at 16:29
  • Why does std_fx create np_arr and not use it? Why does parallel_fx use range? Why not np_arr = np.arange(0,100...)? Your calculated numbers are too large for the array dtype (int32 or int64). Commented Apr 10, 2018 at 16:56
  • Math with numpy is usually faster because it performs the iteration(s) in compiled code. Operations that require iteration at the Python level are slower. Creating arrays from lists takes time, potentially cancelling out time savings. Sometimes, for very large cases, memory management can also chew into the time savings. Commented Apr 10, 2018 at 17:00
  • @hpaulj I created the np_arr in std_fx because I wanted to specifically measure the difference in applying the math function across the array. I wasn't interested in seeing whether creating lists/np arrays was faster. Commented Apr 10, 2018 at 17:24

2 Answers


The timeit docs show that the statement you pass in is supposed to execute something, but the statements you pass in only define functions; they are never called, so nothing is actually measured. Otherwise, 80000000 trials on a 1-million-element array would take far longer than a few seconds.
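
As a minimal sketch (mine, not the OP's code) of what the benchmark could look like: move the definition into setup, actually call the function in stmt, and cut number down to something feasible:

import timeit

setup_code = '''
import numpy as np

def operate_on_np_arr():
    np_arr = np.arange(1000000)
    return (np_arr ** 20) * 63134  # overflows the integer dtype, but still times the work
'''

# The definition lives in setup; stmt calls the function,
# so every trial now measures real work.
print(timeit.timeit(setup=setup_code, stmt='operate_on_np_arr()', number=10))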

Other issues you have in your test:

  • np_arr = (np_arr**20)*63134 creates a new array rather than modifying np_arr in place, but your Python list equivalent only mutates the existing list.
  • Numpy math is different from Python math. 100**20 in Python returns a huge number because Python has arbitrary-precision integers, but Numpy uses C-style fixed-width integers that overflow (demonstrated below). (In general, you have to imagine doing the operation in C when you use Numpy, because other unintuitive things may apply, like garbage in uninitialized arrays.)
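
To see the overflow point concretely, here is a small demonstration of my own (assuming the default integer dtype, typically int64 on 64-bit platforms):

import numpy as np

# Python integers are arbitrary precision, so this is exact:
print(100 ** 20)        # the full 41-digit integer

# NumPy integers are fixed width, so 10**40 cannot fit and wraps silently:
arr = np.array([100])
print(arr ** 20)        # overflowed garbage, not 10**40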

Here's a test where I modify both in place, multiplying then dividing by 31 each time so the values don't change over time or overflow:

import numpy as np
import time

std_arr = list(range(0, 100000))
np_arr = np.array(std_arr)
# elementwise (n * 31) / 31, applied through np.vectorize
np_arr_vec = np.vectorize(lambda n: (n * 31) / 31)

def operate_on_std_array():
    for index, elem in enumerate(std_arr):
        std_arr[index] = elem * 31
        std_arr[index] = std_arr[index] / 31  # undo the multiply so values stay put
    return std_arr

def operate_on_np_arr():
    np_arr_vec(np_arr)  # applies the lambda elementwise
    return np_arr

def test_time(f):
    count = 100
    start = time.time()
    for i in range(count):
        f()
    dur = time.time() - start
    return dur

print(test_time(operate_on_std_array))
print(test_time(operate_on_np_arr))

Results:

3.0798873901367188 # standard array time
2.221336841583252 # np array time

Edit: As @user2357112 pointed out, the proper Numpy way to do it is this:

def operate_on_np_arr():
    global np_arr
    np_arr *= 31
    np_arr //= 31  # integer division, not double
    return np_arr

Makes it much faster. I see 0.1248 seconds.
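
Equivalently (a sketch of mine, not part of the original answer), the same in-place updates can be written as explicit ufunc calls with out=, which likewise avoids allocating a new array:

import numpy as np

np_arr = np.arange(100000)
# ufuncs can write results back into an existing array via out=
np.multiply(np_arr, 31, out=np_arr)
np.floor_divide(np_arr, 31, out=np_arr)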

answered Apr 10, 2018 at 18:18
  • np.vectorize is basically just a wrapper around a for loop. It only exists for convenience, not efficiency. Commented Apr 10, 2018 at 18:21
  • If you want to avoid the copy, use np_arr *= 31 and np_arr //= 31 instead of working element by element. Commented Apr 10, 2018 at 18:23
  • I was getting weird errors trying to do that. TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced... Could be my version of Numpy. Commented Apr 10, 2018 at 18:24
  • Ohh, //= does integer division. I thought that was a typo. My mistake, retrying... Commented Apr 10, 2018 at 18:24
  • This is fantastic. Glad to see I was doing something wrong. Commented Apr 10, 2018 at 20:09

Here are some timings using the IPython %%timeit magic to initialize the lists and/or arrays in setup, so the results focus on the calculations:

In [103]: %%timeit alist = list(range(10000))
     ...: for i, e in enumerate(alist):
     ...:     alist[i] = (e*3)*20
     ...:
4.13 ms ± 146 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [104]: %%timeit arr = np.arange(10000)
     ...: z = (arr*3)*20
     ...:
20.6 μs ± 439 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [105]: %%timeit alist = list(range(10000))
     ...: z = [(e*3)*20 for e in alist]
     ...:
1.71 ms ± 2.69 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Looking at the effect of array creation times:

In [106]: %%timeit alist = list(range(10000))
     ...: arr = np.array(alist)
     ...: z = (arr*3)*20
     ...:
1.01 ms ± 43.6 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

OK, the calculation isn't the same as in the question. If I use **3 instead, all times are about 2x larger, but the relative relationships stay the same.
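
For anyone outside IPython, roughly the same comparison can be reproduced with the plain timeit module (my own sketch; the loop version writes into a separate output list so repeated runs don't inflate the values):

import timeit

# list loop, array math, and list comprehension, each averaged over 100 runs
loop_t = timeit.timeit('for i, e in enumerate(alist): out[i] = (e*3)*20',
                       setup='alist = list(range(10000)); out = [0]*10000',
                       number=100)
arr_t = timeit.timeit('z = (arr*3)*20',
                      setup='import numpy as np; arr = np.arange(10000)',
                      number=100)
comp_t = timeit.timeit('z = [(e*3)*20 for e in alist]',
                       setup='alist = list(range(10000))',
                       number=100)
print(loop_t / 100, arr_t / 100, comp_t / 100)  # seconds per loop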

answered Apr 10, 2018 at 18:21
