0

I have always been told that python's native append is a slow function and should be avoided in for loops. However after a couple of small tests I have found it performs more poorly than a numpy array when iterating over it with a for loop:


First Test -array/list construction

python native list append

def pythonAppend(n):
 x = []
 for i in range(n):
 x.append(i)
 return x
%timeit pythonAppend(1000000)

Numpy allocate array then access

 def numpyConstruct(n):
 x = np.zeros(n)
 for i in range(n):
 x[i] = i
 return x
%timeit numpyConstruct(1000000)

Results:

Python Time: 179 ms

Numpy Time: 189 ms


Second Test -Accesing Elements

n = 1000000
x = pythonAppend(n)
arr = numpyConstruct(n)
order = arr[:]; np.random.shuffle(order); order = list(order.astype(int))
def listAccess(x):
 for i in range(len(x)):
 x[i] = i/3.
 return x
def listAccessOrder(x, order):
 for i in order:
 x[i] = i/3.
 return x
%timeit listAccess(x)
%timeit listAccess(arr)
%timeit listAccessOrder(x, order)
%timeit listAccessOrder(arr, order)
%timeit arr / 3.

Results

python -sequential: 175 ms

numpy -sequential: 198 ms

python -shuffled access: 2.08 s

numpy -shuffled access: 2.15 s

numpy -vectorised access: 1.92ms



These results to me are quite surprising as I would have thought that at least due to python being linked lists accessing elements would be slower than numpy due to having to follow a chain of pointers. Or am I misunderstanding the python list implementation? Also why do the python lists perform slightly better than the numpy equivalents -I guess most of the inefficiency comes from using a python for loop but python's append still out-competes numpy accessing and allocating.

asked Nov 28, 2019 at 12:17
3
  • I guess when you loose numpy in a loop you loss all the performance advantages. But if you can write it as an vector function its going to be faster then looping with clean python. Commented Nov 28, 2019 at 12:32
  • I thought numpy arrays were sequential? So accessing their elements (especially in a random order) would be faster as the CPU can grab the chunk of memory straight away rather than having to follow a sequence of pointers to the element? Commented Nov 28, 2019 at 12:44
  • python lists are not linked lists. Commented Nov 28, 2019 at 14:44

1 Answer 1

2

Python lists are not linked lists. The data buffer of the list object contains links/pointers to objects (else where in memory). So fetching the ith element is easy. The buffer also has growth headroom, so append is a simple matter of inserting the new element's link. The buffer has to be reallocated periodically, but Python manages that seamlessly.

A numpy array also has a 1d data buffer, but it contains numeric values (more generally the bytes as required by the dtype). Fetching is easy, but it has to create a new python object to 'contain' the value ('unboxing'). Assignment also requires conversion from the python object to bytes that will be stored.

Generally we've found that making a new array by appending to a list (with one np.array call at the end) is competitive with assignment to a preallocated array.

Iteration through a numpy array is normally slower than iterating through a list.

What we strongly discourage is using np.append (or some variant) to grow an array iteratively. That makes a new array each time, with full copy.

numpy arrays are fast when the iteration is done in the compiled code. Normally that's with the whole-array methods. The c code can iterate over the data buffer, operating on the values directly, rather than invoking python object methods at each step.

answered Nov 28, 2019 at 17:20

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.