numpy-
arr = np.array([[1, 2, 3, 4]])
row = np.array([1, 2, 3, 4])
%timeit arr[0] = row
466 ns ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
python list -
arr = [[1, 2, 3, 4]]
row = [1, 2, 3, 4]
%timeit arr[0] = row
59.3 ns ± 2.94 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each
Shouldn't numpy be the faster version here?
Here's what I'm aiming to get done -
arr = np.empty((150, 4))
while True:
row = get_row_from_api()
arr[-1] = row
2 Answers 2
Yep, using python lists this way would definitely be faster, because when you assign something to a python list element, it's not copied, just some references are reassigned (https://developers.google.com/edu/python/lists). Numpy instead copies all the elements from the source container to the target one. I'm not sure whether you need numpy arrays here, because their creation is not free and python lists aren't that slow at creation (and as we see at assignment as well).
-
Ah, there's a bit of code down the chain that uses numpy, not sure if its worth doing the conversion from list -> numpy everytimeDev Aggarwal– Dev Aggarwal2020年09月17日 11:54:04 +00:00Commented Sep 17, 2020 at 11:54
-
I think it's better to check the whole sequence of operations for its performance in total. It's difficult to say a priori what's going to work faster (unless you're not multiplying [1000, 1000] matrices or doing something like that).Michał Jaworski– Michał Jaworski2020年09月17日 11:57:38 +00:00Commented Sep 17, 2020 at 11:57
-
@DevAggarwal Maybe time the whole chain instead of just the numpy versus list snipped and see what works best. Numpy should be faster on calculations. How complex are yours down the chain?Guimoute– Guimoute2020年09月17日 12:00:55 +00:00Commented Sep 17, 2020 at 12:00
The underlying semantics of the two operations are very different. Python lists are arrays of references. Numpy arrays are arrays of the data itself.
The line row = get_row_from_api()
implies that a fresh list has already been allocated.
Assigning to a list as lst[-1] = row
just writes an address into lst
. That's generally 4 or 8 bytes.
Placing in an array as arr[i] = row
is copying data. It's a shorthand for arr[i, :] = row
. Every element of row
gets copied to the buffer of arr
. If row
was a list, that incurs additional overhead to convert from python objects to native numerical types.
Remember that premature optimization is pointless. Your time savings for one method vs the other are likely to be negligible. At the same time, if you need an array later down the line anyway, it's likely faster to pre-allocate and take a small speed hit rather than calling np.array
on the final list. In the former case, you allocate a buffer of predetermined size and dtype. In the latter, you've merely deferred the overhead of copying the data, but also incurred the overhead of having to figure out the array size and dtype.
numpy.array
example, it is assigning 4 items. To get a fair comparison, you'd need something likefor i, x in enumerate(row): arr[0][i] = x
numpy.ndarray
object withdtype=object
, but then you are essentially working with a bad list. So at that point, just use alist
instead of anumpy.ndarray
list
and anumpy.ndarray
. And it isn't immediately obvious exactly what you are doing with your numpy array.arr[-1] = row
repeatedly assigns a value to the same last row ofarr
. It does not 'step-through' the rows,