Recently I noticed that the idiomatic Python two-dimensional array initialisation is extremely slow. I am wondering, is there a good, proper way of doing this simple task fast? Also, are those two variants equivalent?
Here are some code snippets, timed with timeit:
import timeit
A = 5000
B = 7000
N = 10
def list_comprehension_xrange():
    matrix = [[0 for j in xrange(A)] for i in xrange(B)]

def list_comprehension_range():
    matrix = [[0 for j in range(A)] for i in range(B)]

def multiplication():
    matrix = [[0] * A] * B
print "list_comprehension_xrange:", timeit.timeit(list_comprehension_xrange, number=N)
print "list_comprehension_range:", timeit.timeit(list_comprehension_range, number=N)
print "multiplication:", timeit.timeit(multiplication, number=N)
list_comprehension_xrange: 11.4952278137
list_comprehension_range: 13.5112810135
multiplication: 0.00100612640381
First off, don't use multiplication. It creates 2 lists, not 7001.
To better show this, hop into IDLE:
>>> a = [[0] * 3] * 3
>>> a
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> a[0][0] = 1
>>> a
[[1, 0, 0], [1, 0, 0], [1, 0, 0]]
No, this is not what you want.
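The sharing can be verified directly: the outer list holds B references to one and the same inner list, not B copies. A minimal check (Python 3 shown; on Python 2 the behaviour is identical):

```python
rows = [[0] * 3] * 3

# All three "rows" are the same object, so mutating one mutates all.
print(rows[0] is rows[1])          # True: identical objects, not copies
print(len({id(r) for r in rows}))  # 1: only one distinct inner list exists

rows[0][0] = 1
print(rows)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
```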
In both your other functions you should use _; it's a convention that says you don't use the result, it's thrown away.
This results in:
def list_comprehension_xrange():
    matrix = [[0 for _ in xrange(A)] for _ in xrange(B)]
If speed is highly important, then you can use a hybrid of multiplication and range or xrange. From the benchmarks I'd lean towards xrange, but I don't think there is too much difference.
I can't say exactly why these are so fast. It may be because the multiplication doesn't build and destroy an intermediary list, or because it creates the list in C rather than in Python code, or because it builds only ~B objects instead of ~A*B. But I don't know for sure.
I added the following functions, with timers:
def multiplication_range():
    matrix = [[0] * A for _ in range(B)]

def multiplication_xrange():
    matrix = [[0] * A for _ in xrange(B)]
And got the following results:
list_comprehension_xrange: 23.0122457496
list_comprehension_range: 24.9418833563
multiplication: 0.00104575910762
multiplication_range: 3.35667382897
multiplication_xrange: 3.35528033683
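Unlike the pure multiplication version, the hybrid runs the outer loop once per row, so each row is a fresh, independent list. A quick sanity check (Python 3 syntax, where range plays the role of xrange):

```python
A, B = 5, 4  # small sizes, just for the demonstration

matrix = [[0] * A for _ in range(B)]

# Each row is its own list, so mutating one row leaves the others alone.
matrix[0][0] = 1
print(matrix[0])  # [1, 0, 0, 0, 0]
print(matrix[1])  # [0, 0, 0, 0, 0]
print(len({id(row) for row in matrix}))  # 4: four distinct inner lists
```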
- \$\begingroup\$ Thank you, your hybrid approach is the fastest option. I did not know that a nested list comprehension is so slow. \$\endgroup\$ – vegi, May 4, 2016 at 14:27
- \$\begingroup\$ @vegi Like an iterative approach (using append, except the loop is done in C, not in Python), list comprehensions cannot know how many items will be in the list at the end, so a large list will be reallocated several times. Using the multiplication, the number of items is known upfront, thus only one alloc (and possibly only one memset too), speeding up the overall thing. \$\endgroup\$ – 301_Moved_Permanently, May 5, 2016 at 7:38
- \$\begingroup\$ @MathiasEttinger You are right in general, but keep in mind that Python lists are not re-allocated; the list creates new chunks of allocated memory. The time complexity of the insert is O(1). What bugs me is that range generates a vector of values with a known size (in contrast to xrange), so the Python interpreter has access to all the information it needs to execute the statement effectively. \$\endgroup\$ – vegi, May 5, 2016 at 9:39
- \$\begingroup\$ @vegi They don't use chunks. Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move). If you need to add/remove at both ends, consider using a collections.deque instead. \$\endgroup\$ – May 5, 2016 at 9:43
- \$\begingroup\$ @JoeWallis Is it equivalent to the C++ std::vector? \$\endgroup\$ – vegi, May 5, 2016 at 9:54
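The reallocation behaviour discussed in these comments can be observed with sys.getsizeof: a list grown by append passes through several allocation sizes (CPython over-allocates as the backing array is resized), whereas a list built by multiplication is sized once. A rough sketch; the exact byte counts are a CPython implementation detail:

```python
import sys

# Grow a list one element at a time and record every allocated size seen.
grown = []
sizes = set()
for _ in range(1000):
    grown.append(0)
    sizes.add(sys.getsizeof(grown))

# The growing list was reallocated several times along the way.
print(len(sizes) > 1)  # True: multiple distinct allocation sizes observed

# Multiplication allocates the final size in a single step, so it carries
# no over-allocation slack from repeated growth.
sized = [0] * 1000
print(sys.getsizeof(sized) <= sys.getsizeof(grown))  # True
```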
Now that you are asking about speed for 2+ dimensional arrays, you are out of standard Python land and into lower-level array territory. NumPy and pandas are libraries you need to become familiar with for this task.
Do not plan on using "for" statements with these libraries (they work, but you will lose speed).
NumPy creates arrays of a single data type.
Pandas is like having R or a spreadsheet in Python.
Since your numbers are all integers, I'll pick NumPy.
import numpy as np
a = 5000
b = 7000
%timeit np.zeros((a, b))
100000 loops, best of 3: 2.41 μs per loop
And this works with Python 2 and 3.
As for matrix multiplication, you can multiply 2 arrays, or multiply an array by a scalar.
a = np.ones((a,b))
b = 5
%timeit a*b
10 loops, best of 3: 148 ms per loop
You had an example with 1's along an axis, and that can be done like this with NumPy:
a = 5000
b = 7000
c = np.zeros((a, b))
c[:, 0] = 1
array([[ 1., 0., 0., ..., 0., 0., 0.],
       [ 1., 0., 0., ..., 0., 0., 0.],
       ...
- \$\begingroup\$ The multiplication case creates an outer list filled with 7000 references to a single list of 5000 items, rather than 7000 lists of 5000 items each, so it's unsurprising that it is 3-4 orders of magnitude faster. \$\endgroup\$