When trying to speed up Numpy code, it's important to scrutinize every loop. A loop that has to run in the Python interpreter can be hundreds of times slower than a vectorized loop running inside Numpy.
Also, when trying to improve the performance of code, there's no substitute for measurement. So let's set up a test case. The matrix csc here has the same density as yours, but is one-hundredth the size (to make the runtimes tractable):
import numpy as np
from scipy.sparse import csc_matrix
from timeit import timeit
shape = 40000, 50000
density = 0.0005
n = int(np.prod(shape) * density)  # randint needs an integer size
rowi, coli = [np.random.randint(s, size=n) for s in shape]
csc = csc_matrix((np.random.rand(n), (rowi, coli)), shape=shape)
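As a quick sanity check (just to confirm the test case matches the description, not part of the benchmark), we can look at the matrix we built. Note that the csc_matrix constructor sums duplicate (row, column) pairs, so the realized number of nonzeros can come out slightly below n:
# Sanity check: shape and realized density of the test matrix.
# Duplicate (row, col) pairs are summed by the constructor, so
# csc.nnz may be slightly less than n.
print(csc.shape)                     # (40000, 50000)
print(csc.nnz / np.prod(csc.shape))  # close to 0.0005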
Here's your sum-of-squares-by-column algorithm:
def sum_squares_by_column_1():
    maxv = np.zeros(csc.shape[1])
    for i in range(csc.shape[1]):
        maxv[i] = sum(np.square(csc[:, i].data))
    return maxv
>>> timeit(sum_squares_by_column_1, number=1)
19.718280024942942
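Before vectorizing, it's worth knowing where that time goes. My expectation (an aside, not measured above) is that most of it is the Python-level loop and the construction of 50,000 single-column sparse matrices, rather than the arithmetic. You can check by timing the slicing on its own:
def slice_columns_only():
    # Build each single-column sparse matrix but skip the squaring
    # and summing; if this alone accounts for most of the ~20 s,
    # the loop and slicing overhead is the bottleneck.
    for i in range(csc.shape[1]):
        csc[:, i]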
So how can we get rid of that for loop? Well, if this were an ordinary Numpy array, then we could write:
np.sum(np.square(csc), axis=0)
but this doesn't work here because scipy.sparse.csc_matrix is a matrix rather than an array (see the answers to this Stack Overflow question), and so np.square doesn't work. That's because np.square(a) just multiplies a by itself (as if you'd written a * a), and this is matrix multiplication when a is a matrix.
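To see this concretely, here's a tiny demonstration (a sketch with a small made-up matrix, not part of the original timings). With the sparse matrix classes, * means matrix multiplication; the newer sparse array classes like csc_array treat * as elementwise instead:
m = csc_matrix(np.array([[1.0, 2.0], [3.0, 4.0]]))
print((m * m).toarray())        # [[ 7. 10.] [15. 22.]] -- matrix product
print(m.multiply(m).toarray())  # [[ 1.  4.] [ 9. 16.]] -- elementwise square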
So what we need to do instead is to read carefully through the scipy.sparse.csc_matrix documentation to see if there are methods for the operations we need. There's no square method, but there is a multiply method that does "point-wise multiplication" and a sum method that "sums the matrix over the given axis". So, putting these together:
def sum_squares_by_column_2():
    return csc.multiply(csc).sum(0)
>>> timeit(sum_squares_by_column_2, number=1)
0.04036429896950722
which is about 500 times faster.
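It's sensible to check that the two versions agree. One wrinkle worth noting, since it's easy to trip over, is that sum(0) on a sparse matrix returns a 1×n np.matrix, so flatten it before comparing:
# Verify the vectorized version matches the loop version.
res1 = sum_squares_by_column_1()
res2 = np.asarray(sum_squares_by_column_2()).ravel()
assert np.allclose(res1, res2)
If you want to squeeze out a little more, a variant worth trying (my suggestion, not benchmarked here) is to square the stored values directly, which skips the index-matching work that multiply has to do:
def sum_squares_by_column_3():
    sq = csc.copy()    # don't clobber the original
    sq.data **= 2      # square only the stored nonzero values
    return sq.sum(0)   # column sums, as a 1 x 50000 matrix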