When trying to speed up Numpy code, it's important to scrutinize every loop. A loop that has to run in the Python interpreter can be hundreds of times slower than a vectorized loop running inside Numpy.

Also, when trying to improve the performance of code, there's no substitute for measurement. So let's set up a test case. The matrix csc here has the same density as yours, but is one-hundredth the size (to make the runtimes tractable):

import numpy as np
from scipy.sparse import csc_matrix
from timeit import timeit

shape = 40000, 50000
density = 0.0005
n = int(np.prod(shape) * density)  # number of nonzero entries (randint needs an int)
rowi, coli = [np.random.randint(s, size=n) for s in shape]
csc = csc_matrix((np.random.rand(n), (rowi, coli)), shape=shape)
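
One thing to be aware of: duplicate (row, column) pairs are summed by the csc_matrix constructor, so the realized density can land slightly below the target. A quick check:

>>> csc.nnz / np.prod(csc.shape)  # should be close to density = 0.0005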

Here's your sum-of-squares-by-column algorithm:

def sum_squares_by_column_1():
    maxv = np.zeros(csc.shape[1])
    for i in range(csc.shape[1]):
        maxv[i] = sum(np.square(csc[:, i].data))
    return maxv

>>> timeit(sum_squares_by_column_1, number=1)
19.718280024942942

So how can we get rid of that for loop? Well, if this were an ordinary Numpy array then we could write:

np.sum(np.square(csc), axis=0)

but this doesn't work here because scipy.sparse.csc_matrix is a matrix rather than an array (see the answers to this Stack Overflow question) and so np.square doesn't work. That's because np.square(a) just multiplies a by itself (as if you'd written a * a), and * means matrix multiplication when a is a matrix.
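
To see the * semantics concretely, here's a minimal illustration using a small dense np.matrix (scipy's sparse matrix classes follow the same convention for *):

>>> a = np.matrix([[1, 2], [3, 4]])
>>> a * a                 # matrix multiplication, not elementwise
matrix([[ 7, 10],
        [15, 22]])
>>> np.multiply(a, a)     # elementwise product: what we actually want
matrix([[ 1,  4],
        [ 9, 16]])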

So what we need to do instead is to read carefully through the scipy.sparse.csc_matrix documentation to see if there are methods for the operations we need. There's no square method, but there is a multiply method that does "point-wise multiplication" and a sum method that "sums the matrix over the given axis". So, putting these together:

def sum_squares_by_column_2():
    return csc.multiply(csc).sum(0)

>>> timeit(sum_squares_by_column_2, number=1)
0.04036429896950722

which is about 500 times faster.
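
As a final sanity check (a minimal sketch, assuming the setup above), the two implementations should agree up to floating-point rounding; note that sum on a sparse matrix returns a 2-D np.matrix, so it needs flattening before the comparison:

>>> slow = sum_squares_by_column_1()
>>> fast = np.asarray(sum_squares_by_column_2()).ravel()
>>> np.allclose(slow, fast)
True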
