I'm working on implementing the stochastic gradient descent algorithm for recommender systems (see Funk) using sparse matrices with Scipy.
This is what a first basic implementation looks like:
N = self.model.shape[0]  # number of users
M = self.model.shape[1]  # number of items
self.p = np.random.rand(N, K)
self.q = np.random.rand(M, K)
rows, cols = self.model.nonzero()
for step in xrange(steps):
    for u, i in zip(rows, cols):
        e = self.model - np.dot(self.p, self.q.T)  # calculate error for gradient
        p_temp = learning_rate * (e[u, i] * self.q[i, :] - regularization * self.p[u, :])
        self.q[i, :] += learning_rate * (e[u, i] * self.p[u, :] - regularization * self.q[i, :])
        self.p[u, :] += p_temp
I have two questions, and I'm hoping some seasoned Python coders / recommender-systems experts here can provide me with some insight:
1) How often should the error e be recomputed? It is not clear from Funk's and Koren's papers whether it should be once per rating or once per factor. That is, should I take it out of the inner loop, leave it where it is, or move it even deeper (inside the iteration over the k factors)?
2) Unfortunately, my code is pretty slow: about 20 minutes for ONE iteration over a 100k-ratings dataset, which means computing the factorization over the whole dataset would take about 10 hours. I suspect this is due to the for loop over the sparse matrix. I've tried expressing the q and p updates using fancy indexing, but since I'm still pretty new to scipy and numpy, I couldn't figure out a better way to do it. Do you have any pointers on how I could avoid iterating over the rows and columns of the sparse matrix explicitly?
Which matrix is the scipy sparse matrix? – insanely_sin, Nov 15, 2018 at 19:52
2 Answers
There may be other optimizations available, but this one alone should go a long way...
When you do e = self.model - np.dot(self.p, self.q.T), the resulting e is a dense matrix the same size as your model. You later only use e[u, i] in your loop, so you are throwing away what are probably millions of computed values. If you simply replace that line with:

e = self.model[u, i] - np.dot(self.p[u, :], self.q[i, :])

and then replace e[u, i] with e, you will spare yourself a huge amount of both computations and memory accesses, which should boost performance by a lot.
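Put together, the revised loop might look like the following sketch, written as a standalone function (sgd_step is a hypothetical name, and the hyperparameter defaults are placeholders):

```python
import numpy as np

def sgd_step(model, p, q, learning_rate=0.01, regularization=0.02):
    """One pass over the observed ratings, computing the error per rating."""
    rows, cols = model.nonzero()
    for u, i in zip(rows, cols):
        # Scalar error for this single rating -- no dense N x M matrix is built
        e = model[u, i] - np.dot(p[u, :], q[i, :])
        p_grad = learning_rate * (e * q[i, :] - regularization * p[u, :])
        q[i, :] += learning_rate * (e * p[u, :] - regularization * q[i, :])
        p[u, :] += p_grad
    return p, q
```

The update order mirrors the question's code: p_grad is computed from the old values, q is updated using the old p, and only then is p updated.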
Thank you. That's a great improvement. I also made some other improvements: I transposed q from the very beginning and accessed the column directly, instead of fetching and transposing the row, and I folded the learning rate into the error, because it was being applied three times. All in all, I managed to get one step down to only about 10 seconds now! Do you think it's worth optimising further with Cython? – Ana Todor, Nov 20, 2013 at 16:21
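For reference, those two tweaks might look like this sketch (sgd_pass is a hypothetical name; the ratings are assumed to be unpacked into coordinate arrays, and q is stored transposed as qT with shape K x M):

```python
import numpy as np

def sgd_pass(rows, cols, vals, p, qT, learning_rate=0.01, regularization=0.02):
    """One SGD pass with q stored transposed and the learning rate folded in."""
    lr_reg = learning_rate * regularization
    for u, i, r in zip(rows, cols, vals):
        # Fold the learning rate into the error once, instead of applying it three times
        e = learning_rate * (r - np.dot(p[u, :], qT[:, i]))
        p_grad = e * qT[:, i] - lr_reg * p[u, :]
        qT[:, i] += e * p[u, :] - lr_reg * qT[:, i]
        p[u, :] += p_grad
```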
Yes, it almost certainly is... The problem is that you are updating your p and q vectors as you progress over the non-zero entries of model. This makes it very hard to get rid of the for loops, because p and q are different for every entry of model. If you could delay updating p and q until the full model has been processed, you would get different results, but you could speed things up a lot without Cython or C. With your algorithm, I cannot think of any way around that... – Jaime, Nov 20, 2013 at 16:33
Thank you for the insight. Time to learn myself some Cython then! I'll be tackling the Netflix dataset soon, so I need all the speed I can get. I'm going to mark this as answered then. – Ana Todor, Nov 20, 2013 at 16:45
Besides the recommendation that Jaime offered, it would be very easy to get performance improvements by using Numba.
There is a tutorial on Numba and matrix factorization here. When Numba works, it requires much less effort than rewriting your code in Cython. In a nutshell, you need to import numba:
from numba.decorators import jit
And then add a decorator to your matrix factorization function, by simply putting @jit on the line before the method definition.
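A self-contained sketch of that approach (sgd_epoch is a hypothetical name; in current Numba versions the import is simply from numba import jit, and Numba works best on plain functions over NumPy arrays, so the sparse ratings are assumed to be unpacked into coordinate arrays first — the try/except just lets the sketch run even where Numba is not installed):

```python
import numpy as np

try:
    from numba import jit  # current spelling of the import
except ImportError:        # fall back to plain Python if Numba is absent
    def jit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)

@jit(nopython=True)
def sgd_epoch(rows, cols, vals, p, q, learning_rate, regularization):
    """One compiled pass of SGD over the ratings in coordinate form."""
    for idx in range(rows.shape[0]):
        u = rows[idx]
        i = cols[idx]
        e = vals[idx] - np.dot(p[u], q[i])
        p_grad = learning_rate * (e * q[i] - regularization * p[u])
        q[i] += learning_rate * (e * p[u] - regularization * q[i])
        p[u] += p_grad
```

The first call pays a one-time compilation cost; subsequent epochs run at close to native speed.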