Attached below, and also as this GitHub gist, is code for a Python class I wrote as part of a personal learning/portfolio project on collaborative-filtering recommender systems via matrix factorisation. The full Jupyter notebook can be found here for greater context. I am looking for a review of the code for this class.
The class docstring summarises the overarching input-output goals, and I hope it is sufficient. The input matrix to be factorised is expected to be provided after data processing of some dataset. The datasets used in my notebook contain 100k (for development) and 10 million (the eventual target) user feedback datapoints for movies, provided by the GroupLens research group.
The class implements Gradient Descent and Stochastic Gradient Descent methods for factorising matrices interpreted as containing explicit feedback data, and the Alternating Least Squares method, with scipy.sparse.linalg.spsolve (LU decomposition) and scipy.sparse.linalg.cg (conjugate gradient) solvers, for data interpreted as implicit feedback. It also provides functionality to plot learning curves.
Personal aims for the code
I'm a master's student in physics, looking to pivot to machine learning jobs, and my aims with the project were threefold: writing optimised code, exercising good program design, and practising machine learning.
The first of these aims was to grok the most fundamental methods for matrix factorisation and to optimise their implementation as best I could, so that it works even for large datasets (millions of data points). To this end, I've used sparse matrices almost everywhere possible, along with specialised logic to reduce dense computations, since the resulting factors are eventually expected to be dense.
Secondly, I have little experience actually writing object-oriented code, even though I've long known the principles, including in languages like C++. The class was therefore also an exercise in writing code with as good a design as I could manage. The code is inspired and influenced by many internet resources, but at its core it is written from scratch. I realise this is just one class, and not much in the name of OOP, but I've lately been thinking of ways to refactor this monolithic class by deriving subclasses from it. Feedback on that will be appreciated.
If I may be so ambitious, I eventually wish to extend the class's functionality (along with more code) so that it can be published as a small, general-purpose matrix factorisation library installable through pip. For inspiration, the library implicit implements recommendations for implicit datasets, whereas I wish to package functionality for both explicit and implicit datasets, focussing especially on the matrix factorisation aspect. Even if this isn't realisable given realistic time constraints, the ambition informs my aims for learning to write good code.
On the three aims above, and on any other aspects of the code I fail to judge, I'm open to critique, with great regards to the reviewers.
Although outside the scope of this site, a small comment on the quality of this code from an employability perspective would be highly appreciated.
from scipy import sparse
from tqdm import tqdm
import matplotlib.pyplot as plt
import numpy as np
class MatrixFactorisation():
"""
A class implementing sparsity optimised methods to factorise a provided (m*n) sparse matrix into two factor matrices of dimensions (m*k) and (n*k), with 'k' specified; designed with collaborative-filtering recommender system applications in mind.
For a matrix R, we compute the decomposition matrices P, Q: R{m*n} = P{m*k}•Q^T{k*n}
The class object is initialised with a sparse input matrix, and the value of 'k'. Even if the data is dense, consider casting it into a sparse format.
Class contains methods to factorise R, considering it either as explicit or implicit data, i.e. considering only the non-zero values for
learning, or considering the entire dense version of the matrix, including zeros, respectively.
For a more informative discussion on explicit and implicit interpretations of matrix data, refer to http://yifanhu.net/PUB/cf.pdf
If matrix R represents feedback values (ratings, view counts etc.) for "items" each described by a column of R, provided by "users" each described by a row of R,
then the factors P, Q represent vector embeddings of "user" and "item" profiles in a common k-dimensional embedding space.
The class also implements a method to plot the training and validation learning curves.
Methods:
- gradient_descent
- sgd
- als
- plot_learning_curve
"""
def __init__(self, R:sparse.csc_matrix, k:int, verbose:bool=False ):
self.R = R # R is the "ratings" matrix to be factorised
self.k = k # hyperparameter k is the number of latent factors desired in the factorised embeddings
self.m, self.n = R.shape
self.P = np.random.rand(self.m, k) # initialising the first factor embedding (of "users")
self.Q = np.random.rand(self.n, k) # initialising the second factor embedding (of "items")
self.p_nnz_idx, self.q_nnz_idx = R.nonzero() # row and col indices in R matrix respectively, of non-zero elements
self.train_mse_at_epochs = []
self.test_mse_at_epochs = []
self.verbose = verbose
self.verbose_epoch_intvl = 50
def __predict_observed(self):
"""
Returns predicted values as a sparse matrix, corresponding to indices of non-zero elements in matrix R to be factorised
Avoids constructing a dense prediction matrix produced via naïve dense matrix multiplication of factors.
Instead, produces predicted values by multiplying only the requisite factor-matrix rows corresponding to non-zero entries in R.
"""
R_predicted = np.sum( np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries: explicit MF
R_predicted = sparse.csc_matrix( (R_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
return R_predicted
def gradients(self, _lambda):
"""
Returns l2 gradients w.r.t the elements of the factor matrices, along with the prediction errors vis-a-vis actual values.
"""
R_predicted = self.__predict_observed()
error = self.R - R_predicted
norm_factor = -2 / self.R.nnz #this factor is crucial! Can't be 1 (i.e. we can't leave the gradients unnormalised, else cost diverges!)
grad_P = norm_factor * (error*self.Q) + 2*(_lambda*self.P)
grad_Q = norm_factor * (error.T*self.P) + 2*(_lambda*self.Q)
return grad_P, grad_Q, error
def gradient_descent(self, iterations=2000, learning_rate = 0.3, _lambda=0.0002, beta=0.8, validation:sparse.csc_matrix=None, verbose=False):
"""
Returns factor matrices for explicit data matrix by learning their entries via full-batch gradient descent.
Returned factor matrices P,Q represent user and item embeddings, for explicit ratings data R.
R = P.Q^T
Optionally provide a sparse matrix with validation data, to compute the validation performance (measured by mse) over time
"""
self.verbose = verbose
grad_P, grad_Q, error = self.gradients(_lambda)
v_P = grad_P
v_Q = grad_Q
beta = beta # momentum constant
train_mse = error.power(2).sum()/self.R.nnz #self.R.shape[0]
# print("train mse:", train_mse)
self.train_mse_at_epochs.append( [1, train_mse] )
self.save_test_mse(0, validation=validation)
for i in tqdm(range(iterations), desc="Gradient Descent progress:", leave=verbose):
grad_P, grad_Q, error = self.gradients(_lambda)
v_P = beta*v_P + (1-beta)*grad_P
v_Q = beta*v_Q + (1-beta)*grad_Q
self.P = self.P - learning_rate*v_P # Grad Desc update rule
self.Q = self.Q - learning_rate*v_Q # Grad Desc update rule
# print("train mse:", error.power(2).sum()/self.R.shape[0] ) #print mse at each iteration
if(not (i+1)%self.verbose_epoch_intvl):
self.save_train_mse(i, error)
self.save_test_mse(i, validation=validation)
return self.P, self.Q
def save_train_mse(self, i, error):
train_mse = error.power(2).sum()/self.R.nnz #self.R.shape[0]
self.train_mse_at_epochs.append( [i+1, train_mse])
if(self.verbose):
print("\niteration/epoch", i+1, ":")
print("train mse:", train_mse)
def save_test_mse(self, i, validation:sparse.csc_matrix=None, solver='gd'):
if(validation != None):
valdn_p_nnz_idx, valdn_q_nnz_idx = validation.nonzero()
valdn_predicted = np.sum( np.multiply(self.P[valdn_p_nnz_idx], self.Q[valdn_q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
valdn_predicted = sparse.csc_matrix( (valdn_predicted, (valdn_p_nnz_idx, valdn_q_nnz_idx)), shape=(self.m,self.n))
valdn_error = validation - valdn_predicted
test_mse = valdn_error.power(2).sum()/self.R.nnz #self.R.shape[0]
self.test_mse_at_epochs.append([i+1, test_mse])
if(self.verbose):
print("iteration", i+1, ":")
print("test mse:", test_mse)
def sgd(self, epochs=10, learning_rate=0.03, _lambda=0.0, beta=0.8, verbose=False, validation:sparse.csc_matrix=None):
"""
Returns factor matrices for explicit data matrix by learning their entries via stochastic gradient descent.
Returned factor matrices P,Q represent user and item embeddings, for explicit ratings data R.
R = P.Q^T
Optionally provide a sparse matrix with validation data, to compute the validation performance (measured by mse) over time
"""
self.verbose = verbose
beta = beta
num_samples = len(self.p_nnz_idx)
training_indices = np.arange(num_samples)
# I couldn't have shuffled the array p_nnz_idx itself, cuz that would require shuffling q_nnx_idx in a corresponding manner
# with tqdm(total=100) as prog_bar:
for epoch in tqdm(range(epochs), desc="Total SGD epochs completion: ", leave=verbose):
np.random.shuffle(training_indices) #stochasticity must be ensured properly!
count = 0
for i in tqdm(training_indices, desc=f"Epoch #{epoch} completion: "): #We need to update all users and items at least once in each iteration
p_idx = self.p_nnz_idx[i]
q_idx = self.q_nnz_idx[i]
R_ij = self.R[p_idx, q_idx]
P_i = self.P[p_idx]
Q_j = self.Q[q_idx]
R_predicted_ij = P_i.dot(Q_j.T)
error_ij = R_ij - R_predicted_ij
grad_P_i = -2*(error_ij * Q_j) + 2*(_lambda * self.P[p_idx])
grad_Q_j = -2*(error_ij * P_i) + 2*(_lambda * self.Q[q_idx])
v_P_i = beta*v_P_i + (1-beta)*grad_P_i if epoch!=0 else grad_P_i
v_Q_j = beta*v_Q_j + (1-beta)*grad_Q_j if epoch!=0 else grad_Q_j
self.P[p_idx] -= learning_rate * v_P_i
self.Q[q_idx] -= learning_rate * v_Q_j
# self.P[p_idx] -= learning_rate* grad_P_i
# self.Q[q_idx] -= learning_rate* grad_Q_j
if(not ((epoch+1)*(count+1)+1)%(1000)):
R_predicted = np.sum( np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
R_predicted = sparse.csc_matrix( (R_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
error = self.R - R_predicted
self.save_train_mse((epoch+1)*(count+1), error)
self.save_test_mse((epoch+1)*(count+1), validation=validation)
count += 1
return self.P, self.Q
def als(self, epochs=10, alpha=40, _lambda=10, solver="cg", validation:sparse.csc_matrix=None, verbose=False):
"""
Returns factor matrices for implicit data matrix by learning their entries via Alternating Least Squares method.
Returned factor matrices P,Q represent user and item embeddings, for binarised ratings matrix 'Phi' derived from R, as follows:
Phi[p,q] = { 0 for R[p,q]=0
{ 1 for R[p,q]>0
& Phi = P.Q^T
The parameter `solver` specifies the solver to use. Currently choose between "cg" (iterative conjugate gradient method) or "lu" (PLU decomposition).
Optionally provide a sparse matrix with validation data, to compute the validation performance (measured by mse) over time
"""
self.verbose = verbose
lambda_eye = _lambda * sparse.eye(self.k)
P = sparse.csc_matrix(self.P)
Q = sparse.csc_matrix(self.Q)
# the memory for this loop can further be optimised by eliminating the two similar loops with appropriate conditioning
for i in tqdm(range(epochs), desc="ALS iterations completion: ", leave=verbose):
# Q = sparse.csc_matrix(self.Q)
QTQ = Q.T.dot(Q) #precomputation of Q.T.dot(Q) for all elements of P
for p in tqdm(range(self.m), desc="Solving for P: ", leave=verbose):
R_p = self.R[p].toarray()
CpI = sparse.diags(alpha * R_p, [0]) #This acts as the (C^u - I) matrix in literature
QTCpIQ = Q.T.dot(CpI).dot(Q)
A = (QTQ + QTCpIQ) + lambda_eye
phi = R_p.copy()
phi[np.where(phi != 0)] = 1.0
b = Q.T.dot(CpI + sparse.eye(self.n)).dot(sparse.csc_matrix(phi).T) # why the last transpose on phi though?
if(solver == "lu"):
P[p] = sparse.linalg.spsolve(A, b)
# self.P[p] = sparse.linalg.spsolve(A, b)
if(solver == "cg"):
P[p], exitcode = sparse.linalg.cg(A, b.toarray())
if(exitcode):
print(f"Conjugate Gradient Method couldn't converge properly, exited with convergence exitcode {exitcode}")
# P = sparse.csc_matrix(self.P)
PTP = P.T.dot(P) #precomputation of P.T.dot(P) for all elements of Q
for q in tqdm(range(self.n), desc="Solving for Q: ", leave=verbose):
R_q = self.R[:,q].T.toarray()
CqI = sparse.diags(alpha * R_q, [0]) #This acts as the (C^u - I) matrix in literature
PTCqIP = P.T.dot(CqI).dot(P)
A = (PTP + PTCqIP) + lambda_eye
phi = R_q.copy()
phi[np.where(phi != 0)] = 1.0
b = P.T.dot(CqI + sparse.eye(self.m)).dot(sparse.csc_matrix(phi).T)
if(solver == "lu"):
Q[q] = sparse.linalg.spsolve(A, b)
# self.Q[q] = sparse.linalg.spsolve(A, b)
if(solver == "cg"):
Q[q], exitcode = sparse.linalg.cg(A, b.toarray())
if(exitcode):
print(f"Conjugate Gradient Method couldn't converge properly, exited with convergence exitcode {exitcode}")
# Phi_predicted = np.sum( np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
# Phi_predicted = sparse.csc_matrix( (Phi_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
Phi_predicted = np.sum( np.multiply(P.toarray()[self.p_nnz_idx], Q.toarray()[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
Phi_predicted = sparse.csc_matrix( (Phi_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
Phi = self.R
Phi[Phi.nonzero()] = 1.0
error = Phi - Phi_predicted
self.save_train_mse(i, error)
valdn_p_nnz_idx, valdn_q_nnz_idx = validation.nonzero()
valdn_predicted = np.sum( np.multiply(P.toarray()[valdn_p_nnz_idx], Q.toarray()[valdn_q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
valdn_predicted = sparse.csc_matrix( (valdn_predicted, (valdn_p_nnz_idx, valdn_q_nnz_idx)), shape=(self.m,self.n))
valdn_Phi = validation
valdn_Phi[valdn_Phi.nonzero()] = 1.0
valdn_error = valdn_Phi- valdn_predicted
test_mse = valdn_error.power(2).sum()/self.R.nnz #self.R.shape[0]
self.test_mse_at_epochs.append([i+1, test_mse])
if(self.verbose):
print("iteration", i+1, ":")
print("test mse:", test_mse)
self.P = P
self.Q = Q
return self.P, self.Q
def plot_learning_curve(self, validation=None):
if(validation != None):
test_mse_arr = np.array(self.test_mse_at_epochs)
plt.plot(test_mse_arr[:,0], test_mse_arr[:,1], label='Test')
train_mse_arr = np.array(self.train_mse_at_epochs)
plt.plot(train_mse_arr[:,0], train_mse_arr[:,1], label='Train')
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.xlabel('iterations', fontsize=12)
plt.ylabel('MSE', fontsize=12)
plt.legend(loc='best', fontsize=9)
2 Answers
imports
I don't understand why you omitted these lines, so I added them back; they change the meaning of the submitted source code and make it reviewable.
from scipy import sparse
from tqdm import tqdm
import matplotlib.pyplot as plt
import numpy as np
docstrings
The docstrings are illegible. They assume the Gentle Reader has a display with similar width and font size as your own. Not everyone's display will show 250-character lines without horizontal scrolling. Even docformatter was unable to salvage more than the initial paragraph.
In contrast, e.g. the __predict_observed docstring is very nice, thank you.
Dunno what's up with the __ dunder prefix. Recommend you delete one underscore; name mangling probably won't be helpful here. Kudos for breaking it out as a helper.
... of dimensions (m * k) and (n * k)
That's a curious way to express it, given that we can't multiply them as-is. I would have anticipated "... and (k * n)".
In the gradients() docstring, please capitalize the L2 norm. I initially read it as """Returns 12 gradients ....
Table Of Contents
Methods:
- gradient_descent
- sgd
- als
- plot_learning_curve
You're offering guideposts so folks can learn your Public API; I appreciate the sentiment. The defs already say something similar. The trouble with duplicate information is it tends to get out of sync. If def foo is added, must we remember to update this list, too?
Maybe the existing gradients() helper should be a private _gradients()? Maybe the pair of "save...mse" methods should be in an accompanying class or module?
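If you went the accompanying-class route, a minimal sketch could be as small as this (the class and method names here are hypothetical, not something your code already has):

class MSEHistory:
    """Hypothetical companion: records (iteration, mse) pairs for one curve."""

    def __init__(self, label: str = "train", verbose: bool = False):
        self.label = label
        self.verbose = verbose
        self.records = []  # list of [iteration, mse] pairs

    def record(self, iteration: int, mse: float) -> None:
        self.records.append([iteration, mse])
        if self.verbose:
            print(f"iteration {iteration}: {self.label} mse: {mse}")

The factoriser would then own a train history and a test history, compute the errors itself, and simply call .record(...), which also gives plot_learning_curve a single obvious data source.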
trailing comments
I thank you for formatting the code with black, I truly do.
self.Q = np.random.rand(
self.n, k
) # initialising the second factor embedding (of "items")
What black is gently trying to suggest to you is to rephrase a too-long line like this as a pair of lines:
# initialising the second factor embedding (of "items")
self.Q = np.random.rand(self.n, k)
I imagine that present tense works for some folks, so I'll reserve judgement on how the comment is phrased. The several comments we see here are helpful; keep it up!
Similar remarks for trailing vs two-line comments in other methods.
The trailing comments on R and k really should be __init__ ctor docstring explanations of what the caller should pass in, even if that's redundant with the class docstring. Think about a caller who has not yet managed to invoke the ctor correctly. Sometimes that's due to "wrong shape". So I'm looking for an explicit mention of "m users" and "n items" in a docstring, please.
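For instance, an __init__ docstring along these lines (the wording is only a suggestion) tells the caller exactly what shapes and roles are expected:

def __init__(self, R: sparse.csc_matrix, k: int, verbose: bool = False):
    """
    Parameters
    ----------
    R : sparse.csc_matrix of shape (m, n)
        Feedback ("ratings") matrix to factorise: one row per user
        (m users), one column per item (n items).
    k : int
        Number of latent factors in the learned embeddings.
    verbose : bool
        Whether to print MSE values while training.
    """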
odd identifier
self.p_nnz_idx, ... = ...
In addition to your comment, I'm reading the docs:
... indices of the non-zero elements of the array.
Reading "nn" suggests to me "non-negative". And reading "nz" suggests "non-zero". But "nnz"? I'm stumped. Conventionally it would denote "number of non-zero" elements, but that meaning doesn't fit here. Please improve the identifier or the comment.
Also, this magic number is Fine, but consider letting the caller optionally change it via a kw default arg in the signature:
self.verbose_epoch_intvl = 50
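A sketch of that (the parameter name is just a placeholder):

def __init__(self, R: sparse.csc_matrix, k: int, verbose: bool = False,
             verbose_epoch_intvl: int = 50):
    ...
    self.verbose_epoch_intvl = verbose_epoch_intvl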
unused method
We find gradients() in your Public API, but nothing outside the class calls it. Please write a test suite which demonstrates correct use of it. I find myself wondering what the appropriate post-conditions for it should be.
prior work
There's more than one python implementation of such factorization in the literature. It would be helpful for your test suite to put at least one such library head-to-head with your own. It's not necessary for your reinvent-the-wheel learning exercise to beat SOTA or even be "competitive". But everyone, including you, would like to be reassured the numeric results are trustworthy, that they're close to what others have obtained.
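As a rough illustration of what such a check might look like, here is a sketch that pits the class against scikit-learn's NMF on a toy matrix. NMF optimises a different, non-negative objective over all entries, so treat the numbers only as a ballpark sanity check, not an apples-to-apples comparison:

import numpy as np
from scipy import sparse
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
R_dense = rng.integers(0, 6, size=(50, 40)).astype(float)  # toy "ratings"
R = sparse.csc_matrix(R_dense)
rows, cols = R.nonzero()

# This class, fitting only the observed (non-zero) entries.
mf = MatrixFactorisation(R, k=5)
P, Q = mf.gradient_descent(iterations=500)
mse_ours = np.mean((R_dense[rows, cols] - np.sum(P[rows] * Q[cols], axis=1)) ** 2)

# scikit-learn's NMF, fitting the full dense matrix.
nmf = NMF(n_components=5, init="random", random_state=0, max_iter=500)
W = nmf.fit_transform(R_dense)
H = nmf.components_
mse_nmf = np.mean((R_dense[rows, cols] - (W @ H)[rows, cols]) ** 2)

print(f"observed-entry MSE -- ours: {mse_ours:.3f}, NMF: {mse_nmf:.3f}")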
private markings in Public API
def gradient_descent(
...
_lambda=0.0002,
No, please don't do that. Spell it properly. The identifier is lambda. There can be no concept of it being _private when it is prominently part of a Public API. Similarly in sgd() and als().
logger
# print("train mse:", train_mse)
Consider doing import logging and creating a proper logger. Then caller can control verbosity by adjusting the logging level.
BTW, kudos on nice use of tqdm. A file logger would not interfere with tqdm output, unlike e.g. a console logger.
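A minimal sketch of what that could look like for save_train_mse; the file-handler configuration shown is just one option:

import logging

logger = logging.getLogger(__name__)


# Inside the class, replacing the print-based reporting:
def save_train_mse(self, i, error):
    train_mse = error.power(2).sum() / self.R.nnz
    self.train_mse_at_epochs.append([i + 1, train_mse])
    logger.info("iteration/epoch %d: train mse: %s", i + 1, train_mse)


# The caller decides where the records go, e.g. a log file that
# won't fight with tqdm's console output:
logging.basicConfig(filename="training.log", level=logging.INFO)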
This puzzles me:
if not (i + 1) % self.verbose_epoch_intvl:
Ordinarily I would expect to see 2% of epochs recorded, rather than 48% of them.
Also, consider renaming them into private _save... helpers.
unused param
def save_test_mse(..., solver="gd"):
Recommend you elide solver.
initializing locals
v_P_i = beta * v_P_i + (1 - beta) * grad_P_i if epoch != 0 else grad_P_i
This is trickier than necessary. Please initialize v_P_i and v_Q_j up top, before the loop begins. It makes it easier for humans to understand what's happening, as well as for automated static analyses.
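Something like this, assuming zero-initialised velocity buffers (which makes the very first update scale by (1 - beta), a small behavioural difference from the current epoch-0 special case):

v_P_i = np.zeros(self.k)
v_Q_j = np.zeros(self.k)
for epoch in tqdm(range(epochs), desc="Total SGD epochs completion: ", leave=verbose):
    np.random.shuffle(training_indices)
    for i in tqdm(training_indices, desc=f"Epoch #{epoch} completion: "):
        # ... compute grad_P_i and grad_Q_j exactly as before ...
        v_P_i = beta * v_P_i + (1 - beta) * grad_P_i
        v_Q_j = beta * v_Q_j + (1 - beta) * grad_Q_j
        self.P[p_idx] -= learning_rate * v_P_i
        self.Q[q_idx] -= learning_rate * v_Q_j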
if not ((epoch + 1) * (count + 1) + 1) % (1000):
Please append that 1000 magic number to the signature, as a defaulted kw argument.
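For instance (mse_record_interval is a hypothetical name):

def sgd(self, epochs=10, learning_rate=0.03, _lambda=0.0, beta=0.8,
        mse_record_interval=1000, verbose=False,
        validation: sparse.csc_matrix = None):
    ...
    if not ((epoch + 1) * (count + 1) + 1) % mse_record_interval:
        ...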
extract helper
R_predicted = np.sum(
np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]),
axis=1,
) # calculating the products only for the nonzero "observed" rating entries
R_predicted = sparse.csc_matrix(
(R_predicted, (self.p_nnz_idx, self.q_nnz_idx)),
shape=(self.m, self.n),
)
Don't we already have a helper for computing R_predicted? Please have sgd() call it.
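One way is to generalise the existing helper so that sgd() and the validation code can share it (the method name here is hypothetical):

def _predict_at(self, row_idx, col_idx):
    """Predicted values at the given (row, col) pairs, as a sparse matrix."""
    values = np.sum(np.multiply(self.P[row_idx], self.Q[col_idx]), axis=1)
    return sparse.csc_matrix((values, (row_idx, col_idx)), shape=(self.m, self.n))

__predict_observed then reduces to self._predict_at(self.p_nnz_idx, self.q_nnz_idx), and the two-line recipe need not be repeated.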
Literal annotation
def als(
...
solver="cg",
...
The parameter `solver` specifies the
solver to use. Currently choose between "cg" (iterative
conjugate gradient method) or "lu" (PLU decomposition).
Your docstring is beautiful, and I thank you for it. Consider beefing up the type annotation, even beyond solver: str. There's an opportunity to put this in the signature:
solver: Literal["lu", "cg"] = "cg",
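That needs one import; a sketch of how it slots into the existing signature:

from typing import Literal

def als(self, epochs=10, alpha=40, _lambda=10,
        solver: Literal["lu", "cg"] = "cg",
        validation: sparse.csc_matrix = None, verbose=False):
    ...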
Please don't have sgd and als do this:
self.verbose = verbose
It is a very surprising side effect, since the __init__ ctor already set it. Consider removing it as a ctor arg if you really want to retain the current behavior.
Better to have sgd and als default this Optional[bool] to None, and have them default it from self.verbose if not specified. That is, use it as a local variable to control this method's behavior, without side effects on the underlying object.
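That is, roughly:

from typing import Optional

def sgd(self, epochs=10, learning_rate=0.03, _lambda=0.0, beta=0.8,
        verbose: Optional[bool] = None,
        validation: sparse.csc_matrix = None):
    # Fall back to the object-level setting without mutating it.
    verbose = self.verbose if verbose is None else verbose
    ...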
BTW, kudos on the QTQ hoisting, and on the helpful CpI remark.
raise fatal exception
P[p], exitcode = sparse.linalg.cg(A, b...)
if exitcode:
print( ...
No. We don't just muddle ahead with P[p] containing some random unusable result -- we need to raise a fatal error here. Otherwise the caller will trust junk figures.
Also you should Extract Helper, as this code is used for P and for Q.
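A sketch of such a shared helper (the name is hypothetical) that fails loudly instead of storing an unconverged result:

def _solve_row(self, A, b, solver):
    """Solve A x = b for one row of P or Q, raising if the solver fails."""
    if solver == "lu":
        return sparse.linalg.spsolve(A, b)
    if solver == "cg":
        x, exitcode = sparse.linalg.cg(A, b.toarray())
        if exitcode != 0:
            raise RuntimeError(
                f"Conjugate gradient did not converge (exit code {exitcode})"
            )
        return x
    raise ValueError(f"Unknown solver: {solver!r}")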
Similar remarks for sgd and als computing valdn_predicted and friends.
is not
def plot_learning_curve(self, validation=None):
if validation != None:
Sorry, this is just a weird python community thing, related to Singletons.
Please prefer to test if validation is not None:
Actually you could probably get away with a simple if validation:
test suite
If you make it easy for folks to run your code, they're more likely to do so. Consider including a small dataset with known factoring results, so a maintenance engineer can go through an interactive edit-test cycle in less than ten seconds following each proposed edit.
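For instance, a pytest-style sketch on a tiny synthetic rank-2 matrix; the thresholds and hyperparameters here are illustrative guesses and may need tuning:

import numpy as np
from scipy import sparse


def test_gradient_descent_reduces_training_mse():
    rng = np.random.default_rng(42)
    P_true = rng.random((20, 2)) + 0.1        # strictly positive factors, so
    Q_true = rng.random((15, 2)) + 0.1        # every entry counts as observed
    R = sparse.csc_matrix(P_true @ Q_true.T)  # exact rank-2 ratings matrix

    mf = MatrixFactorisation(R, k=2)
    mf.gradient_descent(iterations=2000)

    initial_mse = mf.train_mse_at_epochs[0][1]
    final_mse = mf.train_mse_at_epochs[-1][1]
    assert final_mse < 0.5 * initial_mse  # training actually helps
    assert final_mse < 0.05               # and ends up near zero (loose bound)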
I thought declaring a method name with two prefixed underscores was indeed the way to declare them as (pseudo) private?
No.
https://peps.python.org/pep-0008/#designing-for-inheritance
We don’t use the term "private" here, since no attribute is really private in Python (without a generally unnecessary amount of work).
Often this is paraphrased as "we're all grownups here". That is, unlike java and other languages, in python nothing is truly private; a caller can always get at it in some way. Instead we trust responsible software engineers to obey the rules of the road, and not go poking into _private details which are clearly not part of the supported Public API and therefore may very well silently blow up in a subsequent release.
(OTOH, it is perfectly OK that you write some def _foo library routine, and then in another module you write a unit test which exercises _foo(). A test suite can certainly reach in to verify the internals are in good shape.)
If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. This invokes Python’s name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.
__double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).
Generally, double leading underscores should be used only to avoid name conflicts with attributes in classes designed to be subclassed.
I didn't see any test suite methods which exercised a class that inherited from your MatrixFactorisation class. Therefore I inferred that you were not trying to exploit this "name mangling" behavior.
I have worked with a large amount of python code in commercial and FOSS contexts, and it is seldom the case that the Author actually (1.) understands and (2.) desires mangling. Certainly it has use cases. When you come across such a use case, you should explain in # comments or in docstrings what the design rationale is, and you should definitely include a test suite that demonstrates how to rely on such behavior. This will prove an invaluable aid to engineers who inherit from your class in future.
_lambda was chosen to not conflict with the keyword lambda
Please prefer lambda_.
https://peps.python.org/pep-0008/#descriptive-naming-styles
single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g. class_
When you say _lambda you're claiming it is _private, but that doesn't make sense in cases like a local variable (already private!) or as an identifier visible in the Public API that you designed.
"prior work"?
IDK, matrix factorization has a huge literature, and python is one of a handful of popular implementation languages, so the Venn diagram intersection is pretty big. (Does a C++ or pyrex module designed to be called from python count? Shrug!)
A cursory search immediately reveals packages like:
- https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
- https://pypi.org/project/matrix-factorization
Just ask yourself the question: "How would I solve this business problem if my library did not exist?"
Comments:
- Really insightful review, thanks! I must clarify and follow up with a couple of queries: (1) I thought declaring a method name with two prefixed underscores was indeed the way to declare them as (pseudo) private? I didn't already know the way to do this, but the top search results suggested so, since the name mangling would render them a hassle to call. (2) _lambda was chosen to not conflict with the keyword lambda. Am I glossing over something in thinking so? (3) Might you have any other remarks about type annotations in the code? (4) Could you point to some resources for "prior work" comments? – Cosmic Fan, Apr 14, 2024 at 0:01
- Consider tacking on --strict when you invoke $ mypy *.py. It will encourage you to revise how you're annotating. Essentially it moves from "optional" to "mandatory" annotations, something you may or may not wish to embrace. The downside is it's a little more work. The upside is that sometimes mypy tells me I was an idiot, that I didn't want to write that foolish line of code and I should revise it at once. So I put up with a little inconvenience, and sometimes I am glad of the payoff. – J_H, Apr 14, 2024 at 0:42
- Ah! So using "private" informally (in the sense that Python doesn't enforce it) has two interpretations here?! Using a single prefixed underscore is reminiscent of protected attributes in Cpp (I guess this must be a crude comparison in some way), whereas using two prefixed underscores declares things more like the formal private attributes of a Cpp class. A quick search seems to suggest so. Is there a subtler ballgame at play here? – Cosmic Fan, Apr 14, 2024 at 1:25
- With great thanks and credit to your kind attention, let me slide in a final question about whether this site itself would be a good place to also expect reviews on the optimisation logic. Should I be hoping to collect future answers of that sort here itself? Or is a specialised query for the same on some other SE site the better alternative? – Cosmic Fan, Apr 14, 2024 at 1:37
- My advice is: Stick with single underscore. It works fine almost always. If you feel "mangling" is appropriate, write down the design rationale, as I outlined. // As far as the optimization logic goes, I do not yet have an opinion. But I outlined that (1.) it should be easy to run an example of interest, and (2.) we should be able to compare to SOTA or some relevant implementation. Add those to a submission here or elsewhere, and we'll have something more concrete to explore in detail. – J_H, Apr 14, 2024 at 3:00
It is great that you used docstrings and comments.
Here are some minor suggestions, purely for coding style.
There are many long lines of code, and many can be shortened by moving the end-of-line comments to their own line.
It looks like you are using 7-space indents, which is unusual. The PEP 8 style guide recommends 4-space.
The black program can be used to automatically reformat your code if you chose to do so.
I don't think you need lines like the following which simply set a variable to itself:
beta = beta
I think they can be removed to simplify the code.
There are many simple if statements like the following:
if(self.verbose):
The code would be cleaner without the parens:
if self.verbose:
Commented out code should be removed:
# Phi_predicted = np.sum( np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
# Phi_predicted = sparse.csc_matrix( (Phi_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
The commented-out print statements could use the verbose variable if you want to keep them:
# print("train mse:", train_mse)
Run pylint on your code for other detailed style advice.