Attached below, and also as this GitHub gist, is code for a Python class I wrote as part of a personal learning/portfolio project on collaborative-filtering recommender systems via matrix factorisation. The full Jupyter notebook can be found here for greater context. I am looking for a review of the code for this class.
The class docstring summarises the overarching input-output goals, and I hope it is sufficient. The input matrix to be factorised is expected to be provided after data processing of some dataset. The datasets used in my notebook contain 100k (for development) and 10 million (the eventual target) user feedback datapoints for movies, provided by the GroupLens research group.
The class implements Gradient Descent and Stochastic Gradient Descent methods for factorising matrices interpreted as containing explicit feedback data, and the Alternating Least Squares method, with scipy.sparse.linalg.spsolve (LU decomposition) and scipy.sparse.linalg.cg (conjugate gradient) solvers, for data interpreted as implicit feedback. It also provides functionality to plot learning curves.
Personal aims for the code
I'm a master's student in physics, looking to pivot to machine learning jobs, and my aims with the project were threefold: writing optimised code, exercising good program design, and practising machine learning.
The first of these aims was to grok the most fundamental methods for matrix factorisation and to optimise their implementation as best I could, so that it works even for large datasets (millions of data points). To this end, I've used sparse matrices almost everywhere possible, along with specialised logic to reduce dense computations, since the resulting factors are eventually expected to be dense.
Secondly, I have little experience actually writing object-oriented code, even though I've long known the principles, including in languages like C++. The class was therefore also an exercise in writing code with as good a design as I could manage. The code is inspired and influenced by many internet resources, but at its core it is written from scratch. I realise this is just one class, and not much in the name of OOP, but I've lately been thinking of ways to refactor this monolithic class by deriving subclasses from it. Feedback on that will be appreciated.
If I may be so ambitious, I eventually wish to extend the class's functionality (along with more code) so that it can be published as a small, general-purpose matrix factorisation library installable through pip. For inspiration, the library implicit implements recommendations for implicit datasets, whereas I wish to package functionality for both explicit and implicit datasets, focussing especially on the matrix factorisation aspect. Even if this isn't realisable given realistic time constraints, the ambition informs my aims for learning to write good code.
On the three aims above, and on any other aspects of the code I fail to judge, I'm open to critique, with great regards to the reviewers.
Although outside the scope of this site, a small comment on the quality of this code from an employability perspective would be highly appreciated.
from scipy import sparse
from tqdm import tqdm
import matplotlib.pyplot as plt
import numpy as np
class MatrixFactorisation():
"""
A class implementing sparsity optimised methods to factorise a provided (m*n) sparse matrix into two factor matrices of dimensions (m*k) and (n*k), with 'k' specified; designed with collaborative-filtering recommender system applications in mind.
For a matrix R, we compute the decomposition matrices P, Q: R{m*n} = P{m*k}•Q^T{k*n}
The class object is initialised with a sparse input matrix, and the value of 'k'. Even if the data is dense, consider casting it into a sparse format.
Class contains methods to factorise R, considering it either as explicit or implicit data, i.e. considering only the non-zero values for
learning, or considering the entire dense version of the matrix, including zeros, respectively.
For a more informative discussion on explicit and implicit interpretations of matrix data, refer to http://yifanhu.net/PUB/cf.pdf
If matrix R represents feedback values (ratings, view counts etc.) for "items" each described by a column of R, provided by "users" each described by a row of R,
then the factors P, Q represent vector embeddings of "user" and "item" profiles in a common k-dimensional embedding space.
The class also implements a method to plot the training and validation learning curves.
Methods:
- gradient_descent
- sgd
- als
- plot_learning_curve
"""
def __init__(self, R:sparse.csc_matrix, k:int, verbose:bool=False ):
self.R = R # R is the "ratings" matrix to be factorised
self.k = k # hyperparameter k is the number of latent factors desired in the factorised embeddings
self.m, self.n = R.shape
self.P = np.random.rand(self.m, k) # initialising the first factor embedding (of "users")
self.Q = np.random.rand(self.n, k) # initialising the second factor embedding (of "items")
self.p_nnz_idx, self.q_nnz_idx = R.nonzero() # row and col indices in R matrix respectively, of non-zero elements
self.train_mse_at_epochs = []
self.test_mse_at_epochs = []
self.verbose = verbose
self.verbose_epoch_intvl = 50
def __predict_observed(self):
"""
Returns predicted values as a sparse matrix, corresponding to indices of non-zero elements in matrix R to be factorised
Avoids constructing a dense prediction matrix produced via naïve dense matrix multiplication of factors.
Instead, produces predicted values by multiplying only the requisite factor-matrix rows corresponding to non-zero entries in R.
"""
R_predicted = np.sum( np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries: explicit MF
R_predicted = sparse.csc_matrix( (R_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
return R_predicted
def gradients(self, _lambda):
"""
Returns l2 gradients w.r.t the elements of the factor matrices, along with the prediction errors vis-a-vis actual values.
"""
R_predicted = self.__predict_observed()
error = self.R - R_predicted
norm_factor = -2 / self.R.nnz #this factor is crucial! Can't be 1 (i.e. we can't leave the gradients unnormalised, else cost diverges!)
grad_P = norm_factor * (error*self.Q) + 2*(_lambda*self.P)
grad_Q = norm_factor * (error.T*self.P) + 2*(_lambda*self.Q)
return grad_P, grad_Q, error
def gradient_descent(self, iterations=2000, learning_rate = 0.3, _lambda=0.0002, beta=0.8, validation:sparse.csc_matrix=None, verbose=False):
"""
Returns factor matrices for explicit data matrix by learning their entries via full-batch gradient descent.
Returned factor matrices P,Q represent user and item embeddings, for explicit ratings data R.
R = P.Q^T
Optionally provide a sparse matrix with validation data, to compute the validation performance (measured by mse) over time
"""
self.verbose = verbose
grad_P, grad_Q, error = self.gradients(_lambda)
v_P = grad_P
v_Q = grad_Q
beta = beta # momentum constant
train_mse = error.power(2).sum()/self.R.nnz #self.R.shape[0]
# print("train mse:", train_mse)
self.train_mse_at_epochs.append( [1, train_mse] )
self.save_test_mse(0, validation=validation)
for i in tqdm(range(iterations), desc="Gradient Descent progress:", leave=verbose):
grad_P, grad_Q, error = self.gradients(_lambda)
v_P = beta*v_P + (1-beta)*grad_P
v_Q = beta*v_Q + (1-beta)*grad_Q
self.P = self.P - learning_rate*v_P # Grad Desc update rule
self.Q = self.Q - learning_rate*v_Q # Grad Desc update rule
# print("train mse:", error.power(2).sum()/self.R.shape[0] ) #print mse at each iteration
if(not (i+1)%self.verbose_epoch_intvl):
self.save_train_mse(i, error)
self.save_test_mse(i, validation=validation)
return self.P, self.Q
def save_train_mse(self, i, error):
train_mse = error.power(2).sum()/self.R.nnz #self.R.shape[0]
self.train_mse_at_epochs.append( [i+1, train_mse])
if(self.verbose):
print("\niteration/epoch", i+1, ":")
print("train mse:", train_mse)
def save_test_mse(self, i, validation:sparse.csc_matrix=None, solver='gd'):
if(validation != None):
valdn_p_nnz_idx, valdn_q_nnz_idx = validation.nonzero()
valdn_predicted = np.sum( np.multiply(self.P[valdn_p_nnz_idx], self.Q[valdn_q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
valdn_predicted = sparse.csc_matrix( (valdn_predicted, (valdn_p_nnz_idx, valdn_q_nnz_idx)), shape=(self.m,self.n))
valdn_error = validation - valdn_predicted
test_mse = valdn_error.power(2).sum()/self.R.nnz #self.R.shape[0]
self.test_mse_at_epochs.append([i+1, test_mse])
if(self.verbose):
print("iteration", i+1, ":")
print("test mse:", test_mse)
def sgd(self, epochs=10, learning_rate=0.03, _lambda=0.0, beta=0.8, verbose=False, validation:sparse.csc_matrix=None):
"""
Returns factor matrices for explicit data matrix by learning their entries via stochastic gradient descent.
Returned factor matrices P,Q represent user and item embeddings, for explicit ratings data R.
R = P.Q^T
Optionally provide a sparse matrix with validation data, to compute the validation performance (measured by mse) over time
"""
self.verbose = verbose
beta = beta
num_samples = len(self.p_nnz_idx)
training_indices = np.arange(num_samples)
# I couldn't have shuffled the array p_nnz_idx itself, cuz that would require shuffling q_nnx_idx in a corresponding manner
# with tqdm(total=100) as prog_bar:
for epoch in tqdm(range(epochs), desc="Total SGD epochs completion: ", leave=verbose):
np.random.shuffle(training_indices) #stochasticity must be ensured properly!
count = 0
for i in tqdm(training_indices, desc=f"Epoch #{epoch} completion: "): #We need to update all users and items at least once in each iteration
p_idx = self.p_nnz_idx[i]
q_idx = self.q_nnz_idx[i]
R_ij = self.R[p_idx, q_idx]
P_i = self.P[p_idx]
Q_j = self.Q[q_idx]
R_predicted_ij = P_i.dot(Q_j.T)
error_ij = R_ij - R_predicted_ij
grad_P_i = -2*(error_ij * Q_j) + 2*(_lambda * self.P[p_idx])
grad_Q_j = -2*(error_ij * P_i) + 2*(_lambda * self.Q[q_idx])
v_P_i = beta*v_P_i + (1-beta)*grad_P_i if epoch!=0 else grad_P_i
v_Q_j = beta*v_Q_j + (1-beta)*grad_Q_j if epoch!=0 else grad_Q_j
self.P[p_idx] -= learning_rate * v_P_i
self.Q[q_idx] -= learning_rate * v_Q_j
# self.P[p_idx] -= learning_rate* grad_P_i
# self.Q[q_idx] -= learning_rate* grad_Q_j
if(not ((epoch+1)*(count+1)+1)%(1000)):
R_predicted = np.sum( np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
R_predicted = sparse.csc_matrix( (R_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
error = self.R - R_predicted
self.save_train_mse((epoch+1)*(count+1), error)
self.save_test_mse((epoch+1)*(count+1), validation=validation)
count += 1
return self.P, self.Q
def als(self, epochs=10, alpha=40, _lambda=10, solver="cg", validation:sparse.csc_matrix=None, verbose=False):
"""
Returns factor matrices for implicit data matrix by learning their entries via Alternating Least Squares method.
Returned factor matrices P,Q represent user and item embeddings, for binarised ratings matrix 'Phi' derived from R, as follows:
Phi[p,q] = { 0 for R[p,q]=0
{ 1 for R[p,q]>0
& Phi = P.Q^T
The parameter `solver` specifies the solver to use. Currently choose between "cg" (iterative conjugate gradient method) or "lu" (PLU decomposition).
Optionally provide a sparse matrix with validation data, to compute the validation performance (measured by mse) over time
"""
self.verbose = verbose
lambda_eye = _lambda * sparse.eye(self.k)
P = sparse.csc_matrix(self.P)
Q = sparse.csc_matrix(self.Q)
# the memory for this loop can further be optimised by eliminating the two similar loops with appropriate conditioning
for i in tqdm(range(epochs), desc="ALS iterations completion: ", leave=verbose):
# Q = sparse.csc_matrix(self.Q)
QTQ = Q.T.dot(Q) #precomputation of Q.T.dot(Q) for all elements of P
for p in tqdm(range(self.m), desc="Solving for P: ", leave=verbose):
R_p = self.R[p].toarray()
CpI = sparse.diags(alpha * R_p, [0]) #This acts as the (C^u - I) matrix in literature
QTCpIQ = Q.T.dot(CpI).dot(Q)
A = (QTQ + QTCpIQ) + lambda_eye
phi = R_p.copy()
phi[np.where(phi != 0)] = 1.0
b = Q.T.dot(CpI + sparse.eye(self.n)).dot(sparse.csc_matrix(phi).T) # why the last transpose on phi though?
if(solver == "lu"):
P[p] = sparse.linalg.spsolve(A, b)
# self.P[p] = sparse.linalg.spsolve(A, b)
if(solver == "cg"):
P[p], exitcode = sparse.linalg.cg(A, b.toarray())
if(exitcode):
print(f"Conjugate Gradient Method couldn't converge properly, exited with convergence exitcode {exitcode}")
# P = sparse.csc_matrix(self.P)
PTP = P.T.dot(P) #precomputation of P.T.dot(P) for all elements of Q
for q in tqdm(range(self.n), desc="Solving for Q: ", leave=verbose):
R_q = self.R[:,q].T.toarray()
CqI = sparse.diags(alpha * R_q, [0]) #This acts as the (C^u - I) matrix in literature
PTCqIP = P.T.dot(CqI).dot(P)
A = (PTP + PTCqIP) + lambda_eye
phi = R_q.copy()
phi[np.where(phi != 0)] = 1.0
b = P.T.dot(CqI + sparse.eye(self.m)).dot(sparse.csc_matrix(phi).T)
if(solver == "lu"):
Q[q] = sparse.linalg.spsolve(A, b)
# self.Q[q] = sparse.linalg.spsolve(A, b)
if(solver == "cg"):
Q[q], exitcode = sparse.linalg.cg(A, b.toarray())
if(exitcode):
print(f"Conjugate Gradient Method couldn't converge properly, exited with convergence exitcode {exitcode}")
# Phi_predicted = np.sum( np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
# Phi_predicted = sparse.csc_matrix( (Phi_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
Phi_predicted = np.sum( np.multiply(P.toarray()[self.p_nnz_idx], Q.toarray()[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
Phi_predicted = sparse.csc_matrix( (Phi_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
Phi = self.R
Phi[Phi.nonzero()] = 1.0
error = Phi - Phi_predicted
self.save_train_mse(i, error)
valdn_p_nnz_idx, valdn_q_nnz_idx = validation.nonzero()
valdn_predicted = np.sum( np.multiply(P.toarray()[valdn_p_nnz_idx], Q.toarray()[valdn_q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
valdn_predicted = sparse.csc_matrix( (valdn_predicted, (valdn_p_nnz_idx, valdn_q_nnz_idx)), shape=(self.m,self.n))
valdn_Phi = validation
valdn_Phi[valdn_Phi.nonzero()] = 1.0
valdn_error = valdn_Phi- valdn_predicted
test_mse = valdn_error.power(2).sum()/self.R.nnz #self.R.shape[0]
self.test_mse_at_epochs.append([i+1, test_mse])
if(self.verbose):
print("iteration", i+1, ":")
print("test mse:", test_mse)
self.P = P
self.Q = Q
return self.P, self.Q
def plot_learning_curve(self, validation=None):
if(validation != None):
test_mse_arr = np.array(self.test_mse_at_epochs)
plt.plot(test_mse_arr[:,0], test_mse_arr[:,1], label='Test')
train_mse_arr = np.array(self.train_mse_at_epochs)
plt.plot(train_mse_arr[:,0], train_mse_arr[:,1], label='Train')
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.xlabel('iterations', fontsize=12)
plt.ylabel('MSE', fontsize=12)
plt.legend(loc='best', fontsize=9)
2 Answers
imports
I don't understand why you omitted these lines, so I added them back; they change the meaning of the submitted source code and make it reviewable.
from scipy import sparse
from tqdm import tqdm
import matplotlib.pyplot as plt
import numpy as np
docstrings
The docstrings are illegible. They assume the Gentle Reader has a display with similar width and font size as your own. Not everyone's display will show 250-character lines without horizontal scrolling. Even docformatter was unable to salvage more than the initial paragraph.
In contrast, e.g. the __predict_observed docstring is very nice, thank you.
Dunno what's up with the __ dunder prefix. Recommend you delete one underscore; name mangling probably won't be helpful here. Kudos for breaking it out as a helper.
... of dimensions (m * k) and (n * k)
That's a curious way to express it, given that we can't multiply them as-is. I would have anticipated "... and (k * n)".
In the gradients() docstring, please capitalize the L2 norm. I initially read it as """Returns 12 gradients ....
Table Of Contents
Methods:
- gradient_descent
- sgd
- als
- plot_learning_curve
You're offering guideposts so folks can learn your Public API; I appreciate the sentiment. The defs already say something similar. The trouble with duplicate information is it tends to get out of sync. If def foo is added, must we remember to update this list, too?
Maybe the existing gradients() helper should be a private _gradients()? Maybe the pair of "save...mse" methods should be in an accompanying class or module?
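If you went the accompanying-class route, a minimal sketch could be as small as this (the class and method names here are hypothetical, not something your code already has):

class MSEHistory:
    """Hypothetical companion: records (iteration, mse) pairs for one curve."""

    def __init__(self, label: str = "train", verbose: bool = False):
        self.label = label
        self.verbose = verbose
        self.records = []  # list of [iteration, mse] pairs

    def record(self, iteration: int, mse: float) -> None:
        self.records.append([iteration, mse])
        if self.verbose:
            print(f"iteration {iteration}: {self.label} mse: {mse}")

The factoriser would then own a train history and a test history, compute the errors itself, and simply call .record(...), which also gives plot_learning_curve a single obvious data source.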
trailing comments
I thank you for formatting the code with black, I truly do.
self.Q = np.random.rand(
self.n, k
) # initialising the second factor embedding (of "items")
What black is gently trying to suggest to you is to rephrase a too-long line like this as a pair of lines:
# initialising the second factor embedding (of "items")
self.Q = np.random.rand(self.n, k)
I imagine that present tense works for some folks, so I'll reserve judgement on how the comment is phrased. The several comments we see here are helpful; keep it up!
Similar remarks for trailing vs two-line comments in other methods.
The trailing comments on R and k really should be __init__ ctor docstring explanations of what the caller should pass in, even if that's redundant with the class docstring. Think about a caller who has not yet managed to invoke the ctor correctly. Sometimes that's due to "wrong shape". So I'm looking for an explicit mention of "m users" and "n items" in a docstring, please.
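For instance, an __init__ docstring along these lines (the wording is only a suggestion) tells the caller exactly what shapes and roles are expected:

def __init__(self, R: sparse.csc_matrix, k: int, verbose: bool = False):
    """
    Parameters
    ----------
    R : sparse.csc_matrix of shape (m, n)
        Feedback ("ratings") matrix to factorise: one row per user
        (m users), one column per item (n items).
    k : int
        Number of latent factors in the learned embeddings.
    verbose : bool
        Whether to print MSE values while training.
    """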
odd identifier
self.p_nnz_idx, ... = ...
In addition to your comment, I'm reading the docs:
... indices of the non-zero elements of the array.
Reading "nn" suggests to me "non-negative". And reading "nz" suggests "non-zero". But "nnz"? I'm stumped. Conventionally it would denote "number of non-zero" elements, but that meaning doesn't fit here. Please improve the identifier or the comment.
Also, this magic number is Fine, but consider letting the caller optionally change it via a kw default arg in the signature:
self.verbose_epoch_intvl = 50
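A sketch of that (the parameter name is just a placeholder):

def __init__(self, R: sparse.csc_matrix, k: int, verbose: bool = False,
             verbose_epoch_intvl: int = 50):
    ...
    self.verbose_epoch_intvl = verbose_epoch_intvl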
unused method
We find gradients() in your Public API, but nothing outside the class calls it. Please write a test suite which demonstrates correct use of it. I find myself wondering what the appropriate post-conditions for it should be.
prior work
There's more than one python implementation of such factorization in the literature. It would be helpful for your test suite to put at least one such library head-to-head with your own. It's not necessary for your reinvent-the-wheel learning exercise to beat SOTA or even be "competitive". But everyone, including you, would like to be reassured the numeric results are trustworthy, that they're close to what others have obtained.
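As a rough illustration of what such a check might look like, here is a sketch that pits the class against scikit-learn's NMF on a toy matrix. NMF optimises a different, non-negative objective over all entries, so treat the numbers only as a ballpark sanity check, not an apples-to-apples comparison:

import numpy as np
from scipy import sparse
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
R_dense = rng.integers(0, 6, size=(50, 40)).astype(float)  # toy "ratings"
R = sparse.csc_matrix(R_dense)
rows, cols = R.nonzero()

# This class, fitting only the observed (non-zero) entries.
mf = MatrixFactorisation(R, k=5)
P, Q = mf.gradient_descent(iterations=500)
mse_ours = np.mean((R_dense[rows, cols] - np.sum(P[rows] * Q[cols], axis=1)) ** 2)

# scikit-learn's NMF, fitting the full dense matrix.
nmf = NMF(n_components=5, init="random", random_state=0, max_iter=500)
W = nmf.fit_transform(R_dense)
H = nmf.components_
mse_nmf = np.mean((R_dense[rows, cols] - (W @ H)[rows, cols]) ** 2)

print(f"observed-entry MSE -- ours: {mse_ours:.3f}, NMF: {mse_nmf:.3f}")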
private markings in Public API
def gradient_descent(
...
_lambda=0.0002,
No, please don't do that. Spell it properly. The identifier is lambda. There can be no concept of it being _private when it is prominently part of a Public API. Similarly in sgd() and als().
logger
# print("train mse:", train_mse)
Consider doing import logging and creating a proper logger. Then caller can control verbosity by adjusting the logging level.
BTW, kudos on nice use of tqdm. A file logger would not interfere with tqdm output, unlike e.g. a console logger.
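A minimal sketch of what that could look like for save_train_mse; the file-handler configuration shown is just one option:

import logging

logger = logging.getLogger(__name__)


# Inside the class, replacing the print-based reporting:
def save_train_mse(self, i, error):
    train_mse = error.power(2).sum() / self.R.nnz
    self.train_mse_at_epochs.append([i + 1, train_mse])
    logger.info("iteration/epoch %d: train mse: %s", i + 1, train_mse)


# The caller decides where the records go, e.g. a log file that
# won't fight with tqdm's console output:
logging.basicConfig(filename="training.log", level=logging.INFO)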
This puzzles me:
if not (i + 1) % self.verbose_epoch_intvl:
Ordinarily I would expect to see 2% of epochs recorded, rather than 48% of them.
Also, consider renaming them into private _save... helpers.
unused param
def save_test_mse(..., solver="gd"):
Recommend you elide solver.
initializing locals
v_P_i = beta * v_P_i + (1 - beta) * grad_P_i if epoch != 0 else grad_P_i
This is trickier than necessary. Please initialize v_P_i and v_Q_j up top, before the loop begins. It makes it easier for humans to understand what's happening, as well as for automated static analyses.
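Something like this, assuming zero-initialised velocity buffers (which makes the very first update scale by (1 - beta), a small behavioural difference from the current epoch-0 special case):

v_P_i = np.zeros(self.k)
v_Q_j = np.zeros(self.k)
for epoch in tqdm(range(epochs), desc="Total SGD epochs completion: ", leave=verbose):
    np.random.shuffle(training_indices)
    for i in tqdm(training_indices, desc=f"Epoch #{epoch} completion: "):
        # ... compute grad_P_i and grad_Q_j exactly as before ...
        v_P_i = beta * v_P_i + (1 - beta) * grad_P_i
        v_Q_j = beta * v_Q_j + (1 - beta) * grad_Q_j
        self.P[p_idx] -= learning_rate * v_P_i
        self.Q[q_idx] -= learning_rate * v_Q_j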
if not ((epoch + 1) * (count + 1) + 1) % (1000):
Please append that 1000 magic number to the signature, as a defaulted kw argument.
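For instance (mse_record_interval is a hypothetical name):

def sgd(self, epochs=10, learning_rate=0.03, _lambda=0.0, beta=0.8,
        mse_record_interval=1000, verbose=False,
        validation: sparse.csc_matrix = None):
    ...
    if not ((epoch + 1) * (count + 1) + 1) % mse_record_interval:
        ...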
extract helper
R_predicted = np.sum(
np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]),
axis=1,
) # calculating the products only for the nonzero "observed" rating entries
R_predicted = sparse.csc_matrix(
(R_predicted, (self.p_nnz_idx, self.q_nnz_idx)),
shape=(self.m, self.n),
)
Don't we already have a helper for computing R_predicted? Please have sgd() call it.
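One way is to generalise the existing helper so that sgd() and the validation code can share it (the method name here is hypothetical):

def _predict_at(self, row_idx, col_idx):
    """Predicted values at the given (row, col) pairs, as a sparse matrix."""
    values = np.sum(np.multiply(self.P[row_idx], self.Q[col_idx]), axis=1)
    return sparse.csc_matrix((values, (row_idx, col_idx)), shape=(self.m, self.n))

__predict_observed then reduces to self._predict_at(self.p_nnz_idx, self.q_nnz_idx), and the two-line recipe need not be repeated.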
Literal annotation
def als(
...
solver="cg",
...
The parameter `solver` specifies the
solver to use. Currently choose between "cg" (iterative
conjugate gradient method) or "lu" (PLU decomposition).
Your docstring is beautiful, and I thank you for it. Consider beefing up the type annotation, even beyond solver: str. There's an opportunity to put this in the signature:
solver: Literal["lu", "cg"] = "cg",
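That needs one import; a sketch of how it slots into the existing signature:

from typing import Literal

def als(self, epochs=10, alpha=40, _lambda=10,
        solver: Literal["lu", "cg"] = "cg",
        validation: sparse.csc_matrix = None, verbose=False):
    ...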
Please don't have sgd and als do this:
self.verbose = verbose
It is a very surprising side effect, since the __init__ ctor already set it. Consider removing it as a ctor arg if you really want to retain the current behavior.
Better to have sgd and als default this Optional[bool] to None, and have them default it from self.verbose if not specified. That is, use it as a local variable to control this method's behavior, without side effects on the underlying object.
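That is, roughly:

from typing import Optional

def sgd(self, epochs=10, learning_rate=0.03, _lambda=0.0, beta=0.8,
        verbose: Optional[bool] = None,
        validation: sparse.csc_matrix = None):
    # Fall back to the object-level setting without mutating it.
    verbose = self.verbose if verbose is None else verbose
    ...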
BTW, kudos on the QTQ hoisting, and on the helpful CpI remark.
raise fatal exception
P[p], exitcode = sparse.linalg.cg(A, b...)
if exitcode:
print( ...
No. We don't just muddle ahead with P[p] containing some random unusable result -- we need to raise a fatal error here. Otherwise the caller will trust junk figures.
Also you should Extract Helper, as this code is used for P and for Q.
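A sketch of such a shared helper (the name is hypothetical) that fails loudly instead of storing an unconverged result:

def _solve_row(self, A, b, solver):
    """Solve A x = b for one row of P or Q, raising if the solver fails."""
    if solver == "lu":
        return sparse.linalg.spsolve(A, b)
    if solver == "cg":
        x, exitcode = sparse.linalg.cg(A, b.toarray())
        if exitcode != 0:
            raise RuntimeError(
                f"Conjugate gradient did not converge (exit code {exitcode})"
            )
        return x
    raise ValueError(f"Unknown solver: {solver!r}")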
Similar remarks for sgd and als computing valdn_predicted and friends.
is not
def plot_learning_curve(self, validation=None):
if validation != None:
Sorry, this is just a weird python community thing, related to Singletons.
Please prefer to test if validation is not None:
Actually you could probably get away with a simple if validation:
test suite
If you make it easy for folks to run your code, they're more likely to do so. Consider including a small dataset with known factoring results, so a maintenance engineer can go through an interactive edit-test cycle in less than ten seconds following each proposed edit.
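For instance, a pytest-style sketch on a tiny synthetic rank-2 matrix; the thresholds and hyperparameters here are illustrative guesses and may need tuning:

import numpy as np
from scipy import sparse


def test_gradient_descent_reduces_training_mse():
    rng = np.random.default_rng(42)
    P_true = rng.random((20, 2)) + 0.1        # strictly positive factors, so
    Q_true = rng.random((15, 2)) + 0.1        # every entry counts as observed
    R = sparse.csc_matrix(P_true @ Q_true.T)  # exact rank-2 ratings matrix

    mf = MatrixFactorisation(R, k=2)
    mf.gradient_descent(iterations=2000)

    initial_mse = mf.train_mse_at_epochs[0][1]
    final_mse = mf.train_mse_at_epochs[-1][1]
    assert final_mse < 0.5 * initial_mse  # training actually helps
    assert final_mse < 0.05               # and ends up near zero (loose bound)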
I thought declaring a method name with two prefixed underscores was indeed the way to declare them as (pseudo) private?
No.
https://peps.python.org/pep-0008/#designing-for-inheritance
We don’t use the term "private" here, since no attribute is really private in Python (without a generally unnecessary amount of work).
Often this is paraphrased as "we're all grownups here". That is, unlike java and other languages, in python nothing is truly private; a caller can always get at it in some way. Instead we trust responsible software engineers to obey the rules of the road, and not go poking into _private details which are clearly not part of the supported Public API and therefore may very well silently blow up in a subsequent release.
(OTOH, it is perfectly OK that you write some def _foo library routine, and then in another module you write a unit test which exercises _foo(). A test suite can certainly reach in to verify the internals are in good shape.)
If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. This invokes Python’s name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.
__double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).
Generally, double leading underscores should be used only to avoid name conflicts with attributes in classes designed to be subclassed.
I didn't see any test suite methods which exercised a class that inherited from your MatrixFactorisation class. Therefore I inferred that you were not trying to exploit this "name mangling" behavior.
I have worked with a large amount of python code in commercial and FOSS contexts, and it is seldom the case that the Author actually (1.) understands and (2.) desires mangling. Certainly it has use cases. When you come across such a use case, you should explain in # comments or in docstrings what the design rationale is, and you should definitely include a test suite that demonstrates how to rely on such behavior. This will prove an invaluable aid to engineers who inherit from your class in future.
_lambda was chosen to not conflict with the keyword lambda
Please prefer lambda_.
https://peps.python.org/pep-0008/#descriptive-naming-styles
single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g. class_
When you say _lambda you're claiming it is _private, but that doesn't make sense in cases like a local variable (already private!) or as an identifier visible in the Public API that you designed.
"prior work"?
IDK, matrix factorization has a huge literature, and python is one of a handful of popular implementation languages, so the Venn diagram intersection is pretty big. (Does a C++ or pyrex module designed to be called from python count? Shrug!)
A cursory search immediately reveals packages like:
- https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
- https://pypi.org/project/matrix-factorization
Just ask yourself the question: "How would I solve this business problem if my library did not exist?"
Comments:
- Really insightful review, thanks! I must clarify and follow up with a couple of queries: (1) I thought declaring a method name with two prefixed underscores was indeed the way to declare them as (pseudo) private? I didn't already know the way to do this, but the top search results suggested so, since the name mangling would render them a hassle to call. (2) _lambda was chosen to not conflict with the keyword lambda. Am I glossing over something in thinking so? (3) Might you have any other remarks about type annotations in the code? (4) Could you point to some resources for "prior work" comments? – Cosmic Fan, Apr 14, 2024 at 0:01
- Consider tacking on --strict when you invoke $ mypy *.py. It will encourage you to revise how you're annotating. Essentially it moves from "optional" to "mandatory" annotations, something you may or may not wish to embrace. The downside is it's a little more work. The upside is that sometimes mypy tells me I was an idiot, that I didn't want to write that foolish line of code and I should revise it at once. So I put up with a little inconvenience, and sometimes I am glad of the payoff. – J_H, Apr 14, 2024 at 0:42
- Ah! So using "private" informally (in the sense that Python doesn't enforce it) has two interpretations here?! Using a single prefixed underscore is reminiscent of protected attributes in Cpp (I guess this must be a crude comparison in some way), whereas using two prefixed underscores declares things more like the formal private attributes of a Cpp class. A quick search seems to suggest so. Is there a subtler ballgame at play here? – Cosmic Fan, Apr 14, 2024 at 1:25
- With great thanks and credit to your kind attention, let me slide in a final question about whether this site itself would be a good place to also expect reviews on the optimisation logic. Should I be hoping to collect future answers of that sort here itself? Or is a specialised query for the same on some other SE site the better alternative? – Cosmic Fan, Apr 14, 2024 at 1:37
- My advice is: Stick with single underscore. It works fine almost always. If you feel "mangling" is appropriate, write down the design rationale, as I outlined. // As far as the optimization logic goes, I do not yet have an opinion. But I outlined that (1.) it should be easy to run an example of interest, and (2.) we should be able to compare to SOTA or some relevant implementation. Add those to a submission here or elsewhere, and we'll have something more concrete to explore in detail. – J_H, Apr 14, 2024 at 3:00
It is great that you used docstrings and comments.
Here are some minor suggestions, purely for coding style.
There are many long lines of code, and many can be shortened by moving the end-of-line comments to their own line.
It looks like you are using 7-space indents, which is unusual. The PEP 8 style guide recommends 4-space.
The black program can be used to automatically reformat your code if you chose to do so.
I don't think you need lines like the following which simply set a variable to itself:
beta = beta
I think they can be removed to simplify the code.
There are many simple if statements like the following:
if(self.verbose):
The code would be cleaner without the parens:
if self.verbose:
Commented out code should be removed:
# Phi_predicted = np.sum( np.multiply(self.P[self.p_nnz_idx], self.Q[self.q_nnz_idx]), axis=1 ) #calculating the products only for the nonzero "observed" rating entries
# Phi_predicted = sparse.csc_matrix( (Phi_predicted, (self.p_nnz_idx, self.q_nnz_idx)), shape=(self.m,self.n))
The commented-out print statements could use the verbose variable if you want to keep them:
# print("train mse:", train_mse)
Run pylint on your code for other detailed style advice.