The ‘RcppML’ package provides high-performance machine learning algorithms using Rcpp with a focus on matrix factorization.
Install the latest development version of RcppML from github:
library(devtools)
install_github("zdebruine/RcppML")library(RcppML)
library(Matrix)
#> Warning: package 'Matrix' was built under R version 4.0.5RcppML contains extremely fast NNLS solvers. Use the nnls function to solve systems of equations subject to non-negativity constraints.
The RcppML::solve function solves the equation for where is symmetric positive definite matrix of dimensions and is a vector of length or a matrix of dimensions .
# construct a system of equations
X <-matrix(rnorm(2000),100,20)
btrue <-runif(20)
y <-X %*%btrue +rnorm(100)
a <-crossprod(X)
b <-crossprod(X, y)
# solve the system of equations
x <-RcppML::nnls(a, b)
# use only coordinate descent
x <-RcppML::nnls(a, b, fast_nnls = FALSE, cd_maxit = 1000, cd_tol = 1e-8)RcppML::solve implements a new and fastest-in-class algorithm for non-negative least squares:
cd_maxit = 0 to use only the FAST solver.Project dense linear factor models onto real-valued sparse matrices (or any matrix coercible to Matrix::dgCMatrix) using RcppML::project.
RcppML::project solves the equation for .
# simulate a sparse matrix
A <-rsparsematrix(1000, 100, 0.1)
# simulate a linear factor model
w <-matrix(runif(1000 *10), 1000, 10)
# project the model
h <-RcppML::project(A, w)RcppML::nmf finds a non-negative matrix factorization by alternating least squares (alternating projections of linear models and ).
There are several ways in which the NMF algorithm differs from other currently available methods:
The following example runs rank-10 NMF on a random 1000 x 1000 matrix that is 90% sparse:
A <-rsparsematrix(100, 100, 0.1)
model <-RcppML::nmf(A, 10, verbose = F)
w <-model$w
d <-model$d
h <-model$h
model_tolerance <-tail(model$tol, 1)Tolerance is simply a measure of the average correlation between \eqn{w_{i-1} and and and for a given iteration .
For symmetric factorizations (when ), tolerance becomes a measure of the correlation between and , and diagonalization is automatically performed to enforce symmetry:
A_sym <-as(crossprod(A), "dgCMatrix")
model <-RcppML::nmf(A_sym, 10, verbose = F)Mean squared error of a factorization can be calculated for a given model using the RcppML::mse function:
RcppML::mse(A_sym, model$w, model$d, model$h)
#> [1] 1.319477