Goal
Write an R
function to generate probabilities modeled by the following equation (IRT; 2 Parameter Logistic Model):
enter image description here
Data
set.seed(1)
a = runif(10, 1.5, 3)
b = rnorm(10, 0, 1)
theta = rnorm(10000000)
# the output of the implementation should result in
# a matrix with 100 mil. rows and 10 columns
Implementation
Strategy A
# the implementation of the equation
computeProbability = function(theta, b, a) {
x = exp(1) ^ (a * (theta - b))
return(x / (1 + x))
}
strategy.A = function(theta, b, a) {
n.rows = length(theta)
n.cols = length(b)
prob_mtx = matrix(nrow = n.rows, ncol = n.cols)
for(i in 1:n.rows)
{
prob_mtx[i, ] = computeProbability(theta[i], b, a)
}
return(prob_mtx)
}
Strategy B
strategy.B = function(theta, b, a) {
return(t(sapply(theta, computeProbability, b = b, a = a)))
}
Strategy C
strategy.C = function(theta, b, a) {
return(1 / (1 + exp(-sweep(outer(theta, b, "-"), 2, a, "*"))))
}
Timings
# Strategy A | # Strategy B | # Strategy C
| |
user system elapsed | user system elapsed | user system elapsed
64.76 0.27 65.08 | 82.01 0.91 82.93 | 7.81 0.64 8.46
Question: Strategy C is by far the most efficient way, but how can I make it even faster?
-
1\$\begingroup\$ That specific question isn't on-topic on this site, but we can review the code as it is instead. \$\endgroup\$Jamal– Jamal2017年02月21日 20:27:18 +00:00Commented Feb 21, 2017 at 20:27
-
\$\begingroup\$ Oh, my apologies. I thought it was too vague for StackOverflow. \$\endgroup\$Mihai– Mihai2017年02月21日 20:28:37 +00:00Commented Feb 21, 2017 at 20:28
-
\$\begingroup\$ Here is a great article that explains what is meant by vectorization in R. \$\endgroup\$Samuel– Samuel2017年02月27日 16:31:14 +00:00Commented Feb 27, 2017 at 16:31
-
\$\begingroup\$ @Samuel It is really helpful. Thank you very much! \$\endgroup\$Mihai– Mihai2017年02月28日 15:36:06 +00:00Commented Feb 28, 2017 at 15:36
1 Answer 1
Your strategy C is probably already nearly optimal. Here is an Rcpp based solution to squeeze out some additional seconds
library('Rcpp')
cppFunction(includes = '#include <math.h>',
code = 'Rcpp::NumericMatrix strategy_rcpp(const Rcpp::NumericVector& theta,
const Rcpp::NumericVector& b,
const Rcpp::NumericVector& a) {
const int n = theta.size();
const int m = a.size();
if (m != b.size())
Rcpp::stop("a and b must have equal length");
const double e = exp(1.0);
Rcpp::NumericMatrix ret(n, m);
for (int j = 0; j < m; ++j) {
for (int i = 0; i < n; ++i) {
ret(i, j) = 1.0 / (1.0 + pow(e, -a[j] * (theta[i] - b[j])));
}
}
return ret;
}')
Benchmarks with your data:
library('microbenchmark')
microbenchmark(times = 1,
strategy.C(theta, b, a),
strategy_rcpp(theta, b, a)
)
Unit: seconds
expr min lq mean median uq
strategy.C(theta, b, a) 10.996822 10.996822 10.996822 10.996822 10.996822
strategy_rcpp(theta, b, a) 6.921705 6.921705 6.921705 6.921705 6.921705
Explore related questions
See similar questions with these tags.