1
\$\begingroup\$

Goal

Write an R function to generate probabilities modeled by the following equation (IRT; 2 Parameter Logistic Model): enter image description here

Data

set.seed(1)
a = runif(10, 1.5, 3)
b = rnorm(10, 0, 1)
theta = rnorm(10000000)
# the output of the implementation should result in 
# a matrix with 100 mil. rows and 10 columns

Implementation

Strategy A

# the implementation of the equation
computeProbability = function(theta, b, a) {
 x = exp(1) ^ (a * (theta - b))
 return(x / (1 + x))
}
strategy.A = function(theta, b, a) {
 n.rows = length(theta)
 n.cols = length(b)
 prob_mtx = matrix(nrow = n.rows, ncol = n.cols)
 for(i in 1:n.rows)
 {
 prob_mtx[i, ] = computeProbability(theta[i], b, a)
 }
 return(prob_mtx)
}

Strategy B

strategy.B = function(theta, b, a) {
 return(t(sapply(theta, computeProbability, b = b, a = a)))
}

Strategy C

strategy.C = function(theta, b, a) {
 return(1 / (1 + exp(-sweep(outer(theta, b, "-"), 2, a, "*"))))
}

Timings

 # Strategy A | # Strategy B | # Strategy C
 | |
 user system elapsed | user system elapsed | user system elapsed
64.76 0.27 65.08 | 82.01 0.91 82.93 | 7.81 0.64 8.46

Question: Strategy C is by far the most efficient way, but how can I make it even faster?

asked Feb 21, 2017 at 20:26
\$\endgroup\$
4
  • 1
    \$\begingroup\$ That specific question isn't on-topic on this site, but we can review the code as it is instead. \$\endgroup\$ Commented Feb 21, 2017 at 20:27
  • \$\begingroup\$ Oh, my apologies. I thought it was too vague for StackOverflow. \$\endgroup\$ Commented Feb 21, 2017 at 20:28
  • \$\begingroup\$ Here is a great article that explains what is meant by vectorization in R. \$\endgroup\$ Commented Feb 27, 2017 at 16:31
  • \$\begingroup\$ @Samuel It is really helpful. Thank you very much! \$\endgroup\$ Commented Feb 28, 2017 at 15:36

1 Answer 1

1
\$\begingroup\$

Your strategy C is probably already nearly optimal. Here is an Rcpp based solution to squeeze out some additional seconds

library('Rcpp')
cppFunction(includes = '#include <math.h>',
 code = 'Rcpp::NumericMatrix strategy_rcpp(const Rcpp::NumericVector& theta,
 const Rcpp::NumericVector& b,
 const Rcpp::NumericVector& a) {
 const int n = theta.size();
 const int m = a.size();
 if (m != b.size())
 Rcpp::stop("a and b must have equal length");
 const double e = exp(1.0);
 Rcpp::NumericMatrix ret(n, m);
 for (int j = 0; j < m; ++j) {
 for (int i = 0; i < n; ++i) {
 ret(i, j) = 1.0 / (1.0 + pow(e, -a[j] * (theta[i] - b[j])));
 }
 }
 return ret;
}')

Benchmarks with your data:

library('microbenchmark')
microbenchmark(times = 1,
 strategy.C(theta, b, a),
 strategy_rcpp(theta, b, a)
)
Unit: seconds 
 expr min lq mean median uq
 strategy.C(theta, b, a) 10.996822 10.996822 10.996822 10.996822 10.996822
 strategy_rcpp(theta, b, a) 6.921705 6.921705 6.921705 6.921705 6.921705
answered Apr 22, 2017 at 11:03
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.