Increase performance in an R function using a vectorized approach

Question 1

Goal

Write an R function to generate probabilities modeled by the following equation (IRT; 2 Parameter Logistic Model): enter image description here

Data

set.seed(1)
a = runif(10, 1.5, 3)
b = rnorm(10, 0, 1)
theta = rnorm(10000000)
# the output of the implementation should result in 
# a matrix with 100 mil. rows and 10 columns

Implementation

Strategy A

# the implementation of the equation
computeProbability = function(theta, b, a) {
 x = exp(1) ^ (a * (theta - b))
 return(x / (1 + x))
}
strategy.A = function(theta, b, a) {
 n.rows = length(theta)
 n.cols = length(b)
 prob_mtx = matrix(nrow = n.rows, ncol = n.cols)
 for(i in 1:n.rows)
 {
 prob_mtx[i, ] = computeProbability(theta[i], b, a)
 }
 return(prob_mtx)
}

Strategy B

strategy.B = function(theta, b, a) {
 return(t(sapply(theta, computeProbability, b = b, a = a)))
}

Strategy C

strategy.C = function(theta, b, a) {
 return(1 / (1 + exp(-sweep(outer(theta, b, "-"), 2, a, "*"))))
}

Timings

 # Strategy A | # Strategy B | # Strategy C
 | |
 user system elapsed | user system elapsed | user system elapsed
64.76 0.27 65.08 | 82.01 0.91 82.93 | 7.81 0.64 8.46

Question: Strategy C is by far the most efficient way, but how can I make it even faster?

Question 2

That specific question isn't on-topic on this site, but we can review the code as it is instead.

Question 3

Oh, my apologies. I thought it was too vague for StackOverflow.

Question 4

Here is a great article that explains what is meant by vectorization in R.

Question 5

@Samuel It is really helpful. Thank you very much!

Question 6

Your strategy C is probably already nearly optimal. Here is an Rcpp based solution to squeeze out some additional seconds

library('Rcpp')
cppFunction(includes = '#include <math.h>',
 code = 'Rcpp::NumericMatrix strategy_rcpp(const Rcpp::NumericVector& theta,
 const Rcpp::NumericVector& b,
 const Rcpp::NumericVector& a) {
 const int n = theta.size();
 const int m = a.size();
 if (m != b.size())
 Rcpp::stop("a and b must have equal length");
 const double e = exp(1.0);
 Rcpp::NumericMatrix ret(n, m);
 for (int j = 0; j < m; ++j) {
 for (int i = 0; i < n; ++i) {
 ret(i, j) = 1.0 / (1.0 + pow(e, -a[j] * (theta[i] - b[j])));
 }
 }
 return ret;
}')

Benchmarks with your data:

library('microbenchmark')
microbenchmark(times = 1,
 strategy.C(theta, b, a),
 strategy_rcpp(theta, b, a)
)
Unit: seconds 
 expr min lq mean median uq
 strategy.C(theta, b, a) 10.996822 10.996822 10.996822 10.996822 10.996822
 strategy_rcpp(theta, b, a) 6.921705 6.921705 6.921705 6.921705 6.921705

NoBackingDown NoBackingDown 1662 bronze badges · Answer 1 · 2017-04-22 11:03:55Z

Your strategy C is probably already nearly optimal. Here is an Rcpp based solution to squeeze out some additional seconds

library('Rcpp')
cppFunction(includes = '#include <math.h>',
 code = 'Rcpp::NumericMatrix strategy_rcpp(const Rcpp::NumericVector& theta,
 const Rcpp::NumericVector& b,
 const Rcpp::NumericVector& a) {
 const int n = theta.size();
 const int m = a.size();
 if (m != b.size())
 Rcpp::stop("a and b must have equal length");
 const double e = exp(1.0);
 Rcpp::NumericMatrix ret(n, m);
 for (int j = 0; j < m; ++j) {
 for (int i = 0; i < n; ++i) {
 ret(i, j) = 1.0 / (1.0 + pow(e, -a[j] * (theta[i] - b[j])));
 }
 }
 return ret;
}')

Benchmarks with your data:

library('microbenchmark')
microbenchmark(times = 1,
 strategy.C(theta, b, a),
 strategy_rcpp(theta, b, a)
)
Unit: seconds 
 expr min lq mean median uq
 strategy.C(theta, b, a) 10.996822 10.996822 10.996822 10.996822 10.996822
 strategy_rcpp(theta, b, a) 6.921705 6.921705 6.921705 6.921705 6.921705

Stack Exchange Network

Increase performance in an R function using a vectorized approach

Goal

Data

Implementation

Timings

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Increase performance in an R function using a vectorized approach

Goal

Data

Implementation

Timings

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions