\$\begingroup\$

I want a quick way to check whether a matrix M contains at least one value equal to, say, 2. In R, I would use any(M==2). However, this first computes M==2 for every value in M and only then calls any(). any() stops at the first TRUE it finds, but by then every M==2 comparison has already been evaluated, which is far more work than necessary.
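To make the two steps concrete, here is a minimal sketch on a small matrix (the 3 x 4 matrix and the name `hits` are illustrative only and are not part of the benchmark): the comparison produces a full logical matrix of the same shape before any() ever runs.

```r
# Small illustrative matrix; values 1..12 filled column-wise
M <- matrix(1:12, nrow = 3)

# Step 1: the comparison is evaluated for every element,
# producing a full logical matrix with the same dimensions as M
hits <- M == 2
dim(hits)   # same shape as M
sum(hits)   # number of matching cells

# Step 2: any() then scans that logical matrix
any(hits)
```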

I thought there could be a more efficient way that evaluates M==2 only until the condition is first satisfied. I tried writing functions to do this, either checking column by column (check) or element by element (check_2), but so far both are slower, check_2 by a wide margin. Any ideas on how to improve this?

Benchmark results, with the value Val located toward the end of the matrix:

|expr |mean time |
|:------------------|---------:|
|any(M == Val) | 14.13623|
|is.element(Val, M) | 17.71230|
|check(M, Val) | 18.20764|
|check_2(M, Val) | 486.65347|

Code:

x <- 1:10^6
M <- matrix(x, ncol = 10, byrow=TRUE)
Val <- 50000
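# Column-wise check: stop at the first column that contains Val
check <- function(x, Val) {
 i <- 1
 cond <- FALSE
 # && is the scalar, short-circuiting operator intended for conditions
 while(!cond && i <= ncol(x)) {
  cond <- any(x[,i]==Val) # use the argument x, not the global M
  i <- i +1
 }
 cond
}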
# Element-wise check: stop at the first element equal to Val
check_2 <- function(x, Val) {
 x_c <- c(x) # flatten the matrix into a vector
 i <- 1
 cond <- FALSE
 while(!cond && i <= length(x_c)) {
  cond <- x_c[i]==Val
  i <- i +1
 }
 cond
}
check_2(x=M, Val)
check(M, Val)
library(microbenchmark)
comp <- microbenchmark(any(M == Val),
 is.element(Val, M),
 check(M, Val),
 check_2(M, Val),
 times = 20)
comp
Jamal
asked Sep 12, 2018 at 1:50
\$\endgroup\$
  • \$\begingroup\$ I wouldn't expect any performance gain from check() if your Val is found in the last column. But it makes a difference with, for example, Val <- 1. \$\endgroup\$ Commented Sep 12, 2018 at 8:48

1 Answer

\$\begingroup\$

any is a primitive: its loop runs in C rather than in R, which is much, much faster.

Loops in R are quite slow, which is why it's important to use such vectorized functions if you care about speed (the apply functions are still loops, however).
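As a rough illustration of that distinction (a sketch on a toy vector; the names `found_loop`, `found_sapply` and `found_vec` are made up for this example), the three forms below give the same answer but differ in where the per-element work happens:

```r
v <- 1:10

# Explicit R loop: every iteration is interpreted R code
found_loop <- FALSE
for (el in v) {
  if (el == 7) {
    found_loop <- TRUE
    break
  }
}

# sapply() hides the loop but still calls the R function once per element
found_sapply <- any(sapply(v, function(el) el == 7))

# Vectorized: a single call, the element-wise work runs in compiled C code
found_vec <- any(v == 7)
```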

One way to speed things up is to use the Rcpp package to write C++ code from within R. When you have a slow R function built on simple loops, it's the way to go. It may still not be as fast as the built-in C implementation, but in our case maybe that will be enough, given that we don't need to go through the whole vector?

Let's check:

library(Rcpp)
# defines anyx_cpp
cppFunction(
 'bool anyx_cpp(const NumericVector x, const double y) {
 const R_xlen_t n = x.size();
 // C++ indices are 0-based, so start at 0 to include the first element
 for (R_xlen_t i = 0; i < n; i++) {
 if (x[i] == y) {
 return true;
 }
 }
 return false;
 }')
anyx_r <- function(x,y){
 for(x_ in x) if(x_ == y) return(TRUE)
 FALSE
} 
vec <- 1:1e7
x <- 5e6
microbenchmark::microbenchmark(
 rloop = anyx_r(vec,x),
 cpp = anyx_cpp(vec,x),
 native = any(vec==x)
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# rloop 166.5758 171.34355 203.15277 179.9776 198.8560 990.1650 100
# cpp 39.5462 40.60585 57.84617 41.4594 46.1232 690.1746 100
# native 36.9900 37.86090 51.80317 38.9640 43.6510 888.3059 100

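Almost, but not quite ;). Even though the C++ loop can stop halfway through the vector, it is still slightly slower than any(vec==x).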

So the bottom line is that, in general, you can trust vectorized R functions, even if at first sight they seem to be doing too much work.
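If hits are expected to occur early, one possible middle ground is to keep the vectorized comparison but apply it block by block, stopping at the first block that contains the value. The sketch below is only an illustration of that idea: `any_equal_chunked` and its `chunk_size` default are hypothetical, untuned, and have not been benchmarked here.

```r
# Hypothetical helper: scan x in blocks and stop at the first block with a hit.
# chunk_size is an arbitrary illustrative default, not a tuned value.
any_equal_chunked <- function(x, val, chunk_size = 1e5) {
  n <- length(x)
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    # Vectorized comparison on this block only (linear indexing also works on a matrix)
    if (any(x[start:end] == val, na.rm = TRUE)) return(TRUE)
    start <- end + 1
  }
  FALSE
}

M <- matrix(1:10^6, ncol = 10, byrow = TRUE)
any_equal_chunked(M, 1)      # hit in the first block, returns early
any_equal_chunked(M, 10^6)   # hit in the last block
any_equal_chunked(M, -1)     # no hit, every block is scanned
```

When the value sits near the end, or is absent, this does the same comparisons as any(M == Val) plus the overhead of the R-level loop, so it is only worth considering when early hits are likely.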

answered Sep 13, 2018 at 0:23
\$\endgroup\$
