Fast algorithm for any(M==2)

Question

I want a quick way to check whether a matrix M contains at least one value equal to, say, 2. In R, I would use any(M==2). However, this first computes M==2 for every value in M and only then calls any(). any() stops at the first TRUE it finds, but by then every M==2 comparison has already been computed, most of them unnecessarily.
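A minimal illustration of that point in base R, using a small hypothetical matrix M_small rather than the real M: the comparison produces a full logical matrix before any() ever sees it.

```r
# A small matrix standing in for M (hypothetical example data)
M_small <- matrix(1:20, ncol = 4)

# M_small == 2 is evaluated in full first: one logical value per element
cmp <- M_small == 2
dim(cmp)   # 5 4, the same dimensions as M_small
sum(cmp)   # 1: only one element matches, yet all 20 were compared

# any() then scans this logical matrix and can stop at the first TRUE
any(cmp)   # TRUE
```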

I thought there might be a more efficient way that evaluates M==2 only until the condition is satisfied. I tried to write functions that do this (check() works column by column, check_2() element by element), but so far they are much slower. Any idea on how to improve this?

Benchmark results, with the value Val located near the end of the matrix:

|expr |mean time |
|:------------------|---------:|
|any(M == Val) | 14.13623|
|is.element(Val, M) | 17.71230|
|check(M, Val) | 18.20764|
|check_2(M, Val) | 486.65347|

Code:

```r
library(microbenchmark)

x <- 1:10^6
M <- matrix(x, ncol = 10, byrow = TRUE)
Val <- 50000

# Column-wise check: stop at the first column that contains Val
check <- function(x, Val) {
  i <- 1
  cond <- FALSE
  while (!cond && i <= ncol(x)) {
    cond <- any(x[, i] == Val)  # test column i of the argument x
    i <- i + 1
  }
  cond
}

# Element-wise check: stop at the first element equal to Val
check_2 <- function(x, Val) {
  x_c <- c(x)
  i <- 1
  cond <- FALSE
  while (!cond && i <= length(x_c)) {
    cond <- x_c[i] == Val
    i <- i + 1
  }
  cond
}

check_2(x = M, Val)
check(M, Val)

comp <- microbenchmark(any(M == Val),
                       is.element(Val, M),
                       check(M, Val),
                       check_2(M, Val),
                       times = 20)
comp
```
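One middle ground between any(M == Val) and the element-by-element loop is to keep the comparison vectorized but apply it to fixed-size chunks, stopping after the first chunk that contains a match. The sketch below is a hypothetical helper, any_equal_chunked(), not code from this thread; chunk_size = 1e5 is an arbitrary assumption, and the function assumes x contains no NA values. Whether it pays off depends on where the match sits and on chunk_size; it has not been benchmarked here.

```r
# Hypothetical helper: vectorized == inside each chunk, early exit between chunks.
# Assumes x contains no NA values (if (NA) would raise an error).
any_equal_chunked <- function(x, val, chunk_size = 1e5) {
  n <- length(x)  # a matrix is scanned in column-major order
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    if (any(x[start:end] == val)) return(TRUE)  # stop at the first matching chunk
    start <- end + 1
  }
  FALSE
}

M <- matrix(1:10^6, ncol = 10, byrow = TRUE)
any_equal_chunked(M, 1)      # TRUE, found in the first chunk
any_equal_chunked(M, 50000)  # TRUE, but only in the last column, so most chunks are scanned
any_equal_chunked(M, -1)     # FALSE, every chunk scanned
```

Because R stores matrices column by column, a value that lies in the last column (as 50000 does here) gives this helper little to skip.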

Comment

I wouldn't expect any performance gain from check() if your Val is found in the last column. But it makes a difference with, for example, Val <- 1.
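To see why the position matters, here is a hypothetical instrumented variant of check(), check_count(), that also reports how many columns it scanned. It uses the same M as the question (filled with byrow = TRUE).

```r
# Hypothetical instrumented variant of check(): also reports columns scanned
check_count <- function(x, val) {
  for (i in seq_len(ncol(x))) {
    if (any(x[, i] == val)) return(list(found = TRUE, cols_scanned = i))
  }
  list(found = FALSE, cols_scanned = ncol(x))
}

M <- matrix(1:10^6, ncol = 10, byrow = TRUE)
check_count(M, 1)$cols_scanned      # 1: with byrow = TRUE, 1 is in column 1
check_count(M, 50000)$cols_scanned  # 10: 50000 is in the last column
```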

Answer

any is a primitive: it doesn't loop in R but in C, which is much, much faster.
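This can be checked from R itself with base is.primitive(); the comparison operator is a primitive too, while apply-style helpers such as sapply are ordinary R closures.

```r
is.primitive(any)     # TRUE: implemented in C inside R itself
is.primitive(`==`)    # TRUE: the elementwise comparison also runs in C
is.primitive(sapply)  # FALSE: an ordinary R function (closure)
```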

Loops in R are quite slow, which is why it's important to use such vectorized functions if you care about speed (the apply functions, however, are still loops).
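A small base-R sketch of that difference, with an arbitrary vector v and target value: vapply() calls an R function once per element, while v == target is a single vectorized call. Both give the same result; the timings depend on the machine, so no numbers are claimed here.

```r
v <- 1:1e6
target <- 500000L

# apply-style: one call to an R function per element
loop_res <- vapply(v, function(e) e == target, logical(1))

# vectorized: a single call, the loop runs in C
vec_res <- v == target

identical(loop_res, vec_res)  # TRUE: same result either way

# Compare the timings on your own machine
system.time(vapply(v, function(e) e == target, logical(1)))
system.time(v == target)
```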

One way to speed things up is to use the Rcpp package to write C++ code from R. When you have a slow R function built on simple loops, that's the way to go. It won't necessarily beat R's own built-in C routines, but in our case maybe that will be enough, given that we don't need to go through the whole vector?

Let's check:

```r
library(Rcpp)

# defines anyx_cpp: returns true as soon as an element equals y
cppFunction(
  'bool anyx_cpp(const NumericVector x, const double y) {
    const R_xlen_t n = x.size();
    for (R_xlen_t i = 0; i < n; i++) {  // C++ indices start at 0
      if (x[i] == y) {
        return true;
      }
    }
    return false;
  }')

# pure R equivalent: loop over the elements, return early on a match
anyx_r <- function(x, y) {
  for (x_ in x) if (x_ == y) return(TRUE)
  FALSE
}

vec <- 1:1e7
x <- 5e6
microbenchmark::microbenchmark(
  rloop  = anyx_r(vec, x),
  cpp    = anyx_cpp(vec, x),
  native = any(vec == x)
)
# Unit: milliseconds
#    expr      min        lq      mean   median       uq      max neval
#   rloop 166.5758 171.34355 203.15277 179.9776 198.8560 990.1650   100
#     cpp  39.5462  40.60585  57.84617  41.4594  46.1232 690.1746   100
#  native  36.9900  37.86090  51.80317  38.9640  43.6510 888.3059   100
```

Almost but not quite ;).
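One plausible reason the Rcpp loop does not win despite stopping halfway: vec is an integer vector, while anyx_cpp() takes a NumericVector, so Rcpp has to coerce (copy) all 1e7 integers to doubles before the loop starts. This is an assumption about where the time goes, not something the benchmark above measures; the base-R sketch below only shows the type mismatch and lets you time the conversion on its own.

```r
vec <- 1:1e7
typeof(vec)             # "integer"
typeof(as.double(vec))  # "double": the type a NumericVector argument needs

# Rough cost of that full conversion by itself (timings vary by machine)
system.time(for (k in 1:10) as.double(vec))
```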

So, bottom line: in general you can trust vectorized R functions, even if at first sight they seem to be doing too much work.

Accepted answer by moodymudskipper, 2018-09-13 00:23:27Z
