I am trying to create a function to use on my real data. Is there any way to optimize the following function? It is a function I wrote to filter the data.
filter <- function(x, m, delt) {
  series <- x
  series_filtered <- rep(0, length(series))
  delta_t <- delt                       # time step between data points (hours)
  tau_crit <- 3                         # critical period (hours)
  theta_crit <- 2*pi*delta_t/tau_crit   # critical frequency
  M <- m                                # number of time steps before and after available in the filtering window
  h <- rep(0, 2*M + 1)                  # initialize the weights
  # get the Lanczos coefficients
  for (n in (-1*M):M) {
    h[M+n+1] <- ifelse(n == 0, theta_crit*delta_t/pi,
                       (sin(n*theta_crit*delta_t)/(pi*n)) * (sin(n*pi/M)/(n*pi/M)))
  }
  # need to adjust the weights so that they add up to 1
  h_unbiased <- h/sum(h)
  h <- h_unbiased
  # step through each time for which we'd like to create a filtered value
  for (t in (1+M):(length(series)-M)) {
    for (n in (-1*M):M) {
      # apply the coefficients to create the filtered series
      series_filtered[t] <- series_filtered[t] + series[t+n] * h[M+n+1]
    }
  }
  # filtered data
  y2 <- series_filtered[(M+1):(length(series)-M)]
  return(y2)
}
Any suggestions on removing the for loops and replacing them with apply (or any other function) would be appreciated. In the above function, I have one single for loop and one double for loop.
Edit:
Testing can be done with the following data:
x <- 1:100
series <- sin(2*x)+sin(x/4) + sin(x/8) + rnorm(100)
jd2 <- filter(x=series,m=4, delt=(40/60))
2 Answers
I have tried the following, but I don't know how much faster it is (or even whether it is faster at all). In any case, you used a lot of redundant code (creating the variables M, h_unbiased, delta_t, etc.).
filter2 <- function(x, m, delt) {
  series <- x
  tau_crit <- 3                     # critical period (hours)
  theta_crit <- 2*pi*delt/tau_crit  # critical frequency
  # get the Lanczos coefficients (note: this used the undefined delta_t
  # instead of delt, and the inner function argument shadowed x; both fixed)
  h <- sapply(-m:m, FUN = function(n) ifelse(n == 0, theta_crit*delt/pi,
              (sin(n*theta_crit*delt)/(pi*n)) * (sin(n*pi/m)/(n*pi/m))))
  # need to adjust the weights so that they add up to 1
  h <- h/sum(h)
  # step through each time for which we'd like to create a filtered value
  y2 <- sapply((1+m):(length(series)-m), function(t) {
    sum(sapply(-m:m, FUN = function(n) series[t+n] * h[m+n+1]))
  })
  return(y2)
}
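One quick way to check that the rewrite matches the original output is all.equal, which compares numeric vectors up to a small tolerance. A minimal sketch using the question's sample data (the seed is an added assumption, purely to make the check reproducible):
set.seed(42)  # assumed seed, only so the check is reproducible
x <- 1:100
series <- sin(2*x) + sin(x/4) + sin(x/8) + rnorm(100)
isTRUE(all.equal(filter(x = series, m = 4, delt = 40/60),
                 filter2(x = series, m = 4, delt = 40/60)))
## expected to be TRUE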
+1 Thank you so much for your time and effort in updating the function. The results are consistent, but it seems a little slower than the code I wrote before. (Jdbaba, Feb 12, 2014 at 17:20)
This works a little faster, taking advantage of vectorized code in a couple of places.
filter3 <- function(x, m, delt) {
  xlen <- length(x)
  tau_crit <- 3                     # critical period (hours)
  theta_crit <- 2*pi*delt/tau_crit  # critical frequency
  n <- seq(-m, m)
  h <- (sin(n*theta_crit*delt)/(pi*n)) * (sin(n*pi/m)/(n*pi/m))
  h[n == 0] <- theta_crit * delt/pi  # the n == 0 term above is 0/0 (NaN); patch in its limit value
  h <- h/sum(h)
  # (fixed: this referenced the global series rather than the argument x)
  sapply((1+m):(xlen-m), function(t) sum(x[t+n] * h[m+n+1]))
}
(There were some variables defined but not used, and many defined for the sake of changing the variable name itself. Though not a huge savings with these input data, I opted to leave them out. If the input data is considerably larger, the memory footprint might have more of an effect.)
When comparing the performance of this with the original, it appears to run in about 20% of the time (using the median for that statistic):
require(microbenchmark)
microbenchmark(f1=filter(x=series, m=4, delt=(40/60)), f3=filter3(x=series, m=4, delt=(40/60)), times=1000)
## Unit: microseconds
## expr min lq median uq max neval
## f1 2121.381 2180.634 2227.229 2527.8990 17368.383 1000
## f3 406.152 432.569 453.482 495.4905 2235.118 1000
HOWEVER, for some reason they are a little different:
f1 <- filter(x=series, m=4, delt=(40/60))
f3 <- filter3(x=series, m=4, delt=(40/60))
identical(f1, f3)
## [1] FALSE
max(f1-f3)
## [1] 4.440892e-16
... but the difference is very small. (I tried it in both 32- and 64-bit mode with the same results.) Considering that the smallest magnitude your sample data produces is around 0.0268, this 4.44e-16 difference seems inconsequential.
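For floating-point comparisons like this, all.equal is the idiomatic check rather than identical, since it tolerates tiny numeric differences; a minimal sketch using the vectors above:
all.equal(f1, f3)
## expected to be TRUE: the ~4.44e-16 gap is far below the
## default tolerance of about 1.5e-8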
*apply is not necessarily faster than for; in general there's no speed gain, just clarity of code (maybe :-) ). However, anywhere you can vectorize rather than loop you'll save time, and anywhere you can pre-allocate (as you did with series_filtered) that'll help too.
rcpp?
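On the vectorization point: because the Lanczos weights here are symmetric, base R's stats::filter can apply them in a single vectorized call. A minimal sketch under that assumption (filter4 is just an illustrative name, and the explicit stats:: prefix is needed because the question's function masks stats::filter):
filter4 <- function(x, m, delt) {
  tau_crit <- 3                     # critical period (hours)
  theta_crit <- 2*pi*delt/tau_crit  # critical frequency
  n <- seq(-m, m)
  h <- (sin(n*theta_crit*delt)/(pi*n)) * (sin(n*pi/m)/(n*pi/m))
  h[n == 0] <- theta_crit * delt/pi
  h <- h/sum(h)
  # sides = 2 centres the (symmetric) weights on each point; the m points
  # at each edge come back as NA, which we drop just as the original does
  y <- stats::filter(x, h, method = "convolution", sides = 2)
  as.numeric(y[(1+m):(length(x) - m)])
}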