
I am trying to create a function that I want to use on my real data. Is there any way to optimize the following function? It is a function I created to filter the data.

filter <- function(x, m, delt) {
  series <- x
  series_filtered <- rep(0, length(series))
  delta_t <- delt                      # time step between data points (hours)
  tau_crit <- 3                        # critical period (hours)
  theta_crit <- 2*pi*delta_t/tau_crit  # critical frequency
  M <- m                               # number of time steps before and after available in the filtering window
  h <- rep(0, 2*M + 1)                 # initialize the weights
  # get the Lanczos coefficients
  for (n in (-1*M):M) {
    h[M+n+1] <- ifelse(n == 0, theta_crit*delta_t/pi,
                       (sin(n*theta_crit*delta_t)/(pi*n)) * (sin(n*pi/M)/(n*pi/M)))
  }
  # need to adjust the weights so that they add up to 1
  h_unbiased <- h/sum(h)
  h <- h_unbiased
  # step through each time for which we'd like to create a filtered value
  for (t in (1+M):(length(series)-M)) {
    for (n in (-1*M):M) {
      # apply the coefficients to create the filtered series
      series_filtered[t] <- series_filtered[t] + series[t+n] * h[M+n+1]
    }
  }
  # filtered data
  y2 <- series_filtered[(M+1):(length(series)-M)]
  return(y2)
}

Any suggestions on removing the for loops and replacing them with apply (or any other function) are appreciated.

In the above function, I have one single for loop and one nested (double) for loop.

Edit:

Testing can be done with the following data:

x <- 1:100
series <- sin(2*x) + sin(x/4) + sin(x/8) + rnorm(100)
jd2 <- filter(x = series, m = 4, delt = 40/60)
asked Feb 12, 2014 at 6:16
  • This isn't so much a question as a code-writing request. It is not trivial to interpret without any real information on the purpose or action of the function. Commented Feb 12, 2014 at 6:32
  • It's a long-standing urban legend that *apply are faster than for. In general there's no speed gain, just clarity of code (maybe :-) ). However, anywhere you can vectorize rather than loop you'll save time, and anywhere you can pre-allocate (as you did with series_filtered) that'll help too. (A short sketch illustrating this follows these comments.) Commented Feb 12, 2014 at 12:41
  • @CarlWitthoft I primarily like the apply family for their brevity of code. And I find them easier to read, but ease of reading is in the eye of the beholder. Commented May 3, 2014 at 19:54
  • Have you looked into Rcpp? Commented Jun 16, 2014 at 21:06
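
A minimal sketch of the vectorize-and-pre-allocate advice from the comment above (my own illustration, not part of the thread); the same element-wise operation written three ways:

# Toy illustration (not from the thread): the same element-wise scaling three ways.
x <- rnorm(1e4)

# 1. Growing a vector inside a loop: slow, because the vector is reallocated on every iteration
y1 <- c()
for (i in seq_along(x)) y1 <- c(y1, 2 * x[i])

# 2. Pre-allocated loop: much better
y2 <- numeric(length(x))
for (i in seq_along(x)) y2[i] <- 2 * x[i]

# 3. Fully vectorized: fastest and clearest
y3 <- 2 * x

all.equal(y1, y2)  # TRUE
all.equal(y2, y3)  # TRUE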

2 Answers


I have tried this, but I don't know how much faster it is (or whether it is faster at all). In any case, you used a lot of redundant code (creating variables M, h_unbiased, delta_t, etc.).

filter2 <- function(x, m, delt) {
  series <- x
  tau_crit <- 3                     # critical period (hours)
  theta_crit <- 2*pi*delt/tau_crit  # critical frequency
  # get the Lanczos coefficients
  h <- sapply(-m:m, FUN = function(n) ifelse(n == 0, theta_crit*delt/pi,
              (sin(n*theta_crit*delt)/(pi*n)) * (sin(n*pi/m)/(n*pi/m))))
  # adjust the weights so that they add up to 1
  h <- h/sum(h)
  # step through each time for which we'd like to create a filtered value
  y2 <- sapply((1+m):(length(series)-m), function(t) {
    sum(sapply(-m:m, FUN = function(n) series[t+n] * h[m+n+1]))
  })
  return(y2)
}
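
For reference, a quick sanity check of filter2() against the original filter() from the question, reusing the question's test data (this check is my own addition, not part of the original answer):

x <- 1:100
series <- sin(2*x) + sin(x/4) + sin(x/8) + rnorm(100)
f1 <- filter(x = series, m = 4, delt = 40/60)
f2 <- filter2(x = series, m = 4, delt = 40/60)
# all.equal() tolerates tiny floating-point differences that identical() would flag
all.equal(f1, f2)
## expected: TRUE (consistent with the comment below reporting matching results)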
answered Feb 12, 2014 at 7:17
  • +1 Thank you so much for your time and effort in trying to update the function. The results are working fine and are consistent, but it seems a little slower than the code I wrote before. Commented Feb 12, 2014 at 17:20

This works a little faster, taking advantage of vectorized code in a couple of places.

filter3 <- function(x, m, delt) {
  xlen <- length(x)
  tau_crit <- 3                     # critical period (hours)
  theta_crit <- 2*pi*delt/tau_crit  # critical frequency
  n <- seq(-m, m)
  h <- (sin(n*theta_crit*delt)/(pi*n)) * (sin(n*pi/m)/(n*pi/m))
  h[n == 0] <- theta_crit * delt/pi
  h <- h/sum(h)
  sapply((1+m):(xlen-m), function(t) sum(x[t+n] * h[m+n+1]))
}

(There were some variables defined but never used, and several defined only to change a variable's name. Though not a huge saving with these input data, I opted to leave them out. If the input data are considerably larger, the memory footprint might have more of an effect.)

When comparing the performance of this with the original, it appears to run in roughly 20% of the time (looking at the median for that stat):

require(microbenchmark)
microbenchmark(f1 = filter(x = series, m = 4, delt = 40/60),
               f3 = filter3(x = series, m = 4, delt = 40/60),
               times = 1000)
## Unit: microseconds
##  expr      min       lq   median        uq       max neval
##    f1 2121.381 2180.634 2227.229 2527.8990 17368.383  1000
##    f3  406.152  432.569  453.482  495.4905  2235.118  1000

HOWEVER, for some reason they are a little different:

f1 <- filter(x=series, m=4, delt=(40/60))
f3 <- filter3(x=series, m=4, delt=(40/60))
identical(f1, f3)
## [1] FALSE
max(f1-f3)
## [1] 4.440892e-16

... but the difference is very small. (I tried it in both 32- and 64-bit mode with the same results.) Considering the smallest magnitude your sample data produces is around 0.0268, this 4.44e-16 seems inconsequential; it is at the level of floating-point rounding error, which is expected when the order of the arithmetic changes (summing a vector with sum() rather than accumulating term by term).
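
As a side note (my own addition, not part of the original answer), all.equal(), which compares with a small numeric tolerance, is usually a more appropriate check than identical() when only floating-point rounding differs:

# identical() demands bit-for-bit equality; all.equal() allows a small numeric tolerance
all.equal(f1, f3)
## expected: TRUE, given the reported maximum difference of 4.44e-16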

answered May 12, 2014 at 1:28
