I am trying to create a function to use on my real data. Is there any way to optimize the following function? It is a function I wrote to filter the data.
filter <- function(x, m, delt) {
  series <- x
  series_filtered <- rep(0, length(series))
  delta_t <- delt                       # time step between data points (hours)
  tau_crit <- 3                         # critical period (hours)
  theta_crit <- 2*pi*delta_t/tau_crit   # critical frequency
  M <- m                                # number of time steps before and after available in the filtering window
  h <- rep(0, 2*M + 1)                  # initialize the weights
  # get the Lanczos coefficients
  for (n in (-1*M):M) {
    h[M+n+1] <- ifelse(n == 0, theta_crit*delta_t/pi,
                       (sin(n*theta_crit*delta_t)/(pi*n)) * (sin(n*pi/M)/(n*pi/M)))
  }
  # need to adjust the weights so that they add up to 1
  h_unbiased <- h/sum(h)
  h <- h_unbiased
  # step through each time for which we'd like to create a filtered value
  for (t in (1+M):(length(series)-M)) {
    for (n in (-1*M):M) {
      # apply the coefficients to create the filtered series
      series_filtered[t] <- series_filtered[t] + series[t+n] * h[M+n+1]
    }
  }
  # filtered data
  y2 <- series_filtered[(M+1):(length(series)-M)]
  return(y2)
}
Any suggestions on removing the for loops and replacing them with apply (or any other function) would be appreciated. In the above function, I have one single for loop and one double for loop.
Edit:
Testing can be done with the following data:
x <- 1:100
series <- sin(2*x)+sin(x/4) + sin(x/8) + rnorm(100)
jd2 <- filter(x=series,m=4, delt=(40/60))
2 Answers
I have tried the following, but I don't know how much faster it is (or even whether it is faster at all). In any case, you used a lot of redundant code (creating the variables M, h_unbiased, delta_t, etc.).
filter2 <- function(x, m, delt) {
  series <- x
  tau_crit <- 3                     # critical period (hours)
  theta_crit <- 2*pi*delt/tau_crit  # critical frequency
  # get the Lanczos coefficients (note: this used the undefined delta_t
  # instead of delt, and the inner function argument shadowed x; both fixed)
  h <- sapply(-m:m, FUN = function(n) ifelse(n == 0, theta_crit*delt/pi,
              (sin(n*theta_crit*delt)/(pi*n)) * (sin(n*pi/m)/(n*pi/m))))
  # need to adjust the weights so that they add up to 1
  h <- h/sum(h)
  # step through each time for which we'd like to create a filtered value
  y2 <- sapply((1+m):(length(series)-m), function(t) {
    sum(sapply(-m:m, FUN = function(n) series[t+n] * h[m+n+1]))
  })
  return(y2)
}
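One quick way to check that the rewrite matches the original output is all.equal, which compares numeric vectors up to a small tolerance. A minimal sketch using the question's sample data (the seed is an added assumption, purely to make the check reproducible):
set.seed(42)  # assumed seed, only so the check is reproducible
x <- 1:100
series <- sin(2*x) + sin(x/4) + sin(x/8) + rnorm(100)
isTRUE(all.equal(filter(x = series, m = 4, delt = 40/60),
                 filter2(x = series, m = 4, delt = 40/60)))
## expected to be TRUE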
+1 Thank you so much for your time and effort in updating the function. The results are consistent, but it seems a little slower than the code I wrote before. (Jdbaba, Feb 12, 2014 at 17:20)
This works a little faster, taking advantage of vectorized code in a couple of places.
filter3 <- function(x, m, delt) {
  xlen <- length(x)
  tau_crit <- 3                     # critical period (hours)
  theta_crit <- 2*pi*delt/tau_crit  # critical frequency
  n <- seq(-m, m)
  h <- (sin(n*theta_crit*delt)/(pi*n)) * (sin(n*pi/m)/(n*pi/m))
  h[n == 0] <- theta_crit * delt/pi  # the n == 0 term above is 0/0 (NaN); patch in its limit value
  h <- h/sum(h)
  # (fixed: this referenced the global series rather than the argument x)
  sapply((1+m):(xlen-m), function(t) sum(x[t+n] * h[m+n+1]))
}
(There were some variables defined but not used, and many defined for the sake of changing the variable name itself. Though not a huge savings with these input data, I opted to leave them out. If the input data is considerably larger, the memory footprint might have more of an effect.)
When comparing the performance of this with the original, it appears to run in about 20% of the time (using the median for that statistic):
require(microbenchmark)
microbenchmark(f1=filter(x=series, m=4, delt=(40/60)), f3=filter3(x=series, m=4, delt=(40/60)), times=1000)
## Unit: microseconds
## expr min lq median uq max neval
## f1 2121.381 2180.634 2227.229 2527.8990 17368.383 1000
## f3 406.152 432.569 453.482 495.4905 2235.118 1000
HOWEVER, for some reason they are a little different:
f1 <- filter(x=series, m=4, delt=(40/60))
f3 <- filter3(x=series, m=4, delt=(40/60))
identical(f1, f3)
## [1] FALSE
max(f1-f3)
## [1] 4.440892e-16
... but the difference is very small. (I tried it in both 32- and 64-bit mode with the same results.) Considering that the smallest magnitude your sample data produces is around 0.0268, this 4.44e-16 difference seems inconsequential.
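For floating-point comparisons like this, all.equal is the idiomatic check rather than identical, since it tolerates tiny numeric differences; a minimal sketch using the vectors above:
all.equal(f1, f3)
## expected to be TRUE: the ~4.44e-16 gap is far below the
## default tolerance of about 1.5e-8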
*apply is not necessarily faster than for; in general there's no speed gain, just clarity of code (maybe :-) ). However, anywhere you can vectorize rather than loop you'll save time, and anywhere you can pre-allocate (as you did with series_filtered) that'll help too.
rcpp?
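On the vectorization point: because the Lanczos weights here are symmetric, base R's stats::filter can apply them in a single vectorized call. A minimal sketch under that assumption (filter4 is just an illustrative name, and the explicit stats:: prefix is needed because the question's function masks stats::filter):
filter4 <- function(x, m, delt) {
  tau_crit <- 3                     # critical period (hours)
  theta_crit <- 2*pi*delt/tau_crit  # critical frequency
  n <- seq(-m, m)
  h <- (sin(n*theta_crit*delt)/(pi*n)) * (sin(n*pi/m)/(n*pi/m))
  h[n == 0] <- theta_crit * delt/pi
  h <- h/sum(h)
  # sides = 2 centres the (symmetric) weights on each point; the m points
  # at each edge come back as NA, which we drop just as the original does
  y <- stats::filter(x, h, method = "convolution", sides = 2)
  as.numeric(y[(1+m):(length(x) - m)])
}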