I have this function in R:
Minimum <- function(data) {
answer <- numeric(length(data))
diference <- c(0, diff(data, lag = 1, differences = 1)) #Padded initially =0
answer[1]=data[1]
for (i in 2:length(diference)) {
if (diference[i]==0) {
answer[i]=answer[i-1]
} else {
answer[i]=data[i]-diference[i]/2
}
}
return(answer)
}
Its purpose is to find the minimum value which "data" could had before it was rounded.
The minimum possible value is the average of the values which "data" had at the last change of value in "data"
This code works, but since for
loops are inefficient in R, it is advised to vectorize the function.
The problem is that the "answer" vector depends on the former values in "answer", so I cannot use a lambda function.
1 Answer 1
You can firstly change by difference != 0
and then use na.locf
to replace NA
s by last available value recursively.
minimum_new <- function(data) {
answer <- rep(NA, length(data))
difference <- c(0, diff(data, lag = 1, differences = 1)) / 2
answer[1] <- data[1]
answer[difference != 0] <- data[difference != 0] - difference[difference != 0]
answer <- zoo::na.locf(answer, na.rm = FALSE)
answer
}
This version is faster for me by at least 2 times.
> data <- sample(10, 10000, replace = TRUE)
> check <- function(values) all(sapply(values[-1], function(x) identical(values[[1]], x)))
> bench <- microbenchmark::microbenchmark(loop = Minimum(data), vectorised = minimum_new(data), check=check)
Unit: microseconds
expr min lq mean median uq max neval cld
loop 1401.959 1415.552 1665.816 1457.274 1586.407 4620.835 100 b
vectorised 742.325 758.183 1111.202 796.507 1383.268 2587.940 100 a
With check
it's also checks the equality of output.
-
\$\begingroup\$ I would never had found na.locf function by myself. I wonder how I could had found it. Thank you. have a reward youtube.com/watch?v=fD7ji3YOwcM \$\endgroup\$yoxota– yoxota2018年04月17日 14:20:36 +00:00Commented Apr 17, 2018 at 14:20
diference[i]==0
will be subject to floating point errors, so not reliable if you are dealing with numeric (non integer) vectors. \$\endgroup\$stopifnot(all(diff(data) >= 0))
. \$\endgroup\$diff(data)
that is not exactly zero and make that your (estimated) rounded precision for all values?Minimum <- function(data) { d <- diff(data); p <- min(d[d > 0]); data - p/2 }
. It's all vectorized, faster, and provides a better (larger) minimum bound on your pre-rounded data. \$\endgroup\$