I am not quite sure how to attach my data, but it is 5737 observations of 3 variables. Variables V and P are strictly negative, and variable R satisfies -2 < R < 2.
The last function, correcttot, will help me find the optimal constants, or ranges of constants, for this function. However, the function it loops over, percentcorrect, takes several seconds to run, so repeating it 10,000,000 times is not feasible (at even one second per call, that is well over 100 days of compute).
setwd("~/Desktop")
dat <- read.csv(file = "data.csv", sep = ",", header = TRUE)
attach(dat)
###################################################################################
voltoalph <- function(v, c) { # Where v is a data vector and c is a constant
  alpha <- c(rep(0, length(v)))
  for (i in 1:length(v)) {
    alpha[i + 1] <- abs(c * (v[i] / max(v, na.rm = TRUE)))
  }
  alpha
}
voltoenen <- function(v, c) { # Where v is a data vector and c is a constant
  enen <- c(rep(1, length(v)))
  for (i in 1:length(v)) {
    enen[i + 1] <- abs(v[i] / max(v, na.rm = TRUE))
  }
  enhelp <- 1 / enen
  enhelp2 <- c * (enhelp / max(enhelp))
  enhelp2
}
ema <- function(v, n, a) { # Where v is a data vector, n is a constant, and a is a vector of constants
  avevec <- c(rep(0, n))
  for (i in 1:n) {
    avevec[i] <- ((1 - a[i])^(n - i)) * v[i]
  }
  divvec <- c(rep(0, n))
  for (i in 1:n) {
    divvec[i] <- ((1 - a[i])^(n - i))
  }
  sum(avevec) / sum(divvec)
}
betaema <- function(v, n, a, l) { # Where v is a data vector, n and a are vectors, and l is a constant
  secondvec <- c(rep(0, length(v)))
  for (i in l:length(v)) {
    secondvec[i] <- ema(v[(i - n[i] + 1):i], n[i], a)
  }
  secondvec
}
#################################################################################################################
howright <- function(v, r, c, l) { # Where v and r are data vectors and c, l are constants
  rightvec <- 0
  for (i in l:(length(r) - c)) {
    if ((v[i] * mean(r[(i + 1):(i + c)])) > 0) { # mean of the next c values of r
      rightvec <- rightvec + 1
    }
  }
  rightvec / (length(r) - l - c)
}
#################################################################################################################
percentcorrect <- function(ca1, ca2, cn1, cn2, e, d, c, v, p, r) { ### V, P, and R are data vectors, rest constant
  vol1 <- voltoalph(v, ca1)
  vol2 <- voltoalph(v, ca2)
  ens1 <- voltoenen(v, cn1)
  ens2 <- voltoenen(v, cn2)
  als <- c(vol1)
  als2 <- c(vol2)
  n1 <- c(ens1)
  n2 <- c(ens2)
  anotherema1 <- betaema(p, n1, als, max(cn1, cn2))
  anotherema2 <- betaema(p, n2, als2, max(cn1, cn2))
  slope1 <- c(rep(0, length(p)))
  slope2 <- c(rep(0, length(p)))
  for (i in (max(cn1, cn2) + d):length(anotherema1)) {
    slope1[i] <- (anotherema1[i] - anotherema1[i - d]) / d
  }
  for (i in (max(cn1, cn2) + e):length(anotherema2)) {
    slope2[i] <- (anotherema2[i] - anotherema2[i - e]) / e
  }
  sig <- slope1 - slope2
  hvec <- howright(sig, r, c, max(cn1, cn2))
  hvec
}
##########################################################################################
correcttot <- function(v, p, r) { ### Where v, p, and r are data vectors
  correct3 <- array(0, dim = c(10, 10, 10, 10, 10, 10, 10))
  for (i in 1:10) {
    for (j in 1:10) {
      for (k in 1:10) {
        for (l in 1:10) {
          for (m in 2:10) {
            for (n in 2:10) {
              for (o in 1:10) {
                correct3[i, j, k, l, m, n, o] <- percentcorrect((i/10), (j/10), (20*k), (20*l), m, n, o, v, p, r)
              }
            }
          }
        }
      }
    }
  }
  print(correct3)
}
newvec <- correcttot(vl, p, rt) # run it on the vectors vl, p and rt
which(newvec == max(newvec), arr.ind = TRUE)
- Post some data too. Say 10-20 lines at random. Or if you can link to it over dropbox. (Mike Wise, Nov 22, 2015 at 17:21)
- Added the first 20-something lines of the data. Let me know if it is better to link the whole file. Thank you again Mike! (user3678028, Nov 22, 2015 at 17:28)
- filehosting.org/file/details/523562/newdata.csv Here is a link to the full dataset. (user3678028, Nov 22, 2015 at 17:35)
1 Answer
You have several loops that are completely unnecessary. Remove them and you will see dramatic speedups. Take voltoalph and voltoenen, for example:
voltoalph2 <- function(v, c) {
  # v: a data vector
  # c: a scalar constant
  # abs() kept from the original so results also match for negative v
  c(0, abs(c[1L] * (v / max(v, na.rm = TRUE))))
}
voltoenen2 <- function(v, c) {
  # v: a data vector
  # c: a scalar constant
  enhelp <- 1 / c(1, abs(v / max(v, na.rm = TRUE)))
  c[1L] * (enhelp / max(enhelp))
}
require(microbenchmark)
v <- 1:10000
microbenchmark(voltoalph(v, 2), voltoalph2(v, 2))
# Unit: microseconds
#              expr        min          lq        mean     median          uq        max neval cld
#   voltoalph(v, 2) 105004.652 106038.7245 108767.0230 106725.827 108326.7975 151482.035   100   b
#  voltoalph2(v, 2)     81.925     83.5825    141.3382     89.794     91.6065   5350.052   100  a
identical(voltoalph(v, 2), voltoalph2(v, 2))
# [1] TRUE
microbenchmark(voltoenen(v, 2), voltoenen2(v, 2))
# Unit: microseconds
#              expr        min         lq        mean     median        uq        max neval cld
#   voltoenen(v, 2) 103595.526 105433.803 108797.6308 106882.274 110160.65 148550.397   100   b
#  voltoenen2(v, 2)    200.275    203.182    227.6582    207.932    220.57   1007.375   100  a
identical(voltoenen(v, 2), voltoenen2(v, 2))
# [1] TRUE
The vectorized (non-loop) versions are over 1000 and 500 times faster (for voltoalph and voltoenen, respectively) and produce identical output. percentcorrect calls each of those functions twice, so the speedup counts double there.
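The same loop removal applies to the two slope loops inside percentcorrect. A minimal sketch of the idea (slope_lagged is a hypothetical helper name, not something from the post; it assumes start + d does not exceed length(x), which holds here since the original loops begin at start + d):
slope_lagged <- function(x, d, start) {
  # lagged difference quotient, mirroring the slope loops in percentcorrect:
  # out[i] <- (x[i] - x[i - d]) / d for i from (start + d) to length(x)
  out <- rep(0, length(x))
  idx <- (start + d):length(x)
  out[idx] <- (x[idx] - x[idx - d]) / d
  out
}
# inside percentcorrect, the two loops would then collapse to:
# slope1 <- slope_lagged(anotherema1, d, max(cn1, cn2))
# slope2 <- slope_lagged(anotherema2, e, max(cn1, cn2))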
- Welcome to Code Review! Good job on your first answer. (SirPython, Nov 22, 2015 at 18:45)
- That's really helpful. Can that same paradigm be used for a function where the references are non-constant? Say for betaema, where n and a change? (user3678028, Nov 22, 2015 at 18:53)
- Or more importantly for ema, because I imagine the 2 loops in that set the time back. (user3678028, Nov 22, 2015 at 19:09)
- @user3678028: ema can be vectorized. I don't think betaema can without a significant tradeoff in the amount of memory used. That said, it would be very simple to code in C/C++. (Joshua Ulrich, Nov 24, 2015 at 2:37)
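For reference, the ema vectorization mentioned in the last comment might look something like this (ema2 is a hypothetical name; like the original, it assumes a has at least n elements):
ema2 <- function(v, n, a) {
  # build the (1 - a[i])^(n - i) weights in one shot instead of two loops
  i <- seq_len(n)
  w <- (1 - a[i])^(n - i)
  sum(w * v[i]) / sum(w) # weighted average of the first n values of v
}
# quick sanity check against the original:
# v <- runif(50); a <- runif(50, 0, 0.5)
# all.equal(ema(v, 50, a), ema2(v, 50, a)) # TRUE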