Nested loops - Random Forest, multiple parameters

Question 1

I'm writing code whose task is to grow Random Forest models for multiple parameter combinations. In short:

Firstly, I declare a data frame in which model parameters and some stats will be saved.
Secondly, I declare the model parameters and the loop counter (it is printed after every loop iteration).
Next, I have nested loops containing the model and the prediction function.
Furthermore, parameters and some stats from the confusion matrix are saved to the data frame.
Additionally, the iteration number is printed and incremented.
Last but not least, the garbage collector is called.

The code looks like this:

## data frame in which model parameters and some stats will be saved
model_eff <- data.frame("ntrees" = numeric(0),
 "zeros" = numeric(0), 
 "mvars"= numeric(0),
 "eff" = numeric(0),
 "0_0" = numeric(0),
 "0_1" = numeric(0),
 "1_0" = numeric(0),
 "1_1" = numeric(0),
 "predict_sum" = numeric(0),
 "triangle" = numeric(0))
## parameters
ntrees <- c(300, 500)
zeros <- sum(train.target) * c(1, 2, 3, 4, 5)
mvars <- c(30, 50, 70, 90, 110, 130)
## loop counter
i = 1
## loop with model, prediction etc.
for (j in 1:length(ntrees)){
  for (k in 1:length(zeros)){
    for (l in 1:length(mvars)){
      ## i-th model
      model <- randomForest(train,
                            y = as.factor(train.target),
                            ntree = ntrees[j],
                            do.trace = T,
                            sampsize = c('0' = zeros[k], '1' = sum(train.target)),
                            mtry = mvars[l])
      ## prediction - my function, apart from a regular prediction
      ## outputs additional info
      predict.model(model, val, val.target)
      ## inserting model parameters and stats to a data frame for further comparisons
      model_eff <- rbind(model_eff,
                         c("ntrees" = ntrees[j],
                           "zeros" = zeros[k],
                           "mvars"= mvars[l],
                           "eff" = eff_measures$eff,
                           "0_0" = eff_measures$c.m[1, 1],
                           "0_1" = eff_measures$c.m[1, 2],
                           "1_0" = eff_measures$c.m[2, 1],
                           "1_1" = eff_measures$c.m[2, 2],
                           "predict_sum" = sum(TARGET3),
                           "triangle" = eff_measures$triangle))
      ## printing the number of iteration
      cat("iteration =", i)
      i <- i+1
      ## calling garbage collector to assure free space in RAM
      gc()
    }
  }
}

I have already split the train/validation data sets and their target variables, knowing that Random Forest deals with such data more efficiently. I also tried to use the "foreach" package to parallelize the computations; however, growing just one tree took 10-15% longer than it did without using all the cores.

I would like to know if I can shorten the time of execution of this code, especially if there is a way to avoid multiple loops since I heard that they are not the best way of programming in R.

Question 2

Reproducible Example

Unfortunately, the code snippet you gave is not reproducible, so the advice below is necessarily limited.

Caches are nice

When a value is constant across iterations, compute it once before the loop instead of recomputing it every time. In this case, sum(train.target) and sum(TARGET3) can be cached, provided neither changes inside the loop:

stt = sum(train.target)
st3 = sum(TARGET3)

Knowledge (of size) is Power!

One key issue is that you call rbind 60 times, growing the data.frame by one row per iteration, because it was declared with zero rows. The number of iterations is known in advance, so preallocate the rows instead:

## parameters
ntrees <- c(300, 500)
zeros <- sum(train.target) * c(1, 2, 3, 4, 5)
mvars <- c(30, 50, 70, 90, 110, 130)
## total number of parameter combinations: 2 * 5 * 6 = 60
nitr = length(ntrees)*length(zeros)*length(mvars)
## check.names = FALSE keeps names such as "0_0" as written
## (by default data.frame() would rename them to "X0_0")
model_eff <- data.frame("ntrees" = numeric(nitr),
 "zeros" = numeric(nitr),
 "mvars" = numeric(nitr),
 "eff" = numeric(nitr),
 "0_0" = numeric(nitr),
 "0_1" = numeric(nitr),
 "1_0" = numeric(nitr),
 "1_1" = numeric(nitr),
 "predict_sum" = numeric(nitr),
 "triangle" = numeric(nitr),
 check.names = FALSE,
 stringsAsFactors = FALSE)

Declare count = 1 before the three nested for loops. Then, inside the innermost loop, save the results using:

model_eff[count,] = c("ntrees" = ntrees[j],
 "zeros" = zeros[k],
 "mvars"= mvars[l],
 "eff" = eff_measures$eff,
 "0_0" = eff_measures$c.m[1, 1],
 "0_1" = eff_measures$c.m[1, 2],
 "1_0" = eff_measures$c.m[2, 1],
 "1_1" = eff_measures$c.m[2, 2],
 "predict_sum" = st3,
 "triangle" = eff_measures$triangle)
count = count + 1
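
The preallocation idea above can be combined with expand.grid() to replace the three nested loops with a single loop over a table of parameter combinations. The following is only a sketch: fit_and_score() is a hypothetical stand-in for the randomForest fit plus predict.model() call and returns made-up numbers, and n_pos is an assumed value standing in for sum(train.target). The loops themselves are unlikely to be the bottleneck, since fitting each forest dominates the run time, so this mainly tidies the bookkeeping.

```r
# Hypothetical stand-in for "fit the forest, predict, compute stats".
# It returns made-up numbers so the sketch runs without any data.
fit_and_score <- function(ntree, zero, mvar) {
  c(eff = ntree / (zero + mvar), triangle = mvar / ntree)
}

n_pos <- 100  # assumption: stands in for sum(train.target)

# One row per parameter combination: 2 * 5 * 6 = 60 rows.
grid <- expand.grid(ntrees = c(300, 500),
                    zeros  = n_pos * 1:5,
                    mvars  = seq(30, 130, by = 20))

# Preallocate the result columns next to the parameters.
grid$eff      <- NA_real_
grid$triangle <- NA_real_

# A single loop replaces the three nested ones; i doubles as the counter.
for (i in seq_len(nrow(grid))) {
  res <- fit_and_score(grid$ntrees[i], grid$zeros[i], grid$mvars[i])
  grid$eff[i]      <- res[["eff"]]
  grid$triangle[i] <- res[["triangle"]]
  cat("iteration =", i, "\n")
}
```

After the loop, grid holds the parameters and the statistics side by side, so no separate counter variable or rbind call is needed.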

Parallel RandomForest via `caret`

The only other suggestion I have is to parallelize the build of the random forest via:

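# caret modeling framework
library(caret)
# Parallel backend
library(doParallel)
# Register a parallel backend with 5 workers
registerDoParallel(cores = 5)
# allowParallel is an argument of trainControl(), not of train();
# caret uses the backend to run its resampling iterations in parallel.
# The target is converted to a factor so that a classifier is fitted.
rf_model = train(x = train, y = as.factor(train.target), method = "rf",
 prox = TRUE,
 trControl = trainControl(allowParallel = TRUE))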
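
Another option, not taken from the code above, is to parallelize across parameter combinations rather than inside a single forest, since the 60 fits are independent of each other. The sketch below uses only the parallel package that ships with R. fit_and_score() is again a hypothetical placeholder returning made-up numbers, and the value 100 stands in for sum(train.target).

```r
library(parallel)  # ships with base R

# Hypothetical stand-in for fitting one forest and scoring it.
fit_and_score <- function(ntree, zero, mvar) {
  c(ntrees = ntree, zeros = zero, mvars = mvar, eff = ntree / (zero + mvar))
}

grid <- expand.grid(ntrees = c(300, 500),
                    zeros  = 100 * 1:5,  # assumption: sum(train.target) == 100
                    mvars  = seq(30, 130, by = 20))

cl <- makeCluster(2)  # two worker processes; adjust to your machine

# Each worker evaluates one parameter combination at a time. The grid and
# the scoring function are passed as arguments so the workers can see them.
rows <- parLapply(cl, seq_len(nrow(grid)), function(i, grid, f) {
  f(grid$ntrees[i], grid$zeros[i], grid$mvars[i])
}, grid = grid, f = fit_and_score)

stopCluster(cl)

# Bind the per-combination results into one data frame (60 rows).
model_eff <- as.data.frame(do.call(rbind, rows))
```

With a real model, each worker would also need the randomForest package loaded and access to the training data, for example through clusterEvalQ() and clusterExport(). Every worker holds its own copy of the data, so memory may become the limiting factor before the number of cores does.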

coatless · Answer 1 · 2016-03-19 17:59:35Z
