I'm running the following code on quite large data frames. I've rewritten it for the iris dataset to make it reproducible.
I'm quite inexperienced with the apply functions and I find them a pain in the bum to use.
Is there any way to drastically improve the performance of this process?
lmfit <- lm(iris$Petal.Width ~ iris$Sepal.Length + iris$Sepal.Width)
out_index <- 1
TableWithResiduals <- data.frame(matrix(ncol = ncol(iris) + 1, nrow = nrow(iris)))
for (row in 1:length(resid(lmfit))) {
  TableWithResiduals[out_index, ] <- cbind(iris[row, ], resid(lmfit)[row])
  out_index <- out_index + 1
}
colnames(TableWithResiduals) <- colnames(iris)
colnames(TableWithResiduals)[length(TableWithResiduals)] <- "Residual_value"
1 Answer
If you look at the doc for cbind, which you already use, you will see that it can take whole matrices, data.frames, and vectors as inputs. This means you can just do:
TableWithResiduals <- cbind(iris, Residual_value = resid(lmfit))
You could also have done:
TableWithResiduals <- iris
TableWithResiduals$Residual_value <- resid(lmfit)
If it were not for these solutions, there are a few things that could be improved in your code. First, you could have used row directly instead of the out_index variable you created. Second, the last two lines of your code could have been merged into one: names(TableWithResiduals) <- c(names(iris), "Residual_value"). Also, if you look at the doc for lm, you could have saved yourself some typing by doing lm(Petal.Width ~ Sepal.Length + Sepal.Width, iris).
Thanks a lot! Using TableWithResiduals$Residual_value <- resid(lmfit), the improvement in speed is almost a thousand times according to microbenchmark; it's almost instantaneous now :) — Bas, Nov 2, 2015 at 12:43