I'm running the following code on quite large data frames. I've rewritten it for the iris dataset to make it reproducible.
I'm quite inexperienced with the apply functions and I find them a pain in the bum to use.
Is there any way to drastically improve the performance of this process?
lmfit <- lm(iris$Petal.Width ~ iris$Sepal.Length + iris$Sepal.Width)
out_index <- 1
TableWithResiduals <- data.frame(matrix(ncol = ncol(iris) + 1, nrow = nrow(iris)))
for (row in 1:length(resid(lmfit))) {
  TableWithResiduals[out_index, ] <- cbind(iris[row, ], resid(lmfit)[row])
  out_index <- out_index + 1
}
colnames(TableWithResiduals) <- colnames(iris)
colnames(TableWithResiduals)[length(TableWithResiduals)] <- "Residual_value"
1 Answer
If you look at the doc for cbind, which you already use, you will see that it can take whole matrices, data.frames, and vectors as inputs. This means you can just do:
TableWithResiduals <- cbind(iris, Residual_value = resid(lmfit))
You could also have done:
TableWithResiduals <- iris
TableWithResiduals$Residual_value <- resid(lmfit)
If it were not for these solutions, there are a few things that could be improved in your code. First, you could have used row directly instead of the out_index variable you created. Second, the last two lines of your code could have been merged into one: names(TableWithResiduals) <- c(names(iris), "Residual_value"). Also, if you look at the doc for lm, you could have saved yourself some typing by doing lm(Petal.Width ~ Sepal.Length + Sepal.Width, iris).
Thanks a lot! Using TableWithResiduals$Residual_value <- resid(lmfit), the improvement in speed is almost a thousand times according to microbenchmark; it's almost instantaneous now :) — Bas, Nov 2, 2015 at 12:43