3
\$\begingroup\$

I have a long data.table, payoff_mat. I would like to select the rows containing the maximum value of column X by group Y. There are several posts on this, and I think I am well optimized in this regard.

The twist is that I would like to repeat this process several times, for which I think I could do better. This is the hot-spot of my script.

This is how it works:

  1. Subset the rows containing the maximum value of column 3 by group (permutation of column 4-5).
  2. Subset again by the maximum of column 2 by group (permutation of column 4).
  3. Subset again by the maximum of column 1.
library(data.table)
library(iterpc)
set.seed(1)
I<-iterpc(table(rnorm(5)),3, ordered=TRUE,replace=FALSE)
support_n <-getnext(I,d=180) # not real size. Larger in practice.
payoff_n <-matrix(rnorm(180), ncol = 3) # not real size. Larger in practice.
payoff_mat<-cbind(payoff_n,support_n)
payoff_mat<-data.table(payoff_mat)
#My current solution: eval(parse(text = ....)) is long but seems to be the standard best syntax for data.table in these cases.
cn<-colnames(payoff_mat) #need column names, data.table does not like using column number.
step_1<-payoff_mat[payoff_mat[ ,.I[ which.max(eval(parse(text = cn[3]))) ] , by = eval(cn[4:5]) ][,V1]] #
step_2<-step_1[step_1[ ,.I[ which.max(eval(parse(text = cn[2]))) ] , by = eval(cn[4:4]) ][,V1]]
step_3<-step_2[ ,.SD[ which.max(eval(parse(text = cn[1]))) ] ]

My intuition suggests I lose a lot of time writing to memory by saving each step, but I am happy with any improvement.

I have tried calling it as a single command, but have not been successful figuring out the correct technique for sequential [] calls. Below is my (current) best attempt, but it is clear they are not equivalent:

payoff_mat [payoff_mat[ ,.I[ which.max(eval(parse(text = cn[3]))) ] , by = eval(cn[4:5]) ][,V1]] [payoff_mat[ ,.I[ which.max(eval(parse(text = cn[2]))) ] , by = eval(cn[4:4]) ][,V1]] [ ,.SD[ which.max(eval(parse(text = cn[1]))) ] ] 
asked Sep 17, 2017 at 15:58
\$\endgroup\$
0

1 Answer 1

1
\$\begingroup\$

I am not a data.table expert so I can't tell if there is a faster approach, though https://stackoverflow.com/q/16573995/1201032 suggests you are already using something efficient at each step.

The changes I have to suggest would however make your code a lot easier to read and maintain. First, define a function that can be applied at each step:

max_subset <- function(dt, var, by)
 dt[dt[, .I[which.max(get(var))], by = by][,V1]]

Then, you can make your successive calls as follows:

step <- payoff_mat
step <- max_subset(step, "V3", c("V4", "V5"))
step <- max_subset(step, "V2", c("V4"))
step <- max_subset(step, "V1", c())

Or using the magrittr package:

library(magrittr)
payoff_mat %>% max_subset("V3", c("V4", "V5")) %>%
 max_subset("V2", c("V4")) %>%
 max_subset("V1", c()) -> step3

Note that assigning intermediate outputs to a variable in the first approach (step <- ...) should not carry any significant time (feel free to test). And your code will be a lot more readable than using nested calls: max_subset(max_subset(max_subset(...), ..., ...).

answered Sep 22, 2017 at 19:22
\$\endgroup\$
2
  • \$\begingroup\$ Thank you for the answer. I think I would like to get it in one call. This is just a minimum working example, the intermediate output is actually very large, (something like 8+ GB of RAM for the first step), so I -think- it is writing this to memory and therefore is probably costly. \$\endgroup\$ Commented Sep 24, 2017 at 19:35
  • 2
    \$\begingroup\$ Would you please test both approaches and circle back? I don't think your assumption that y <- f(g(h(x))) is less costly than y <- x; y <- h(y); y <- g(y); y <- f(y) is correct. When you do y <- f(g(h(x))), h(x) has to be computed in memory first. Assigning that memory space to a symbol (y) is not going to be significant memory-wise or time-wise. \$\endgroup\$ Commented Sep 24, 2017 at 21:15

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.