Construct list from a dataframe and a formula

Question 1

A data frame and a one-sided formula (right-hand side only) are given:

dat <- data.frame(
 A = c("a", "b", "c"),
 B = c("x", "y", "z"),
 NotUsed = c(1, 2, 3)
)
frml <- ~ A + B + A:B

From them, I want to get this list:

# [[1]]
# [1] a b c
# Levels: a b c
# 
# [[2]]
# [1] x y z
# Levels: x y z
# 
# [[3]]
# [1] a:x b:y c:z
# Levels: a:x b:y c:z

Here is how I get this list:

library(lazyeval) # to use 'as.lazy' and 'lazy_eval'
tf <- terms.formula(frml)
factors <- rownames(attr(tf, "factors"))
tvars <- attr(tf, "variables")
tlabs <- attr(tf, "term.labels")
used <- lapply(eval(tvars, envir = dat), as.factor)
names(used) <- factors
lapply(tlabs, function(tlab){
 droplevels(lazy_eval(as.lazy(tlab), data = used))
})
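For reference, the three attributes of the terms object used above can be inspected directly. This is a small sketch in base R only; it calls the generic terms(), which dispatches to the terms.formula method used above.

```r
frml <- ~ A + B + A:B
tf <- terms(frml)

# Row names of the "factors" matrix: the variables the formula refers to
rownames(attr(tf, "factors"))   # "A" "B"

# An unevaluated call listing those variables: list(A, B)
attr(tf, "variables")

# One label per expanded term
attr(tf, "term.labels")         # "A" "B" "A:B"
```

Because "variables" is an unevaluated call, evaluating it with the data frame as environment returns the columns A and B as a plain list, which is why the names have to be set again afterwards.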

Do you have a better way to propose?

Question 2

Code Quality

Blank Lines

Generally, the readability of your code could be improved. For example, insert blank lines between logical steps. So instead of

factors <- rownames(attr(tf, "factors"))
tvars <- attr(tf, "variables")
tlabs <- attr(tf, "term.labels")

I would suggest

factors <- rownames(attr(tf, "factors"))

tvars <- attr(tf, "variables")

tlabs <- attr(tf, "term.labels")

Order Within A Script or Source Code

Generally, one puts package import statements at the top of a file. You are not doing that, and you give no reason for deviating from the convention, so I would put library(lazyeval) at the top.

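Further, you set some variables well before you use them. For example, factors <- rownames(attr(tf, "factors")) is assigned several lines before its only use. I would create each value right where it is needed.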

These observations lead to the following script.

# to use 'as.lazy' and 'lazy_eval'
library(lazyeval)

dat <- data.frame(
 A = c("a", "b", "c"),
 B = c("x", "y", "z"),
 NotUsed = c(1, 2, 3))
frml <- ~ A + B + A:B

tf <- terms.formula(frml)
tvars <- attr(tf, "variables")
used <- lapply(eval(tvars, envir = dat), as.factor)
names(used) <- rownames(attr(tf, "factors"))

tlabs <- attr(tf, "term.labels")
lapply(tlabs, function(tlab){
 droplevels(lazy_eval(as.lazy(tlab), data = used))
})

Please pay attention to how I grouped the code lines.

A Better Way

As for the better way you asked about, I would propose the following.

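# stringsAsFactors = TRUE so that A and B are factors on R >= 4.0 as well
dat <- data.frame(A = c("a", "b", "c"),
 B = c("x", "y", "z"),
 NotUsed = c(1, 2, 3),
 stringsAsFactors = TRUE)
# drop the column that is not needed
dat <- subset(x = dat, select = -NotUsed)
# build the A:B interaction by hand
dat$c <- as.factor(paste(dat$A, dat$B, sep = ":"))
# a data frame is already a list of columns
my.list <- as.list(dat)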

In my view, this is a better way because it expresses your intention better. If you had given me your code without any comments, I would have had a hard time explaining what it does or what its purpose is. In addition, it does not require an additional package.

HTH!

Question 3

Welcome to the Code Review community. Its purpose is to help coders improve their coding skills by reading through their code and suggesting how it can be improved. Unlike Stack Overflow, instead of posting solutions we post meaningful observations about the code. Code-only alternate solutions are considered poor answers and may be deleted by the community. Please read "How do I write a good answer?".

Question 4

Your "better way" is specific to this example. I need something general.

Question 5

@StéphaneLaurent And what should be more general? Selecting the columns?

Question 6

E.g. ~ A + B + C/A + A*B + A:B:C + .... Moreover I want to use a formula, which disappeared in your code.
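A formula-driven version that needs no extra package might look like the sketch below. It is not from the thread: the helper name terms_to_factors is hypothetical, and it assumes that every variable in the formula is a column of the data frame that should be treated as a factor. It does not handle the `.` shorthand or function calls inside terms. It relies on base R's str2lang() (R >= 3.6) to parse each term label and on the fact that `:` applied to two factors yields their interaction.

```r
# Hypothetical helper: one factor per expanded term of a one-sided formula
terms_to_factors <- function(formula, data) {
  tf <- terms(formula)

  # Convert only the columns the formula mentions
  vars <- all.vars(formula)
  used <- lapply(data[vars], as.factor)

  # Evaluate each term label (e.g. "A:B") with the factors as data,
  # then drop interaction levels that never occur
  lapply(attr(tf, "term.labels"), function(lab) {
    droplevels(eval(str2lang(lab), used))
  })
}

dat <- data.frame(
  A = c("a", "b", "c"),
  B = c("x", "y", "z"),
  NotUsed = c(1, 2, 3)
)
res <- terms_to_factors(~ A + B + A:B, dat)
levels(res[[3]])   # "a:x" "b:y" "c:z"
```

Since terms() expands operators such as `*` and `/` into main effects and interactions before the labels are evaluated, a formula like ~ A * B goes through the same code path and also yields three factors.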

MacOS · Answer 1 · 2020-11-13 11:11:57Z

