A data frame and a one-sided formula (right-hand side only) are given:
dat <- data.frame(
  A = c("a", "b", "c"),
  B = c("x", "y", "z"),
  NotUsed = c(1, 2, 3)
)
frml <- ~ A + B + A:B
From them, I want to get this list:
# [[1]]
# [1] a b c
# Levels: a b c
#
# [[2]]
# [1] x y z
# Levels: x y z
#
# [[3]]
# [1] a:x b:y c:z
# Levels: a:x b:y c:z
Here is how I get this list:
library(lazyeval) # to use 'as.lazy' and 'lazy_eval'
tf <- terms.formula(frml)
factors <- rownames(attr(tf, "factors"))
tvars <- attr(tf, "variables")
tlabs <- attr(tf, "term.labels")
used <- lapply(eval(tvars, envir = dat), as.factor)
names(used) <- factors
lapply(tlabs, function(tlab) {
  droplevels(lazy_eval(as.lazy(tlab), data = used))
})
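For reference, this is roughly what the pieces pulled out of the terms object look like for the example formula. It is only a small inspection sketch; the names term_labels, var_names and fmat are introduced here for illustration and are not used in the code above.

```r
frml <- ~ A + B + A:B
tf <- terms(frml)

# One label per term, in the order the terms appear in the formula.
term_labels <- attr(tf, "term.labels")

# The "factors" attribute is a variables-by-terms matrix; a non-zero
# entry means the variable (row) takes part in the term (column).
fmat <- attr(tf, "factors")
var_names <- rownames(fmat)

print(term_labels)
print(var_names)
print(fmat)
```

The matrix is what ties each term label back to the variables it is built from.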
Do you have a better way to propose?
1 Answer
Code Quality
Blank Lines
Generally, your code quality needs some improvement. For example, insert blank lines. So instead of
factors <- rownames(attr(tf, "factors"))
tvars <- attr(tf, "variables")
tlabs <- attr(tf, "term.labels")
I would suggest
factors <- rownames(attr(tf, "factors"))

tvars <- attr(tf, "variables")

tlabs <- attr(tf, "term.labels")
Order Within A Script or Source Code
Generally, one puts package import statements at the top of a file. You are not doing that, and you give no reason for the deviation, so I would put library(lazyeval) at the top.
Further, you set some variables well before you use them. For example, factors <- rownames(attr(tf, "factors")) is assigned several lines before its only use in names(used) <- factors, so the expression can be moved to where it is needed.
These observations lead to the following script.
# to use 'as.lazy' and 'lazy_eval'
library(lazyeval)

dat <- data.frame(
  A = c("a", "b", "c"),
  B = c("x", "y", "z"),
  NotUsed = c(1, 2, 3))

frml <- ~ A + B + A:B

tf <- terms.formula(frml)
tvars <- attr(tf, "variables")

used <- lapply(eval(tvars, envir = dat), as.factor)
names(used) <- rownames(attr(tf, "factors"))

tlabs <- attr(tf, "term.labels")
lapply(tlabs, function(tlab) {
  droplevels(lazy_eval(as.lazy(tlab), data = used))
})
Please pay attention to how I grouped the code lines.
A Better Way
Regarding a better way, for which you have asked, I would propose the following.
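dat <- data.frame(A = c("a", "b", "c"),
                  B = c("x", "y", "z"),
                  NotUsed = c(1, 2, 3))
# Drop the column that is not needed.
dat <- subset(x = dat, select = -NotUsed)
# Build the interaction column by pasting the two columns together.
dat$c <- as.factor(paste(dat$A, dat$B, sep = ":"))
# Convert every column to a factor; since R 4.0.0, data.frame() keeps
# strings as character vectors by default.
my.list <- lapply(dat, FUN = as.factor)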
In my view, this is a better way because it expresses your intention better. If you had given me your code without any comments, I would have had a hard time explaining what it does or what its purpose is. In addition, it does not require an additional package.
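If the formula has to stay in the picture and arbitrary terms should work, one possible base-R sketch is to read the term structure from the "factors" attribute of the terms object and to build each term with interaction(). The helper name terms_to_factors is hypothetical, and the sketch assumes that every variable in the formula is a plain column name of dat (no calls such as log(x)).

```r
# Hypothetical helper: one factor per term of a one-sided formula.
# Assumes all formula variables are plain column names of `dat`.
terms_to_factors <- function(frml, dat) {
  fmat <- attr(terms(frml), "factors")
  lapply(colnames(fmat), function(term) {
    # Variables taking part in this term (non-zero matrix entries).
    vars <- rownames(fmat)[fmat[, term] > 0]
    cols <- lapply(dat[vars], as.factor)
    # lex.order = TRUE orders the levels as the `:` operator does.
    droplevels(interaction(cols, sep = ":", lex.order = TRUE))
  })
}

dat <- data.frame(
  A = c("a", "b", "c"),
  B = c("x", "y", "z"),
  NotUsed = c(1, 2, 3)
)

res <- terms_to_factors(~ A + B + A:B, dat)
print(res)
```

Because terms() expands operators such as * and / into main effects and interactions, the same helper should also cope with formulas like ~ A * B, as long as the assumption about plain column names holds.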
HTH!
- Welcome to the Code Review Community. The purpose of the Code Review community is to help coders improve their coding skills by reading through their code and suggesting how the code can be improved. Unlike Stack Overflow, instead of posting solutions we post meaningful observations about the code. Code-only alternate solutions are considered poor answers and may be deleted by the community. Please read How do I write a good answer?. Commented Nov 13, 2020 at 13:25
- Your "better way" is specific to this example. I need something general. – Stéphane Laurent, commented Nov 16, 2020 at 11:31
- @StéphaneLaurent And what should be more general? Selecting the columns? – MacOS, commented Nov 16, 2020 at 13:27
- E.g. ~ A + B + C/A + A*B + A:B:C + ... . Moreover, I want to use a formula, which disappeared in your code. – Stéphane Laurent, commented Nov 16, 2020 at 14:33