I am wondering if this is correct:
VAR <- c("City")
data %>%
select(salary, age, VAR) %>%
melt(id.vars = c("Salary2", "Age2","City")) -> finalData
I get the message:
Note: Using an external vector in selections is ambiguous.
i Use `all_of(VAR)` instead of `VAR` to silence this message.
i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
The resulting finalData seems correct, but I wonder if this is the way to go. Thanks!
Answer
As the message says, use all_of() (available through the tidyverse, which re-exports it from tidyselect):
var <- "Sepal.Length"
iris %>%
select(tidyselect::all_of(var)) %>%
str
# 'data.frame': 150 obs. of 1 variable:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
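Applied to a data frame shaped like the one in the question, the same pattern might look as follows. This is only a sketch: the data frame `data` and all of its values are made up for illustration, since the original data is not shown.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data; the real columns are not shown.
data <- data.frame(
  salary = c(52000, 61000, 48000),
  age    = c(34, 45, 29),
  City   = c("Oslo", "Lyon", "Porto"),
  other  = c(1, 2, 3)
)

VAR <- c("City")

# all_of() marks VAR as an external character vector of column names,
# so select() no longer has to guess between a column and a variable.
selected <- data %>%
  select(salary, age, all_of(VAR))

names(selected)
```

Bare column names and all_of() can be mixed freely inside one select() call, so only the externally supplied part needs wrapping.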
Rationale
Let's say we want to select a column:
iris[, "Sepal.Length"]
In base R, we can easily select columns with variables:
var <- "Sepal.Length"
iris[, var]
However, you are using tidyverse functions. For example, select is from the dplyr package. The tidyverse packages rely on a special type of non-standard evaluation. As such, the correct way of selecting the "Sepal.Length" column is:
select(iris, Sepal.Length)
# more typically written as:
iris %>% select(Sepal.Length)
As you can see, we are no longer using quotation marks. This is similar to iris$Sepal.Length in base R. A more extensive explanation of design choices in dplyr can be found here. The core reasons for designing tidyverse this way are that it is more intuitive to use, and that it is often faster to write.
Let's consider your case:
VAR <- c("City")
iris %>% select(VAR)
What this does is look for a column named "VAR" inside iris. Since it cannot find one, it then evaluates the variable VAR, which yields "City". It then looks for a column named "City", which does not exist in iris, so the call fails with an error. It does work if we specify a column that is present (like in your example):
VAR <- c("Sepal.Length")
iris %>% select(VAR)
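The failing lookup for a missing column can be observed without stopping a script by catching the error. A small sketch, assuming dplyr is installed; the helper name `try_select` is made up for illustration, all_of() is used to avoid the ambiguity note, and the exact wording of the error message depends on the installed tidyselect version.

```r
library(dplyr)

# Hypothetical helper: returns the selected data frame, or the error
# condition object if the selection fails.
try_select <- function(df, cols) {
  tryCatch(
    select(df, all_of(cols)),
    error = function(e) e
  )
}

missing_col <- try_select(iris, "City")          # iris has no City column
present_col <- try_select(iris, "Sepal.Length")  # this column exists

inherits(missing_col, "error")  # TRUE: the lookup for "City" failed
names(present_col)
```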
Where it goes wrong
So, what happens if we have a column called VAR, and supply a variable called VAR?
iris$VAR <- 1
VAR <- c("Sepal.Length")
iris %>% select(VAR) %>% str
#'data.frame': 150 obs. of 1 variable:
# $ VAR: num 1 1 1 1 1 1 1 1 1 1 ...
Here, it will return the column with ones, NOT what we intended. Using all_of, we can explicitly indicate that select should consider an EXTERNAL variable. This is to ensure that you get the results that you expect:
iris$VAR <- 1
VAR <- c("Sepal.Length")
iris %>% select(all_of(VAR)) %>% str
#'data.frame': 150 obs. of 1 variable:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
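The same approach extends to a character vector holding several column names. The sketch below also contrasts all_of() with its lenient sibling any_of(), which tidyselect provides for the case where some names may be absent. The vector `cols` is an assumption made for illustration.

```r
library(dplyr)

# Hypothetical vector of names; "City" is deliberately not a column of iris.
cols <- c("Sepal.Length", "Species", "City")

# any_of() silently skips names that are not present in the data.
lenient <- iris %>% select(any_of(cols))
names(lenient)

# all_of() is strict: a single missing name makes the whole call fail.
strict <- tryCatch(
  iris %>% select(all_of(cols)),
  error = function(e) "error"
)
strict
```

all_of() is usually the safer default, because a misspelled or missing column name surfaces immediately instead of being dropped without notice.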
- Thanks, however, I got confused as to which package I am using. I use the select function, from dplyr, not tidyverse. So it should work in my case. Also, in my case VAR is not the name of any column, its value is. – Elina Bochkova, Jun 6, 2021 at 7:13
- dplyr is part of tidyverse. If you type ?select, you'll find that dplyr actually imports select from tidyselect :) – slamballais, Jun 6, 2021 at 7:14
- And I realise that VAR is not one of your columns. The point I was making is that it is dangerous not to use all_of( ) because one of your columns may have the same name. Using all_of( ) ensures that your code behaves as you expect it to. It is safer to include it, although in your specific example it is not needed. – slamballais, Jun 6, 2021 at 7:16
- Thanks, all clear now :) – Elina Bochkova, Jun 6, 2021 at 7:27
Here, VAR created another instance that is not part of data. Remember that in R, everything is a vector.
- Thanks. How can I create a variable that replaces tens of other ones, so I don't have to replace them manually? – Elina Bochkova, Jun 5, 2021 at 15:33
- This answer is not correct; running select(VAR) doesn't create anything. It just happens to evaluate VAR (since the column "VAR" is not present) to then run select(City), which is a column in the original dataset. If it had not been a column, it would give an error. – slamballais, Jun 6, 2021 at 6:22
- Thanks! Now I too have got it cleared. – Balakumar Natarajan, Jun 6, 2021 at 14:57
- Not everything in R is a vector. For example, functions aren't. – J. Mini, Jun 10, 2021 at 12:08