I have a dataset that is a bit too wide for the EDA plots I would like to make.
The data can be found here: https://www.massey.ac.nz/~kgovinda/data/Bill_Colour.RData
It looks something like this:
CommonName | r_ChromaticityFemaleLowerMandible | r_ChromaticityFemaleUpperMandible | r_ChromaticityMaleLowerMandible | r_ChromaticityMaleUpperMandible | many more columns ... |
---|---|---|---|---|---|
African_Broadbill | 0.3331 | 0.3109 | 0.3584 | 0.3573 | etc... |
I am pivoting it longer so that three columns are tacked on to the end while removing the four r_* Chromaticity columns:
LowerMandible | UpperMandible | Sex |
---|---|---|
0.3331 | 0.3109 | Female |
0.3584 | 0.3573 | Male |
0.3473 | 0.340 | Male |
My current method requires two pivot_longer calls, one for the upper and one for the lower mandible, whose results are then bound together:
library(tidyverse)
load(url("https://www.massey.ac.nz/~kgovinda/data/Bill_Colour.RData"))

lowerMandible <- Bill_Colour %>%
  pivot_longer(c(r_ChromaticityFemaleLowerMandible, r_ChromaticityMaleLowerMandible),
               values_to = "LowerMandible") %>%
  select(-name)

upperMandible <- Bill_Colour %>%
  select(r_ChromaticityFemaleUpperMandible,
         r_ChromaticityMaleUpperMandible) %>%
  pivot_longer(c(r_ChromaticityFemaleUpperMandible, r_ChromaticityMaleUpperMandible),
               values_to = "UpperMandible") %>%
  mutate(Sex = as.factor(str_sub(name, 15, -14))) %>%
  select(-name)

upperAndLowerBySex <- cbind.data.frame(lowerMandible, upperMandible) %>%
  select(-r_ChromaticityFemaleUpperMandible, -r_ChromaticityMaleUpperMandible)
I feel like there should be a way to do this with a single pivot_longer call rather than making multiple datasets.
2 Answers
We can do this by renaming the columns to strings that pivot_longer can process with its names_sep argument.
We need to rename the desired columns, e.g. from r_ChromaticityMaleLowerMandible to LowerMandible_Male. See https://regex101.com/r/6QstQq/1 to better understand the regex pattern.
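To see what the renaming step does in isolation, here is a minimal sketch that applies the same pattern to a plain character vector. The vector `old_names` is a hypothetical stand-in for a few of the dataset's column names, not the real `names(Bill_Colour)`.

```r
library(stringr)

# Hypothetical stand-in for a few column names from the dataset
old_names <- c("r_ChromaticityFemaleLowerMandible",
               "r_ChromaticityMaleUpperMandible",
               "CommonName")

# Group 2 captures the sex ("Female"/"Male"), group 3 the mandible part;
# the replacement swaps them and joins them with "_".
# Names that do not match the pattern are returned unchanged.
new_names <- str_replace(old_names,
                         "(r_Chromaticity)(.*ale)(.*)",
                         "\\3_\\2")
new_names
#> [1] "LowerMandible_Female" "UpperMandible_Male"   "CommonName"
```

Note that the backreferences are written with doubled backslashes inside an R string.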
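Then we can simply use pivot_longer on the columns that contain Mandible in their names. By providing multiple values to names_to and a separator to names_sep, we can split the pivoted column names: the special ".value" entry turns the first part of each name into an output column, and the second part fills the sex column.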
library(tidyverse)
load(url("https://www.massey.ac.nz/~kgovinda/data/Bill_Colour.RData"))
Bill_Colour %>%
  select(contains("r_Chromaticity")) %>% ## you can select other columns as needed
  rename_with(~str_replace(.x, "(r_Chromaticity)(.*ale)(.*)", "\\3_\\2"),
              contains("r_Chromaticity")) %>%
  pivot_longer(.,
               cols = contains("Mandible"),
               names_to = c(".value", "sex"),
               names_sep = "_")
#> # A tibble: 3,210 x 3
#>    sex    LowerMandible UpperMandible
#>    <chr>          <dbl>         <dbl>
#>  1 Female         0.333         0.311
#>  2 Male           0.358         0.357
#>  3 Female         0.347         0.341
#>  4 Male           0.359         0.394
#>  5 Female         0.352         0.430
#>  6 Male           0.370         0.443
#>  7 Female         0.370         0.375
#>  8 Male           0.378         0.351
#>  9 Female         0.336         0.329
#> 10 Male           0.343         0.335
#> # ... with 3,200 more rows
Created on 2022-12-17 by the reprex package (v2.0.1)
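As a variation on the above (not the method used in this answer), pivot_longer can also parse the original column names directly through its names_pattern argument, which may make the renaming step unnecessary. The sketch below assumes a small toy data frame, `toy`, built from values shown elsewhere on this page rather than the full Bill_Colour dataset.

```r
library(tidyr)

# Toy stand-in for Bill_Colour (assumption: only two species, five columns)
toy <- data.frame(
  SpeciesName = c("Smithornis_capensis", "Smithornis_sharpei"),
  r_ChromaticityFemaleLowerMandible = c(0.333, 0.347),
  r_ChromaticityFemaleUpperMandible = c(0.311, 0.341),
  r_ChromaticityMaleLowerMandible   = c(0.358, 0.359),
  r_ChromaticityMaleUpperMandible   = c(0.357, 0.394)
)

# The first regex group feeds the "Sex" column; the second group (".value")
# supplies the names of the output value columns.
long <- pivot_longer(
  toy,
  cols = starts_with("r_Chromaticity"),
  names_to = c("Sex", ".value"),
  names_pattern = "r_Chromaticity(Female|Male)(.*)"
)
long
```

This should give one row per species and sex, with LowerMandible and UpperMandible as separate columns.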
The trick to doing this in a single pivot_longer call is to get your column names into a desirable form, where you have some separator between the new column name (LowerMandible or UpperMandible) and the value you're using to expand the rows (Sex). Then you can use the names_sep argument to indicate your separator and the names_to argument to describe where they're being mapped:
upperAndLowerBySex <- Bill_Colour %>%
  rename("LowerMandible_Female" = "r_ChromaticityFemaleLowerMandible",
         "LowerMandible_Male" = "r_ChromaticityMaleLowerMandible",
         "UpperMandible_Female" = "r_ChromaticityFemaleUpperMandible",
         "UpperMandible_Male" = "r_ChromaticityMaleUpperMandible") %>%
  pivot_longer(c(LowerMandible_Female, LowerMandible_Male,
                 UpperMandible_Female, UpperMandible_Male),
               names_to = c(".value", "Sex"),
               names_sep = "_")

head(upperAndLowerBySex[, c(1:2, 10:12)])
#     Row SpeciesName              Sex    LowerMandible UpperMandible
#   <dbl> <chr>                    <chr>          <dbl>         <dbl>
# 1     1 Smithornis_capensis      Female         0.333         0.311
# 2     1 Smithornis_capensis      Male           0.358         0.357
# 3     2 Smithornis_sharpei       Female         0.347         0.341
# 4     2 Smithornis_sharpei       Male           0.359         0.394
# 5     3 Smithornis_rufolateralis Female         0.352         0.430
# 6     3 Smithornis_rufolateralis Male           0.370         0.443
You can read more about this approach here.
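One difference from the code in the question is that the pivot produces Sex as a character column, whereas the original two-step version stored it as a factor. If the factor matters for plotting, it can be converted afterwards. The sketch below assumes a toy data frame, `renamed`, with already-renamed columns and values taken from the output above; it is not the full dataset.

```r
library(tidyr)

# Toy stand-in with the columns already renamed (assumption: two species only)
renamed <- data.frame(
  SpeciesName = c("Smithornis_capensis", "Smithornis_sharpei"),
  LowerMandible_Female = c(0.333, 0.347),
  LowerMandible_Male   = c(0.358, 0.359),
  UpperMandible_Female = c(0.311, 0.341),
  UpperMandible_Male   = c(0.357, 0.394)
)

# Text before "_" becomes the value column name (".value"),
# text after "_" becomes the Sex column.
long <- pivot_longer(
  renamed,
  cols = contains("Mandible"),
  names_to = c(".value", "Sex"),
  names_sep = "_"
)

# Convert Sex to a factor, matching the type produced in the question's code
long$Sex <- factor(long$Sex)
levels(long$Sex)
#> [1] "Female" "Male"
```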
- Thanks again for your help. The fundamentally helpful bit of using names_sep came from you, but I have gone with the other answer for the check mark, as that neat work with the regex is also helpful. – James, Dec 19, 2022 at 2:11