4
\$\begingroup\$

I have a dataset that is a bit too wide for the EDA plots I would like to make.

The data can be found here: https://www.massey.ac.nz/~kgovinda/data/Bill_Colour.RData

It looks something like this:

CommonName r_ChromaticityFemaleLowerMandible r_ChromaticityFemaleUpperMandible r_ChromaticityMaleLowerMandible r_ChromaticityMaleUpperMandible (many more columns ...)
African_Broadbill 0.3331 0.3109 0.3584 0.3573 etc...

I am pivoting it longer so that three columns are tacked on to the end, while removing the four r_Chromaticity* columns:

LowerMandible UpperMandible Sex
0.3331 0.3109 Female
0.3584 0.3573 Male
0.3473 0.340 Male

My current method requires two pivot_longer calls, one for the upper and one for the lower mandible, whose results are then bound together.

library(tidyverse)
load(url("https://www.massey.ac.nz/~kgovinda/data/Bill_Colour.RData"))
lowerMandible <- Bill_Colour %>%
 pivot_longer(c(r_ChromaticityFemaleLowerMandible, r_ChromaticityMaleLowerMandible),
 values_to = "LowerMandible") %>%
 select(-name)
upperMandible <- Bill_Colour %>%
 select(r_ChromaticityFemaleUpperMandible,
 r_ChromaticityMaleUpperMandible) %>%
 pivot_longer(c(r_ChromaticityFemaleUpperMandible, r_ChromaticityMaleUpperMandible),
 values_to = "UpperMandible") %>%
 mutate(Sex = as.factor(str_sub(name,15,-14))) %>%
 select(-name)
upperAndLowerBySex <- cbind.data.frame(lowerMandible, upperMandible) %>%
 select(-r_ChromaticityFemaleUpperMandible, -r_ChromaticityMaleUpperMandible)

I feel like there should be a way to do this with one set of pivot_longer rather than making multiple datasets.

M--
asked Dec 15, 2022 at 23:03
\$\endgroup\$

2 Answers

2
\$\begingroup\$

We can do this by renaming the columns to appropriate strings which can be processed by pivot_longer and its names_sep argument.

We need to rename the desired columns, e.g. from r_ChromaticityMaleLowerMandible to LowerMandible_Male. See https://regex101.com/r/6QstQq/1 for a breakdown of the regex pattern.
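To make the rename step concrete, here is a minimal sketch that applies the same pattern to the four column names from the question, outside of any data frame. The vector names old_names and new_names are only illustrative.

```r
library(stringr)

# The four column names from the question
old_names <- c("r_ChromaticityFemaleLowerMandible",
               "r_ChromaticityFemaleUpperMandible",
               "r_ChromaticityMaleLowerMandible",
               "r_ChromaticityMaleUpperMandible")

# Group 1 = the prefix, group 2 = the sex (Female/Male), group 3 = the rest.
# The replacement drops group 1 and swaps groups 3 and 2 around an underscore.
new_names <- str_replace(old_names, "(r_Chromaticity)(.*ale)(.*)", "\\3_\\2")

new_names
# "LowerMandible_Female", "UpperMandible_Female",
# "LowerMandible_Male",   "UpperMandible_Male"
```

The underscore introduced here is what names_sep later splits on.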

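Then we can simply use pivot_longer on the columns whose names contain Mandible. By passing multiple values to names_to and a separator to names_sep, each column name is split in two: the part before the separator becomes the name of an output column (via the special ".value" entry) and the part after it goes into the sex column.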

library(tidyverse)
load(url("https://www.massey.ac.nz/~kgovinda/data/Bill_Colour.RData"))
Bill_Colour %>% 
 select(contains("r_Chromaticity")) %>% ## you can select other columns as needed
 rename_with(~str_replace(.x, "(r_Chromaticity)(.*ale)(.*)", "\\3_\\2"), 
 contains("r_Chromaticity")) %>% 
 pivot_longer(., 
 cols = contains("Mandible"), 
 names_to = c(".value", "sex"),
 names_sep = "_")
#> # A tibble: 3,210 x 3
#> sex LowerMandible UpperMandible
#> <chr> <dbl> <dbl>
#> 1 Female 0.333 0.311
#> 2 Male 0.358 0.357
#> 3 Female 0.347 0.341
#> 4 Male 0.359 0.394
#> 5 Female 0.352 0.430
#> 6 Male 0.370 0.443
#> 7 Female 0.370 0.375
#> 8 Male 0.378 0.351
#> 9 Female 0.336 0.329
#> 10 Male 0.343 0.335
#> # ... with 3,200 more rows

Created on 2022-12-17 by the reprex package (v2.0.1)
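As a possible variation, the rename step can be skipped by handing the regex straight to pivot_longer through its names_pattern argument, where each capture group maps to one entry of names_to. The sketch below runs on a small stand-in for Bill_Colour rather than the real data: the toy tibble is an assumption for illustration, with values rounded from the output above.

```r
library(tidyverse)

# Toy stand-in for Bill_Colour (two species only; illustrative values)
toy <- tibble(
  SpeciesName = c("Smithornis_capensis", "Smithornis_sharpei"),
  r_ChromaticityFemaleLowerMandible = c(0.333, 0.347),
  r_ChromaticityFemaleUpperMandible = c(0.311, 0.341),
  r_ChromaticityMaleLowerMandible   = c(0.358, 0.359),
  r_ChromaticityMaleUpperMandible   = c(0.357, 0.394)
)

# First capture group -> Sex, second capture group -> output column name
long <- toy %>%
  pivot_longer(
    cols = starts_with("r_Chromaticity"),
    names_to = c("Sex", ".value"),
    names_pattern = "r_Chromaticity(Female|Male)(.*)"
  )

long
```

Each input row yields one Female and one Male row, with LowerMandible and UpperMandible as value columns. Whether this reads better than an explicit rename is a matter of taste; the rename keeps the regex and the reshaping as separate, inspectable steps.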

answered Dec 17, 2022 at 19:22
\$\endgroup\$
2
\$\begingroup\$

The trick to doing this in a single pivot_longer call is to get your column names into a desirable form, where you have some separator between the new column name (LowerMandible or UpperMandible) and the value you're using to expand the rows (Sex). Then you can use the names_sep argument to indicate your separator and the names_to argument to describe where they're being mapped:

upperAndLowerBySex <- Bill_Colour %>%
 rename("LowerMandible_Female" = "r_ChromaticityFemaleLowerMandible",
 "LowerMandible_Male" = "r_ChromaticityMaleLowerMandible",
 "UpperMandible_Female" = "r_ChromaticityFemaleUpperMandible",
 "UpperMandible_Male" = "r_ChromaticityMaleUpperMandible") %>%
 pivot_longer(c(LowerMandible_Female, LowerMandible_Male,
 UpperMandible_Female, UpperMandible_Male),
 names_to = c(".value", "Sex"),
 names_sep = "_")
head(upperAndLowerBySex[,c(1:2, 10:12)])
# Row SpeciesName Sex LowerMandible UpperMandible
# <dbl> <chr> <chr> <dbl> <dbl>
# 1 1 Smithornis_capensis Female 0.333 0.311
# 2 1 Smithornis_capensis Male 0.358 0.357
# 3 2 Smithornis_sharpei Female 0.347 0.341
# 4 2 Smithornis_sharpei Male 0.359 0.394
# 5 3 Smithornis_rufolateralis Female 0.352 0.430
# 6 3 Smithornis_rufolateralis Male 0.370 0.443

You can read more about this approach here.
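One difference from the code in the question is that Sex comes back as a character column rather than a factor. If the factor matters, a possible tweak is the names_transform argument, assuming a tidyr version recent enough to provide it. The sketch below uses a small, already renamed stand-in for Bill_Colour; the toy tibble and its values are illustrative only.

```r
library(tidyverse)

# Toy stand-in for the renamed data (illustrative values)
toy <- tibble(
  SpeciesName = c("Smithornis_capensis", "Smithornis_sharpei"),
  LowerMandible_Female = c(0.333, 0.347),
  LowerMandible_Male   = c(0.358, 0.359),
  UpperMandible_Female = c(0.311, 0.341),
  UpperMandible_Male   = c(0.357, 0.394)
)

long <- toy %>%
  pivot_longer(
    cols = contains("Mandible"),
    names_to = c(".value", "Sex"),
    names_sep = "_",
    # Convert the Sex column produced from the names into a factor
    names_transform = list(Sex = as.factor)
  )

str(long$Sex)
```

The same result could be reached with a mutate(Sex = as.factor(Sex)) after the pivot; names_transform just keeps everything in the one call.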

answered Dec 16, 2022 at 5:03
\$\endgroup\$
1
  • \$\begingroup\$ Thanks again for your help. The fundamental helpful bit, using names_sep, came from you, but I have gone with the other answer for the check mark, as the neat regex work there is also helpful. \$\endgroup\$ Commented Dec 19, 2022 at 2:11
