Convert dataset from wide to a longer format for two different sets of columns

Question 1

I have a dataset that is a bit too wide for the EDA plots I would like to make.

The data can be found here: https://www.massey.ac.nz/~kgovinda/data/Bill_Colour.RData

It looks something like this:

CommonName	r_ChromaticityFemaleLowerMandible	r_ChromaticityFemaleUpperMandible	r_ChromaticityMaleLowerMandible	r_ChromaticityMaleUpperMandible	many more columns ...
African_Broadbill	0.3331	0.3109	0.3584	0.3573	etc...

I am pivoting it longer so that three columns are tacked on to the end, replacing the four r_Chromaticity* columns:

LowerMandible	UpperMandible	Sex
0.3331	0.3109	Female
0.3584	0.3573	Male
0.3473	0.340	Male

My current method requires two pivot_longer calls, one for the upper and one for the lower mandible, whose results are then bound together.

library(tidyverse)
load(url("https://www.massey.ac.nz/~kgovinda/data/Bill_Colour.RData"))
lowerMandible <- Bill_Colour %>%
  pivot_longer(c(r_ChromaticityFemaleLowerMandible, r_ChromaticityMaleLowerMandible),
               values_to = "LowerMandible") %>%
  select(-name)
upperMandible <- Bill_Colour %>%
  select(r_ChromaticityFemaleUpperMandible,
         r_ChromaticityMaleUpperMandible) %>%
  pivot_longer(c(r_ChromaticityFemaleUpperMandible, r_ChromaticityMaleUpperMandible),
               values_to = "UpperMandible") %>%
  mutate(Sex = as.factor(str_sub(name, 15, -14))) %>%
  select(-name)
upperAndLowerBySex <- cbind.data.frame(lowerMandible, upperMandible) %>%
  select(-r_ChromaticityFemaleUpperMandible, -r_ChromaticityMaleUpperMandible)

I feel like there should be a way to do this with a single pivot_longer call rather than building multiple intermediate datasets.

Answer 1

We can do this by renaming the columns to appropriate strings which can be processed by pivot_longer and its names_sep argument.

We need to rename the desired columns, e.g. from r_ChromaticityMaleLowerMandible to LowerMandible_Male. See https://regex101.com/r/6QstQq/1 for a breakdown of the regex pattern.

Then we can simply use pivot_longer and apply it to the columns that contain Mandible in their names. By providing multiple values to names_to and a separator to names_sep, we can split each pivoted column name into the new value column (via ".value") and the sex.

library(tidyverse)
load(url("https://www.massey.ac.nz/~kgovinda/data/Bill_Colour.RData"))

Bill_Colour %>% 
  select(contains("r_Chromaticity")) %>% ## you can select other columns as needed
  ## swap the captured groups: "<prefix><Sex><Part>" becomes "<Part>_<Sex>"
  rename_with(~str_replace(.x, "(r_Chromaticity)(.*ale)(.*)", "\\3_\\2"), 
              contains("r_Chromaticity")) %>% 
  pivot_longer(., 
               cols = contains("Mandible"), 
               names_to = c(".value", "sex"),
               names_sep = "_")

#> # A tibble: 3,210 x 3
#> sex LowerMandible UpperMandible
#> <chr> <dbl> <dbl>
#> 1 Female 0.333 0.311
#> 2 Male 0.358 0.357
#> 3 Female 0.347 0.341
#> 4 Male 0.359 0.394
#> 5 Female 0.352 0.430
#> 6 Male 0.370 0.443
#> 7 Female 0.370 0.375
#> 8 Male 0.378 0.351
#> 9 Female 0.336 0.329
#> 10 Male 0.343 0.335
#> # ... with 3,200 more rows

Created on 2022-12-17 by the reprex package (v2.0.1)
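To see the rename step in isolation, here is a minimal sketch on a made-up two-row tibble. The `toy` data frame is an assumption for illustration (its values are borrowed from the printed output above, not loaded from Bill_Colour.RData). It shows the column names the regex produces and then the same pivot.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical stand-in for Bill_Colour, holding only the four chromaticity columns
toy <- tibble(
  r_ChromaticityFemaleLowerMandible = c(0.333, 0.347),
  r_ChromaticityFemaleUpperMandible = c(0.311, 0.341),
  r_ChromaticityMaleLowerMandible   = c(0.358, 0.359),
  r_ChromaticityMaleUpperMandible   = c(0.357, 0.394)
)

# Group 2 captures "Female"/"Male", group 3 captures the rest of the name;
# the replacement reorders them as "<Part>_<Sex>"
renamed <- toy %>%
  rename_with(~ str_replace(.x, "(r_Chromaticity)(.*ale)(.*)", "\\3_\\2"))

names(renamed)

# ".value" turns the part before "_" into column names; the part after goes into `sex`
long <- renamed %>%
  pivot_longer(everything(), names_to = c(".value", "sex"), names_sep = "_")

long
```

Inspecting `names(renamed)` before pivoting is a cheap way to confirm the pattern matched every column as intended.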

Answer 2

The trick to doing this in a single pivot_longer call is to get your column names into a desirable form, where you have some separator between the new column name (LowerMandible or UpperMandible) and the value you're using to expand the rows (Sex). Then you can use the names_sep argument to indicate your separator and the names_to argument to describe where they're being mapped:

upperAndLowerBySex <- Bill_Colour %>%
  rename("LowerMandible_Female" = "r_ChromaticityFemaleLowerMandible",
         "LowerMandible_Male" = "r_ChromaticityMaleLowerMandible",
         "UpperMandible_Female" = "r_ChromaticityFemaleUpperMandible",
         "UpperMandible_Male" = "r_ChromaticityMaleUpperMandible") %>%
  pivot_longer(c(LowerMandible_Female, LowerMandible_Male,
                 UpperMandible_Female, UpperMandible_Male),
               names_to = c(".value", "Sex"),
               names_sep = "_")
head(upperAndLowerBySex[,c(1:2, 10:12)])
# Row SpeciesName Sex LowerMandible UpperMandible
# <dbl> <chr> <chr> <dbl> <dbl>
# 1 1 Smithornis_capensis Female 0.333 0.311
# 2 1 Smithornis_capensis Male 0.358 0.357
# 3 2 Smithornis_sharpei Female 0.347 0.341
# 4 2 Smithornis_sharpei Male 0.359 0.394
# 5 3 Smithornis_rufolateralis Female 0.352 0.430
# 6 3 Smithornis_rufolateralis Male 0.370 0.443

You can read more about this approach here.
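As a variation not taken from either answer, pivot_longer also accepts a names_pattern argument, which may let you skip the rename step entirely: each regex capture group is mapped to one entry of names_to. The sketch below assumes a small made-up `toy` tibble (values borrowed from the output above) in place of the real Bill_Colour data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for Bill_Colour with one identifier column
toy <- tibble(
  SpeciesName = c("Smithornis_capensis", "Smithornis_sharpei"),
  r_ChromaticityFemaleLowerMandible = c(0.333, 0.347),
  r_ChromaticityFemaleUpperMandible = c(0.311, 0.341),
  r_ChromaticityMaleLowerMandible   = c(0.358, 0.359),
  r_ChromaticityMaleUpperMandible   = c(0.357, 0.394)
)

long <- toy %>%
  pivot_longer(
    cols = starts_with("r_Chromaticity"),
    # first capture group goes to Sex, second becomes the value column name
    names_to = c("Sex", ".value"),
    names_pattern = "r_Chromaticity(Female|Male)(.*)"
  )

long
```

The order of names_to has to follow the order of the capture groups, which is why "Sex" comes before ".value" here, the reverse of the names_sep examples.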

Comment

Thanks again for your help. The fundamentally helpful bit, using names_sep, came from you. But I have gone with the other answer for the check mark, as that neat work with the regex is also helpful.

Accepted answer (the rename_with regex approach) by M--, 2022-12-17 19:22:00Z

