Using `{densityarea}`

To get started with using {densityarea}, we’ll need to load some packages, and some data to work with. {densityarea} is meant to play nicely with tidyverse-style data processing, in addition to loading the package itself, we’ll also load {dplyr}. We have the option of working with the density polygons in the form of simple features from {sf}, so we’ll load that as well. Finally, we’ll load {ggplot2} and {ggdensity} for the sake of data visualization.

 # package depends
 library(densityarea)
 library(dplyr)
 library(sf)
 library(ggdensity)

 library(ggplot2)

The dataset s01 is a data frame of vowel formant measurements.

 data(s01, package = "densityarea")
 head(s01)
 #> # A tibble: 6 ×ばつ 10
 #> name age sex word vowel plt_vclass ipa_vclass F1 F2 dur
 #> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
 #> 1 s01 y f OKAY EY eyF ejF 764. 2088. 0.2 
 #> 2 s01 y f UM AH uh ʌ 700. 1881. 0.19
 #> 3 s01 y f I'M AY ay aj 889. 1934. 0.07
 #> 4 s01 y f LIVED IH i ɪ 556. 1530. 0.05
 #> 5 s01 y f IN IH i ɪ 612. 2323. 0.06
 #> 6 s01 y f COLUMBUS AH @ ə 612. 1904. 0.07

Initial look at the data

Let’s plot the original, raw data from s01, with the Highest Density Regions overlaid (thanks to the {ggdensity} package).

 ggplot(data = s01,
 aes(x = F2,
 y = F1)
 )+
 geom_point(alpha = 0.1)+
 stat_hdr(probs = c(0.8, 0.5),
 aes(fill = after_stat(probs)),
 color = "black",
 alpha = 0.8)+
 scale_y_reverse()+
 scale_x_reverse()+
 scale_fill_brewer(type = "seq")+
 coord_fixed()

The function ggdensity::get_hdr() is perfect for quickly adding interpretable densities to your plots. To work with these densities as polygons, we can use densityarea::density_polygons().

Getting density areas

Per the name of the package, we can get the area within each of these density polygons with density_area().

As a first data processing step, let’s log transform and flip our F1 and F2 values.

s01 |> 
 mutate(
 lF1 = -log(F1),
 lF2 = -log(F2)
 ) -> 
 s01

To get the area within the 80% density polygon for the entire data set, we’ll pass s01 through a dplyr::reframe() function.

s01 |> 
 group_by(name) |> 
 reframe(
 density_area(lF2, lF1, probs = 0.8)
 ) 
 #> # A tibble: 1 ×ばつ 4
 #> name level_id prob area
 #> <chr> <int> <dbl> <dbl>
 #> 1 s01 1 0.8 0.406

Or, if we wanted the areas associated with subsets of the data (say, for each vowel) we’d just change our dplyr::group_by() call.

s01 |> 
 group_by(name, vowel) |> 
 reframe(
 density_area(lF2, lF1, probs = 0.8)
 ) ->
 vowel_areas

Let’s rearrange the order of rows to see the largest areas first.

vowel_areas |> 
 arrange(desc(area))
 #> # A tibble: 15 ×ばつ 5
 #> name vowel level_id prob area
 #> <chr> <chr> <int> <dbl> <dbl>
 #> 1 s01 AO 1 0.8 0.488 
 #> 2 s01 AA 1 0.8 0.278 
 #> 3 s01 AH 1 0.8 0.274 
 #> 4 s01 AE 1 0.8 0.258 
 #> 5 s01 UW 1 0.8 0.229 
 #> 6 s01 EH 1 0.8 0.226 
 #> 7 s01 IY 1 0.8 0.206 
 #> 8 s01 UH 1 0.8 0.203 
 #> 9 s01 OW 1 0.8 0.197 
 #> 10 s01 IH 1 0.8 0.186 
 #> 11 s01 AY 1 0.8 0.171 
 #> 12 s01 AW 1 0.8 0.170 
 #> 13 s01 ER 1 0.8 0.145 
 #> 14 s01 EY 1 0.8 0.101 
 #> 15 s01 OY 1 0.8 0.0904

Density Polygons

Polygon Data Frames

A single probability level

In the simplest approach, we can use density_polygons() to return a data frame for just one probability level, 60%.

s01 |> 
 group_by(name) |> 
 reframe(
 density_polygons(lF2, lF1, probs = 0.6)
 )->
 sixty_poly_df
 
 head(sixty_poly_df)
 #> # A tibble: 6 ×ばつ 7
 #> name level_id id prob lF2 lF1 order
 #> <chr> <int> <int> <dbl> <dbl> <dbl> <int>
 #> 1 s01 1 1 0.6 -7.44 -6.78 1
 #> 2 s01 1 1 0.6 -7.42 -6.76 2
 #> 3 s01 1 1 0.6 -7.41 -6.75 3
 #> 4 s01 1 1 0.6 -7.40 -6.72 4
 #> 5 s01 1 1 0.6 -7.40 -6.69 5
 #> 6 s01 1 1 0.6 -7.40 -6.69 6

Now, it’s possible for the HDR polygon to actually come in multiple pieces, but in this case, there’s just one polygon, so we can plot it.

 ggplot(sixty_poly_df,
 aes(lF2, lF1))+
 geom_polygon(
 aes(color = prob,
 group = prob),
 fill = NA,
 linewidth = 1
 )+
 coord_fixed()

Multiple probability levels

To get polygons associated with multiple probability levels, we simply pass a vector of values to probs.

s01 |> 
 group_by(name) |> 
 reframe(
 density_polygons(lF2, 
 lF1, 
 probs = c(0.6, 0.8))
 )->
 multi_poly_df
 
 head(multi_poly_df)
 #> # A tibble: 6 ×ばつ 7
 #> name level_id id prob lF2 lF1 order
 #> <chr> <int> <int> <dbl> <dbl> <dbl> <int>
 #> 1 s01 2 1 0.8 -7.78 -6.01 1
 #> 2 s01 2 1 0.8 -7.80 -6.01 2
 #> 3 s01 2 1 0.8 -7.82 -6.02 3
 #> 4 s01 2 1 0.8 -7.85 -6.02 4
 #> 5 s01 2 1 0.8 -7.87 -6.02 5
 #> 6 s01 2 1 0.8 -7.90 -6.02 6

 ggplot(multi_poly_df,
 aes(lF2, lF1))+
 geom_polygon(
 aes(color = prob,
 group = prob),
 fill = NA,
 linewidth = 1
 )+
 coord_fixed()

Polygon Simple Features

We can also get density_polygons() to return the polygons as simple features, as defined in the {sf} package, by passing it the argument as_sf = TRUE.

s01 |> 
 group_by(name) |> 
 reframe(
 density_polygons(lF2,
 lF1,
 probs = c(0.8, 0.6),
 as_sf = TRUE)
 ) |> 
 st_sf()->
 multi_poly_sf

The final function there, sf::st_sf(), wasn’t strictly necessary, but makes life a little easier for plotting. Here’s what the result looks like:

multi_poly_sf
 #> Simple feature collection with 2 features and 3 fields
 #> Geometry type: POLYGON
 #> Dimension: XY
 #> Bounding box: xmin: -7.93703 ymin: -6.913825 xmax: -7.208434 ymax: -6.009484
 #> CRS: NA
 #> # A tibble: 2 ×ばつ 4
 #> name level_id prob geometry
 #> <chr> <int> <dbl> <POLYGON>
 #> 1 s01 1 0.6 ((-7.636315 -6.19764, -7.645598 -6.201069, -7.65986 -6.2...
 #> 2 s01 2 0.8 ((-7.777586 -6.009484, -7.801131 -6.010429, -7.824676 -6...

And here’s a plot.

 ggplot(multi_poly_sf)+
 geom_sf(aes(color = prob),
 fill = NA)

Using {densityarea}