This brief vignette uses the vaccinations dataset
included in {ggalluvial}. As in the
technical introduction, the order of the levels is reversed to be
more intuitive. Objects from other {ggplot2} extensions are accessed via
:: and :::.
knitr::opts_chunk$set(fig.width = 6, fig.height = 4, fig.align = "center")
library(ggalluvial)
data(vaccinations)
vaccinations <- transform(vaccinations,
response = factor(response, rev(levels(response))))The issue on the table: Strata are most helpful when they’re overlaid
with text labels. Yet the strata often vary in height, and the labels in
length, to such a degree that fitting the text inside the strata at a
uniform size renders them illegible. In principle, the user could treat
size as a variable aesthetic and manually fit text to
strata, but this is cumbersome, and doesn’t help anyway in cases where
large text is needed.
To illustrate the problem, check out the plot below. It’s by no means an egregious case, but it’ll do. (For a more practical example, see this question on StackOverflow, which prompted this vignette.)
ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject, y = freq,
fill = response, label = response)) +
scale_x_discrete(expand = c(.1, 0)) +
geom_flow(width = 1/4) +
geom_stratum(alpha = .5, width = 1/4) +
geom_text(stat = "stratum", size = 4) +
theme(legend.position = "none") +
ggtitle("vaccination survey responses", "labeled using `geom_text()`")One option is to simply omit those labels that don’t fit within their
strata. In response to an issue,
v0.9.2 includes parameters in stat_stratum()
to exclude strata outside a specified height range; while few would use
this to omit the rectangles themselves, it can be used in tandem with
geom_text() to shirk this problem, at least when the labels
are concise:
ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject, y = freq,
fill = response, label = response)) +
scale_x_discrete(expand = c(.1, 0)) +
geom_flow(width = 1/4) +
geom_stratum(alpha = .5, width = 1/4) +
geom_text(stat = "stratum", size = 4, min.y = 100) +
theme(legend.position = "none") +
ggtitle(
"vaccination survey responses",
"labeled using `geom_text()` with `min.y = 100`"
)This is a useful fix for some cases. Still, if the goal is a publication-ready graphic, then it reaffirms the need for more adaptable and elegant solutions. Fortunately, two wonderful packages deliver with, shall we say, flowing colors.
Two {ggplot2} extensions are well-suited to this problem: {ggrepel} and {ggfittext}. They provide
new geom layers that use the output of existing stat layers to situate
text: ggrepel::geom_text_repel() takes the same aesthetics
as ggplot2::geom_text(), namely x,
y, and label. In contrast,
ggfittext::geom_fit_text() only specifically requires
label but also needs enough information to determine the
rectangle that will contain the text. This can be encoded as
xmin and xmax or as x and
width for the horizontal direction, and as
ymin and ymax or as y and
height for the vertical direction. Conveniently,
ggalluvial::stat_stratum() produces more than enough
information for both geoms, including x, xmin,
xmax, and their vertical counterparts.
All this can be gleaned from the ggproto objects that
construct the layers:
print(ggrepel::GeomTextRepel$required_aes)## [1] "x" "y" "label"
print(ggfittext:::GeomFitText$required_aes)## [1] "label"
print(ggfittext:::GeomFitText$setup_data)## <ggproto method>
## <Wrapper function>
## function (...)
## setup_data(...)
##
## <Inner function (f)>
## function (data, params)
## {
## if (!(("xmin" %in% names(data) & "xmax" %in% names(data)) |
## ("x" %in% names(data)))) {
## stop("geom_fit_text needs either 'xmin' and 'xmax', or 'x'",
## .call = FALSE)
## }
## if (!("ymin" %in% names(data) & "ymax" %in% names(data) |
## "y" %in% names(data))) {
## stop("geom_fit_text needs either 'ymin' and 'ymax', or 'y'",
## .call = FALSE)
## }
## if ((!is.null(params$width)) & (!"unit" %in% class(params$width))) {
## data$xmin <- data$x - params$width/2
## data$xmax <- data$x + params$width/2
## }
## if ((!is.null(params$height)) & (!"unit" %in% class(params$height))) {
## data$ymin <- data$y - params$height/2
## data$ymax <- data$y + params$height/2
## }
## if (is.null(params$width) & !"xmin" %in% names(data)) {
## data$width <- ggplot2::resolution(data$x, FALSE) * 0.9
## data$xmin <- data$x - data$width/2
## data$xmax <- data$x + data$width/2
## data$width <- NULL
## }
## if (is.null(params$height) & !"ymin" %in% names(data)) {
## data$height <- ggplot2::resolution(data$y, FALSE) * 0.9
## data$ymin <- data$y - data$height/2
## data$ymax <- data$y + data$height/2
## data$height <- NULL
## }
## if (!is.null(params$formatter)) {
## if (!is.function(params$formatter)) {
## stop("`formatter` must be a function")
## }
## formatted_labels <- sapply(data$label, params$formatter,
## USE.NAMES = FALSE)
## if ((!length(formatted_labels) == length(data$label)) |
## (!is.character(formatted_labels))) {
## stop("`formatter` must produce a character vector of same length as input")
## }
## data$label <- formatted_labels
## }
## data
## }
print(StatStratum$compute_panel)## <ggproto method>
## <Wrapper function>
## function (...)
## compute_panel(..., self = self)
##
## <Inner function (f)>
## function (self, data, scales, decreasing = NULL, reverse = NULL,
## absolute = NULL, discern = FALSE, distill = "first", negate.strata = NULL,
## infer.label = FALSE, label.strata = NULL, min.y = NULL, max.y = NULL,
## min.height = NULL, max.height = NULL)
## {
## if (is.null(decreasing))
## decreasing <- ggalluvial_opt("decreasing")
## if (is.null(reverse))
## reverse <- ggalluvial_opt("reverse")
## if (is.null(absolute))
## absolute <- ggalluvial_opt("absolute")
## if (!is.null(label.strata)) {
## defunct_parameter("label.strata", msg = "use `aes(label = after_stat(stratum))`.")
## infer.label <- label.strata
## }
## if (infer.label) {
## deprecate_parameter("infer.label", msg = "Use `aes(label = after_stat(stratum))`.")
## if (is.null(data$label)) {
## data$label <- data$stratum
## }
## else {
## warning("Aesthetic `label` is specified, ", "so parameter `infer.label` will be ignored.")
## }
## }
## diff_aes <- intersect(c(.color_diff_aesthetics, .text_aesthetics),
## names(data))
## data$yneg <- data$y < 0
## data$lode <- data$alluvium
## distill <- distill_fun(distill)
## weight <- data$weight
## data$weight <- NULL
## if (is.null(weight))
## weight <- 1
## data$n <- weight
## data$count <- data$y * weight
## by_vars <- c("x", "yneg", "stratum")
## only_vars <- c(diff_aes)
## sum_vars <- c("y", "n", "count")
## if (!is.null(data$lode)) {
## agg_lode <- stats::aggregate(data[, "lode", drop = FALSE],
## data[, by_vars], distill)
## }
## if (length(only_vars) > 0) {
## agg_only <- stats::aggregate(data[, only_vars, drop = FALSE],
## data[, by_vars], only)
## }
## data <- stats::aggregate(data[, sum_vars], data[, by_vars],
## sum)
## if (!is.null(data$lode)) {
## data <- merge(data, agg_lode)
## }
## if (length(only_vars) > 0) {
## data <- merge(data, agg_only)
## }
## data <- subset(data, y != 0)
## data <- deposit_data(data, decreasing, reverse, absolute)
## x_sums <- tapply(abs(data$count), data$x, sum, na.rm = TRUE)
## data$prop <- data$count/x_sums[match(as.character(data$x),
## names(x_sums))]
## data <- data[with(data, order(deposit)), , drop = FALSE]
## data$ycum <- NA
## for (xx in unique(data$x)) {
## for (yn in c(FALSE, TRUE)) {
## ww <- which(data$x == xx & data$yneg == yn)
## data$ycum[ww] <- cumulate(data$y[ww])
## }
## }
## data$ymin <- data$ycum - abs(data$y)/2
## data$ymax <- data$ycum + abs(data$y)/2
## data$y <- data$ycum
## data$yneg <- NULL
## data$ycum <- NULL
## if (!is.null(min.height)) {
## deprecate_parameter("min.height", "min.y")
## min.y <- min.height
## }
## if (!is.null(max.height)) {
## deprecate_parameter("max.height", "max.y")
## max.y <- max.height
## }
## if (!is.null(min.y))
## data <- subset(data, ymax - ymin >= min.y)
## if (!is.null(max.y))
## data <- subset(data, ymax - ymin <= max.y)
## data
## }
I reached the specific solutions through trial and error. They may not be the best tricks for most cases, but they demonstrate what these packages can do. For many more examples, see the respective package vignettes: for {ggrepel}, and for {ggfittext}.
{ggrepel} is most often (in my experience) used to repel text away
from symbols in a scatterplot, in whatever directions prevent them from
overlapping the symbols and each other. In this case, however, it makes
much more sense to align them vertically a fixed horizontal distance
(nudge_x) away from the strata and repel them vertically
from each other (direction = "y") just enough to print them
without overlap. It takes an extra bit of effort to render text
only for the strata at the first (or at the last) axis, but the
result is worth it.
ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject, y = freq,
fill = response)) +
scale_x_discrete(expand = c(.4, 0)) +
geom_flow(width = 1/4) +
geom_stratum(alpha = .5, width = 1/4) +
scale_linetype_manual(values = c("blank", "solid")) +
ggrepel::geom_text_repel(
aes(label = ifelse(as.numeric(survey) == 1, as.character(response), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = -.5
) +
ggrepel::geom_text_repel(
aes(label = ifelse(as.numeric(survey) == 3, as.character(response), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = .5
) +
theme(legend.position = "none") +
ggtitle("vaccination survey responses", "labeled using `geom_text_repel()`")## Warning: Removed 8 rows containing missing values (`geom_text_repel()`).
## Removed 8 rows containing missing values (`geom_text_repel()`).
{ggfittext} is simplicity itself: The strata are just rectangles, so
no more parameter specifications are necessary to fit the text into
them. One key parameter is min.size, which defaults to
4 and controls how small the text is allowed to get without
being omitted.
ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject, y = freq,
fill = response, label = response)) +
scale_x_discrete(expand = c(.1, 0)) +
geom_flow(width = 1/4) +
geom_stratum(alpha = .5, width = 1/4) +
ggfittext::geom_fit_text(stat = "stratum", width = 1/4, min.size = 3) +
theme(legend.position = "none") +
ggtitle("vaccination survey responses", "labeled using `geom_fit_text()`")Note that this solution requires {ggfittext} v0.6.0.
sessioninfo::session_info()## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.2.1 (2022年06月23日)
## os macOS Catalina 10.15.7
## system x86_64, darwin17.0
## ui X11
## language (EN)
## collate C
## ctype en_US.UTF-8
## tz America/New_York
## date 2023年02月13日
## pandoc 2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## bslib 0.4.2 2022年12月16日 [3] CRAN (R 4.2.0)
## cachem 1.0.6 2021年08月19日 [3] CRAN (R 4.2.0)
## cli 3.6.0 2023年01月09日 [3] CRAN (R 4.2.0)
## colorspace 2.1-0 2023年01月23日 [3] CRAN (R 4.2.0)
## digest 0.6.31 2022年12月11日 [3] CRAN (R 4.2.0)
## dplyr 1.1.0 2023年01月29日 [3] CRAN (R 4.2.0)
## evaluate 0.20 2023年01月17日 [3] CRAN (R 4.2.0)
## fansi 1.0.4 2023年01月22日 [3] CRAN (R 4.2.0)
## farver 2.1.1 2022年07月06日 [3] CRAN (R 4.2.0)
## fastmap 1.1.0 2021年01月25日 [3] CRAN (R 4.2.0)
## generics 0.1.3 2022年07月05日 [3] CRAN (R 4.2.0)
## ggalluvial * 0.12.5 2023年02月13日 [1] local
## ggfittext 0.9.1 2021年01月30日 [3] CRAN (R 4.2.0)
## ggplot2 * 3.4.0 2022年11月04日 [3] CRAN (R 4.2.1)
## ggrepel 0.9.3 2023年02月03日 [3] CRAN (R 4.2.0)
## glue 1.6.2 2022年02月24日 [3] CRAN (R 4.2.0)
## gtable 0.3.1 2022年09月01日 [3] CRAN (R 4.2.0)
## highr 0.10 2022年12月22日 [3] CRAN (R 4.2.0)
## htmltools 0.5.4 2022年12月07日 [3] CRAN (R 4.2.0)
## jquerylib 0.1.4 2021年04月26日 [3] CRAN (R 4.2.0)
## jsonlite 1.8.4 2022年12月06日 [3] CRAN (R 4.2.0)
## knitr 1.42 2023年01月25日 [3] CRAN (R 4.2.0)
## labeling 0.4.2 2020年10月20日 [3] CRAN (R 4.2.0)
## lifecycle 1.0.3 2022年10月07日 [3] CRAN (R 4.2.0)
## magrittr 2.0.3 2022年03月30日 [3] CRAN (R 4.2.0)
## munsell 0.5.0 2018年06月12日 [3] CRAN (R 4.2.0)
## pillar 1.8.1 2022年08月19日 [3] CRAN (R 4.2.0)
## pkgconfig 2.0.3 2019年09月22日 [3] CRAN (R 4.2.0)
## purrr 1.0.1 2023年01月10日 [3] CRAN (R 4.2.0)
## R6 2.5.1 2021年08月19日 [3] CRAN (R 4.2.0)
## RColorBrewer 1.1-3 2022年04月03日 [3] CRAN (R 4.2.0)
## Rcpp 1.0.10 2023年01月22日 [3] CRAN (R 4.2.1)
## rlang 1.0.6 2022年09月24日 [3] CRAN (R 4.2.0)
## rmarkdown 2.20 2023年01月19日 [3] CRAN (R 4.2.0)
## rstudioapi 0.14 2022年08月22日 [3] CRAN (R 4.2.0)
## sass 0.4.5 2023年01月24日 [3] CRAN (R 4.2.0)
## scales 1.2.1 2022年08月20日 [3] CRAN (R 4.2.0)
## sessioninfo 1.2.2 2021年12月06日 [3] CRAN (R 4.2.0)
## stringi 1.7.12 2023年01月11日 [3] CRAN (R 4.2.0)
## tibble 3.1.8 2022年07月22日 [3] CRAN (R 4.2.0)
## tidyr 1.3.0 2023年01月24日 [3] CRAN (R 4.2.0)
## tidyselect 1.2.0 2022年10月10日 [3] CRAN (R 4.2.0)
## utf8 1.2.3 2023年01月31日 [3] CRAN (R 4.2.1)
## vctrs 0.5.2 2023年01月23日 [3] CRAN (R 4.2.0)
## withr 2.5.0 2022年03月03日 [3] CRAN (R 4.2.0)
## xfun 0.37 2023年01月31日 [3] CRAN (R 4.2.1)
## yaml 2.3.7 2023年01月23日 [3] CRAN (R 4.2.0)
##
## [1] /private/var/folders/k6/l4mq9ctj3219429xnvqpdbxm8tckkx/T/Rtmp9fMfyT/Rinst61ce73b51348
## [2] /private/var/folders/k6/l4mq9ctj3219429xnvqpdbxm8tckkx/T/RtmplDE0GG/temp_libpath60ab25e50e46
## [3] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
##
## ──────────────────────────────────────────────────────────────────────────────