For this section, we’ll be using a CSV dump of the 2016 GSS (General Social Survey), taken from the R library that packages it. It is a dataset that sociologists continually manage to squeeze more insights out of. More importantly, the Gapminder dataset from the previous section has plenty of continuous variables (such as GDP per capita and life expectancy, which we worked with), but no categorical variables. The GSS has a wide variety of categorical variables, which makes it ideal for bar charts and histograms.
data-frame: 2867 rows x 33 columns
┌─────────┬─────────────┬────┬────────┬────────┬──────┐
│grass    │marital      │kids│siblings│relig   │ballot│
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│NA       │Married      │3   │2       │None    │1     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│Legal    │Never Married│0   │3       │None    │2     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│Not Legal│Married      │2   │3       │Catholic│3     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│NA       │Married      │4+  │3       │Catholic│1     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│Legal    │Married      │2   │2       │None    │3     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│Legal    │Married      │2   │2       │None    │2     │
└─────────┴─────────────┴────┴────────┴────────┴──────┘
2861 rows, 27 cols elided
(use (show df everything #:n-rows 'all) for full frame)
Clearly, we have a lot of data to work with here, and much of it is categorical, which means we can make some bar charts!
#:title "Religious preferences, GSS 2016"
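As a minimal sketch of how such a chart might be produced, assuming the `graphite` and `data-frame` packages are installed, that the CSV lives at a hypothetical path, and that `relig` is the column being counted:

```racket
#lang racket
(require data-frame graphite)

;; Hypothetical path: point this at wherever your GSS CSV dump lives.
(define gss (df-read/csv "data/gss_sm.csv"))

;; One bar per distinct value of the "relig" column.
;; With no other options, bar heights are raw counts of respondents.
(graph #:data gss
       #:title "Religious preferences, GSS 2016"
       #:mapping (aes #:x "relig")
       (bar))
```

`graph` returns a pict, so evaluating this in DrRacket or at the REPL displays the chart directly.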
#:title "Religious preferences among regions, GSS 2016" #:width 600 #:height 400
#:title "Religious preferences among regions, GSS 2016" #:width 600 #:height 400
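The two regional charts could be sketched along the following lines. This is an illustration under assumptions, not the exact calls used here: the CSV path is hypothetical, and the region column name `bigregion` is a guess, since that column is not among the six shown in the preview above. The sketch uses Graphite's `bar` renderer, which places the groups side by side, and its `stacked-bar` renderer, which stacks them.

```racket
#lang racket
(require data-frame graphite)

;; Hypothetical path: point this at wherever your GSS CSV dump lives.
(define gss (df-read/csv "data/gss_sm.csv"))

;; "bigregion" is an assumed column name; substitute the region column
;; your copy of the data actually uses.
(define by-region (aes #:x "bigregion" #:group "relig"))

;; Side-by-side bars: one bar per religion within each region.
(define dodged
  (graph #:data gss
         #:title "Religious preferences among regions, GSS 2016"
         #:mapping by-region
         #:width 600 #:height 400
         (bar)))

;; Stacked bars: one bar per region, split by religion.
(define stacked
  (graph #:data gss
         #:title "Religious preferences among regions, GSS 2016"
         #:mapping by-region
         #:width 600 #:height 400
         (stacked-bar)))
```

Both definitions share the same mapping, so the only difference between the two charts is the renderer passed to `graph`.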
Both of these presentations have their uses, but they are still difficult to read. Each requires consulting the legend to tell which bar is which, and the stacked bars make it hard to compare categories within a single region. To remedy this, we need to introduce another concept...