.overview_heat
Description
Internal function that calculates the 'overview_heat' plot for data.table objects
Usage
.overview_heat(
dat = NULL,
id = NULL,
time = NULL,
label = FALSE,
perc = FALSE,
col_low = NULL,
col_high = NULL,
xaxis = NULL,
yaxis = NULL,
theme_plot = NULL,
exp_total = NULL,
col_names = NULL
)
Arguments
dat
The data set
id
The scope (e.g., country codes or individual IDs). The axis is ordered in ascending order by default.
time
The time (e.g., time periods given by years, months, ...)
label
If TRUE, the total number of observations or the percentage of observations is displayed. If FALSE, no labels are shown.
perc
If FALSE (default), the plot shows the total number of observations per time-scope-unit. If TRUE, it shows the number of observations per time-scope-unit as a percentage.
col_low
Hex color code for the lowest value (default is "#dceaf2")
col_high
Hex color code for the highest value (default is "#2A5773")
xaxis
Label of your x axis ("Time frame" is default)
yaxis
Label of your y axis ("Sample" is default)
theme_plot
Previously generated theme
exp_total
Expected total number of observations (i.e., the maximum) per time unit.
col_names
The column names (containing id and time)
Value
A ggplot
.overview_tab
Description
Internal function that calculates the 'overview_tab' for data.table objects
Usage
.overview_tab(dat = NULL, id = NULL, time = NULL, col_names = NULL)
Arguments
dat
Your data set
id
Scope (e.g., country codes or individual IDs)
time
Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing **one** time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)').
col_names
The column names (containing id and time)
Value
A data.table
calculate_share_non_row_wise
Description
Function used in 'overview_na' to calculate the column-wise share of NA
Usage
calculate_share_non_row_wise(dat = NULL)
Arguments
dat
Data frame
Value
The function returns a data set that has the information on the column-wise NA share
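To make the idea of a column-wise NA share concrete, here is a minimal base R sketch. It is not the package's internal code; the data frame `df` and the output column names `variable` and `percentage` are assumptions chosen for illustration.

```r
# Hypothetical example data (not part of the package)
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(TRUE, FALSE, TRUE, TRUE)
)

# is.na() returns a logical matrix; colMeans() gives the NA fraction per column
na_share <- data.frame(
  variable = names(df),
  percentage = unname(colMeans(is.na(df))) * 100
)
na_share
```

Each row of `na_share` describes one variable of `df`, which is the kind of input a column-wise NA plot needs.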
calculate_share_row_wise
Description
Function used in 'overview_na' to calculate the share of NA row-wise
Usage
calculate_share_row_wise(dat = NULL)
Arguments
dat
Data frame
Value
The function returns a data set that has the information on the row-wise NA share
find_int_runs
Description
Function used in 'overview_tab' to find running integers
Usage
find_int_runs(run = NULL)
Arguments
run
Variable (integer) that should be checked for consecutive values
Value
The function returns a data set
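One common way to find runs of consecutive integers in base R is sketched below. It may differ from the package's implementation; the input vector `years` and the output columns `start` and `end` are assumptions for illustration.

```r
# Hypothetical input: years observed for a single unit
years <- c(1990L, 1991L, 1992L, 1995L, 1997L, 1998L)

# Sort and de-duplicate, then start a new run wherever the gap is not 1
x <- sort(unique(years))
run_id <- cumsum(c(TRUE, diff(x) != 1L))

# Summarise each run by its first and last value
runs <- data.frame(
  start = as.integer(tapply(x, run_id, min)),
  end   = as.integer(tapply(x, run_id, max))
)
runs
```

A run whose `start` equals its `end` is a single isolated value; longer runs can be printed as ranges such as "1990-1992".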
overview_add_na_output
Description
Function used in 'overview_na' to generate a new data frame with na_count and percentage share of NAs for each row
Usage
overview_add_na_output(dat_result = NULL, dat = NULL)
Arguments
dat_result
Data.frame from 'overview_na'
dat
Data frame
Value
The function returns a data set that has the information on the row-wise NA share
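The row-wise variant can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the data frame `df` is made up, and `percentage` is an assumed name for the share column (`na_count` is the name used in the description above).

```r
# Hypothetical example data (not part of the package)
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, NA),
  c = c(TRUE, FALSE, TRUE, NA)
)

# Compute the missingness matrix once, before adding new columns
is_missing <- is.na(df)

# Number and percentage share of NAs in each row
df$na_count <- unname(rowSums(is_missing))
df$percentage <- unname(rowMeans(is_missing)) * 100
df
```

Computing `is_missing` before appending the new columns keeps `na_count` and `percentage` themselves out of the calculation.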
overview_crossplot
Description
This function plots a ggplot to visualize a cross table plot.
Usage
overview_crossplot(
dat,
id,
time,
cond1,
cond2,
threshold1,
threshold2,
xaxis = "Condition 1",
yaxis = "Condition 2",
label = FALSE,
color = FALSE,
dot_size = 2,
fontsize = 2.5
)
Arguments
dat
Your data set
id
Your scope (e.g., country codes or individual IDs). If the id variable contains NAs, they will not be included in the plot.
time
Your time (e.g., time periods given by years, months, ...)
cond1
Variable that describes the first condition
cond2
Variable that describes the second condition
threshold1
A threshold for cond1
threshold2
A threshold for cond2
xaxis
Label of the x axis ("Condition 1" is default)
yaxis
Label of the y axis ("Condition 2" is default)
label
If TRUE, the observations are labeled (FALSE is default). Overlapping labels are avoided by using 'ggrepel'
color
Color of the different observation groups
dot_size
Optional argument that defines the dot size (default is 2)
fontsize
If label is TRUE, the fontsize argument sets the text size of the labels (default is 2.5)
Value
A ggplot figure that presents the sample information visually in a cross table
Examples
data(toydata)
overview_crossplot(
dat = toydata,
cond1 = gdp,
cond2 = population,
threshold1 = 25000,
threshold2 = 27000,
id = ccode,
time = year
)
overview_crosstab
Description
Sorts a data set conditionally in a cross table. This can be helpful to get a sense of the time and scope conditions of a data set. Note, if used with a data set that has multiple observations on the id-time unit, the function automatically aggregates this information using the mean.
Usage
overview_crosstab(dat, cond1, cond2, threshold1, threshold2, id, time)
Arguments
dat
A data set object
cond1
Variable that describes the first condition
cond2
Variable that describes the second condition
threshold1
A threshold for cond1
threshold2
A threshold for cond2
id
Scope (e.g., country codes or individual IDs)
time
Time (e.g., time periods given by years, months, ...)
Value
A data frame object that contains a summary of the data set that can
later be converted to a 'LaTeX' output using overview_latex
Examples
data(toydata)
overview_crosstab(
dat = toydata,
cond1 = gdp,
cond2 = population,
threshold1 = 25000,
threshold2 = 27000,
id = ccode,
time = year
)
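The logic described above (average duplicated id-time units, then sort them by two thresholds) can be sketched in base R. This is only an approximation under stated assumptions: the data frame `dat`, the threshold values, and the use of `>=` to decide whether a condition is met are all hypothetical and may not match what overview_crosstab does internally.

```r
# Hypothetical data with a duplicated id-time unit ("A", 1990)
dat <- data.frame(
  id    = c("A", "A", "A", "B", "B"),
  time  = c(1990, 1990, 1991, 1990, 1991),
  cond1 = c(10, 30, 40, 5, 8),
  cond2 = c(50, 70, 20, 90, 10)
)

# Average duplicated id-time units
agg <- aggregate(cbind(cond1, cond2) ~ id + time, data = dat, FUN = mean)

# Assumption: a condition counts as met when the value is >= the threshold
threshold1 <- 15
threshold2 <- 40
agg$cond1_met <- agg$cond1 >= threshold1
agg$cond2_met <- agg$cond2 >= threshold2

# Count id-time units in each of the four cells
cross <- table(cond1_met = agg$cond1_met, cond2_met = agg$cond2_met)
cross
```

In this made-up example each of the four cells contains exactly one id-time unit.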
overview_heat
Description
This function plots a heat map to visualize the coverage of the time-scope-units of the data. Options include total number of cases per time-scope-unit or relative number in percentage.
Usage
overview_heat(
dat,
id,
time,
perc = FALSE,
exp_total = NULL,
xaxis = "Time frame",
yaxis = "Sample",
col_low = "#dceaf2",
col_high = "#2A5773",
label = TRUE
)
Arguments
dat
The data set
id
The scope (e.g., country codes or individual IDs). The axis is ordered in ascending order by default.
time
The time (e.g., time periods given by years, months, ...)
perc
If FALSE (default), the plot shows the total number of observations per time-scope-unit. If TRUE, it shows the number of observations per time-scope-unit as a percentage.
exp_total
Expected total number of observations (i.e., the maximum) per time unit.
xaxis
Label of your x axis ("Time frame" is default)
yaxis
Label of your y axis ("Sample" is default)
col_low
Hex color code for the lowest value (default is "#dceaf2")
col_high
Hex color code for the highest value (default is "#2A5773")
label
If TRUE (default), the total number of observations/percentages of observations are displayed. If FALSE, it returns no labels.
Value
A ggplot figure that presents sample coverage visually
Examples
data(toydata)
overview_heat(toydata, ccode, year, perc = TRUE, exp_total = 12)
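How perc and exp_total might interact can be illustrated with a small base R sketch. It assumes that the percentage is the observed count per id-time unit divided by exp_total; the data frame `dat`, the value of `exp_total`, and the column names `n` and `perc` are hypothetical.

```r
# Hypothetical data: one row per observation
dat <- data.frame(
  id   = c("A", "A", "A", "B", "B", "B", "B"),
  time = c(1990, 1990, 1991, 1990, 1990, 1990, 1991)
)

# Assumed expected (maximum) number of observations per id-time unit
exp_total <- 4

# Count observations per id-time unit and convert to a percentage
counts <- as.data.frame(table(id = dat$id, time = dat$time), responseName = "n")
counts$perc <- counts$n / exp_total * 100
counts
```

With monthly observations per country-year, as in the example above, exp_total = 12 would mean that a fully covered country-year reaches 100 percent.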
overview_latex
Description
Produces a 'LaTeX' output for output obtained via
overview_tab and overview_crosstab
Usage
overview_latex(
obj,
title = "Time and scope of the sample",
id = "Sample",
time = "Time frame",
crosstab = FALSE,
cond1 = "Condition 1",
cond2 = "Condition 2",
save_out = FALSE,
file_path,
label = "tab:tab1",
fontsize,
file,
path
)
Arguments
obj
Overview object produced by overview_tab or overview_crosstab
title
Caption of the table (default is "Time and scope of the sample")
id
The name of the left column (default is "Sample"), will be ignored if crosstab is TRUE
time
The name of the right column (default is "Time frame"), will
be ignored if crosstab is TRUE
crosstab
Logical argument, if TRUE produces a crosstab output,
default is FALSE
cond1
Description for the first condition (character), will be
ignored if crosstab is FALSE. This should correspond to the input
for cond1 in overview_crosstab
cond2
Description for the second condition (character), will be
ignored if crosstab is FALSE. This should correspond to the input
for cond2 in overview_crosstab
save_out
Optional argument, exports the output table as a .tex file, default is FALSE
file_path
Specifies the path and file name (.tex) where you store your output
label
Specifies the label (default is "tab:tab1")
fontsize
Specifies the font size (all 'LaTeX' font sizes such as "scriptsize" or "small" work)
file
This argument is deprecated. Please use "file_path" instead and add the full path.
path
This argument is deprecated. Please use "file_path" instead and add the full path.
Value
A 'LaTeX' output that can either be copy-pasted into a text document or exported directly as a .tex file
Examples
data(toydata)
overview_object <- overview_tab(dat = toydata, id = ccode, time = year)
overview_latex(
obj = overview_object,
title = "Some nice title",
crosstab = FALSE
)
overview_object <- overview_tab(dat = toydata, id = ccode, time = year)
overview_latex(
obj = overview_object,
title = "Some nice title",
file_path = "some/path_to/your_output_file.tex"
)
overview_ct_object <- overview_crosstab(
dat = toydata,
cond1 = gdp,
cond2 = population,
threshold1 = 25000,
threshold2 = 27000,
id = ccode,
time = year
)
overview_latex(
obj = overview_ct_object,
title = "Some nice title for a cross tab",
crosstab = TRUE,
cond1 = "Name of first condition",
cond2 = "Name of second condition"
)
overview_na
Description
This function plots a ggplot to visualize the distribution of NAs across all variables in the data set.
Usage
overview_na(
dat,
yaxis = "Variables",
perc = TRUE,
row_wise = FALSE,
add = FALSE
)
Arguments
dat
Your data set
yaxis
Label of your y axis ("Variables" is default)
perc
If TRUE (default) plot returns the number of NAs in percentage
row_wise
If TRUE (FALSE is default), the plot returns the number of NAs per row
add
If TRUE (FALSE is default) it generates a new data frame with na_count and percentage share of NAs for each row
Value
Depending on the selection, the function returns a ggplot figure that presents the distribution of NAs in the data set or adds the information on the row-wise NA share
Examples
data(toydata)
overview_na(toydata, perc = FALSE)
overview_overlap
Description
Provides an overview of the overlap of two data sets. Cautionary note: This function is still preliminary and can only compare two data sets. We are working on an extension that allows comparing multiple data sets.
Usage
overview_overlap(
dat1,
dat2,
dat1_id,
dat2_id,
dat1_name = "Data set 1",
dat2_name = "Data set 2",
plot_type = "bar"
)
Arguments
dat1
A first data set object
dat2
A second data set object
dat1_id
Scope (e.g., country codes or individual IDs) of dat1. Both ID variables must be coded in exactly the same way for the data sets to match correctly.
dat2_id
Scope (e.g., country codes or individual IDs) of dat2. Both ID variables must be coded in exactly the same way for the data sets to match correctly.
dat1_name
Name of dat1 ("Data set 1" is the default)
dat2_name
Name of dat2 ("Data set 2" is the default)
plot_type
Type of plot ("bar" and "venn" are the two options). "venn" relies on the ggvenn function.
Value
A ggplot2 object (a bar chart or, if plot_type is "venn", a Venn diagram) that shows the overlap of two data sets.
Examples
## Not run:
data(toydata)
toydata2 <- toydata[which(toydata$year > 1992), ]
overview_overlap(
dat1 = toydata, dat2 = toydata2, dat1_id = ccode,
dat2_id = ccode
)
## End(Not run)
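The kind of overlap the function visualizes can be approximated with base R set operations. This sketch is not the package's implementation; the ID vectors `ids1` and `ids2` and the list element names are assumptions for illustration.

```r
# Hypothetical ID variables from two data sets
ids1 <- c("AGO", "BEN", "FRA", "RWA")
ids2 <- c("FRA", "RWA", "GBR")

# IDs found in both data sets, and IDs unique to each one
overlap <- list(
  both   = intersect(ids1, ids2),
  only_1 = setdiff(ids1, ids2),
  only_2 = setdiff(ids2, ids1)
)

# Number of IDs in each group
lengths(overlap)
```

Because the comparison is by exact value, differently coded IDs (for example "GBR" versus "UK") would be treated as non-overlapping.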
overview_plot
Description
This function plots a ggplot to visualize the distribution of scope objects across the time frame.
Usage
overview_plot(
dat,
id,
time,
xaxis = "Time frame",
yaxis = "Sample",
asc = TRUE,
color,
dot_size = 2
)
Arguments
dat
Your data set
id
Your scope (e.g., country codes or individual IDs). If the id variable contains NAs, they will not be included in the plot.
time
Your time (e.g., time periods given by years, months, ...)
xaxis
Label of the x axis ("Time frame" is default)
yaxis
Label of the y axis ("Sample" is default)
asc
If TRUE (default), the y axis is sorted in ascending order
color
Optional argument that defines the color
dot_size
Optional argument that defines the dot size (default is 2)
Value
A ggplot figure that presents the sample information visually
Examples
data(toydata)
overview_plot(dat = toydata, id = ccode, time = year)
overview_plot_absolute
Description
Function used in 'overview_na' to plot the absolute share of NA values
Usage
overview_plot_absolute(
dat_result = NULL,
theme_plot = NULL,
yaxis = NULL,
xaxis = NULL
)
Arguments
dat_result
Data frame
theme_plot
Theme for the plot (pre-defined)
yaxis
Name for the y axis
xaxis
Name for the x axis
Value
The function returns a ggplot
overview_plot_percentage
Description
Function used in 'overview_na' to plot the percentage share of NA values
Usage
overview_plot_percentage(
dat_result = NULL,
theme_plot = NULL,
yaxis = NULL,
xaxis = NULL
)
Arguments
dat_result
Data frame
theme_plot
Theme for the plot (pre-defined)
yaxis
Name for the y axis
xaxis
Name for the x axis
Value
The function returns a ggplot
overview_tab
Description
Provides an overview table for the time and scope conditions of a data set. If a data.table object is provided, the function uses data.table's syntax to perform the evaluation
Usage
overview_tab(
dat,
id,
time = list(year = NULL, month = NULL, day = NULL),
complex_date = FALSE
)
Arguments
dat
A data frame or data table object
id
Scope (e.g., country codes or individual IDs)
time
Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing **one** time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)').
complex_date
Boolean argument identifying if there is a more complex (list-wise) date_time parameter (FALSE is the default)
Value
A data frame object that contains a summary of a sample that
can later be converted to a 'LaTeX' output using overview_latex
Examples
# With option 1 (and also option 2) for the time argument:
data(toydata)
output_table <- overview_tab(dat = toydata, id = ccode, time = year)
# With option 3 (a list of time variables):
overview_tab(dat = toydata, id = ccode, time = list(
year = toydata$year,
month = toydata$month, day = toydata$day
), complex_date = TRUE)
overview_tab_df
Description
Internal function that calculates the 'overview_tab' for data.frame objects
Usage
overview_tab_df(dat2 = NULL, dat = NULL, id = NULL, time = NULL)
Arguments
dat2
Your data set
dat
Your data set
id
Scope (e.g., country codes or individual IDs)
time
Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing **one** time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)').
Value
A data.frame
overview_tab_dt
Description
Internal function that calculates the 'overview_tab' for data.table objects
Usage
overview_tab_dt(dat = NULL, id = NULL, time = NULL, col_names = NULL)
Arguments
dat
Your data set
id
Scope (e.g., country codes or individual IDs)
time
Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing **one** time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)').
col_names
The column names (containing id and time)
Value
A data.table
theme_heat_plot
Description
Defines the theme for the 'overview_heat' plot function
Usage
theme_heat_plot()
Value
A theme for the 'overview_heat' plot
theme_na_plot
Description
Defines the theme for the 'overview_na' plot function
Usage
theme_na_plot()
Value
A theme for the 'overview_na' plot
Cross-sectional data for countries
Description
Small, artificially generated toy data set that comes in a cross-sectional format where the unit of analysis is either country-year or country-year-month. It provides artificial information for five countries (Angola, Benin, France, Rwanda, and the UK) for a time span from 1990 to 1999 to illustrate the use of the package.
Usage
data(toydata)
Format
An object of class "data.frame"
- ccode
ISO3 country code (as character) for the countries in the sample (Angola, Benin, France, Rwanda, and UK)
- year
A value between 1990 and 1999
- month
An abbreviation (MMM) for month (character)
- gdp
A fake value for GDP (randomly generated)
- population
A fake value for population (randomly generated)
References
This data set was artificially created for the overviewR package.
Examples
data(toydata)
head(toydata)