Not %in%
Description
This function does the very opposite of %in%
Usage
x %nin% table
Arguments
x
An object
table
The object you want see if x is not in
Value
A logical vector
See Also
Other subsetters:
%not%()
Subsetting operators
Description
%just% and %not% are subsetting convenience functions
for situations when you would do x[x %in% y] or x[x %nin% y]. See
details for behavior when x is a data frame or matrix.
Usage
x %not% y
x %not% y <- value
x %just% y
x %just% y <- value
## Default S3 method:
x %not% y
## Default S3 method:
x %not% y <- value
## S3 method for class 'data.frame'
x %not% y
## S3 method for class 'data.frame'
x %not% y <- value
## S3 method for class 'matrix'
x %not% y
## S3 method for class 'matrix'
x %not% y <- value
## S3 method for class 'list'
x %not% y
## S3 method for class 'list'
x %not% y <- value
## Default S3 method:
x %just% y
## Default S3 method:
x %just% y <- value
## S3 method for class 'data.frame'
x %just% y
## S3 method for class 'data.frame'
x %just% y <- value
## S3 method for class 'matrix'
x %just% y
## S3 method for class 'matrix'
x %just% y <- value
## S3 method for class 'list'
x %just% y
## S3 method for class 'list'
x %just% y <- value
Arguments
x
Object to subset
y
List of items to include if they are/aren't in x
value
The object(s) to assign to the subsetted x
Details
The behavior of %not% and %just% are different when you're subsetting
data frames or matrices. The subset y in this case is interpreted as
column names or indices.
You can also make assignments to the subset in the same way you could if subsetting with brackets.
Value
All of x that are in y (%just%) or all of x that are not in
y (%not%).
See Also
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Other subsetters:
%nin%()
Examples
x <- 1:5
y <- 3:8
x %just% y # 3 4 5
x %not% y # 1 2
# Assignment works too
x %just% y <- NA # 1 2 NA NA NA
x %not% y <- NA # NA NA 3 4 5
mtcars %just% c("mpg", "qsec", "cyl") # keeps only columns with those names
mtcars %not% 1:5 # drops columns 1 through 5
# Assignment works for data frames as well
mtcars %just% c("mpg", "qsec") <- gscale(mtcars, c("mpg", "qsec"))
mtcars %not% c("mpg", "qsec") <- gscale(mtcars %not% c("mpg", "qsec"))
Add and remove gridlines
Description
These are convenience wrappers for editing ggplot2::theme() 's
panel.grid.major and panel.grid.minor parameters with sensible
defaults.
Usage
add_gridlines(x = TRUE, y = TRUE, minor = TRUE)
add_x_gridlines(minor = TRUE)
add_y_gridlines(minor = TRUE)
drop_gridlines(x = TRUE, y = TRUE, minor.only = FALSE)
drop_x_gridlines(minor.only = FALSE)
drop_y_gridlines(minor.only = FALSE)
Arguments
x
Apply changes to the x axis?
y
Apply changes to the y axis?
minor
Add minor gridlines in addition to major?
minor.only
Remove only the minor gridlines?
Mean-center vectors, data frames, and survey designs
Description
This function is a wrapper around gscale() that is configured
to mean-center variables without affecting the scaling of those variables.
Usage
center(
data = NULL,
vars = NULL,
binary.inputs = "center",
binary.factors = FALSE,
weights = NULL
)
Arguments
data
A data frame or survey design. Only needed if you would like to
rescale multiple variables at once. If x = NULL, all columns will
be rescaled. Otherwise, x should be a vector of variable names. If
x is a numeric vector, this argument is ignored.
vars
If data is a data.frame or similar, you can scale only
select columns by providing a vector column names to this argument.
binary.inputs
Options for binary variables. Default is center;
0/1 keeps original scale; -0.5/0.5 rescales 0 as -0.5 and 1
as 0.5; center subtracts the mean; and full subtracts the
mean and divides by 2 sd.
binary.factors
Coerce two-level factors to numeric and apply scaling functions to them? Default is FALSE.
weights
A vector of weights equal in length to x. If iterating
over a data frame, the weights will need to be equal in length to all the
columns to avoid errors. You may need to remove missing values before using
the weights.
Details
Some more information can be found in the documentation for
gscale()
Value
A transformed version of the data argument.
See Also
standardization, scaling, and centering tools
center_mod(),
gscale(),
scale_mod(),
standardize()
Examples
# Standardize just the "qsec" variable in mtcars
standardize(mtcars, vars = "qsec")
Center variables in fitted regression models
Description
center_mod (previously known as center_lm) takes fitted regression models
and mean-centers the
continuous variables in the model to aid interpretation, especially in
the case of models with interactions. It is a wrapper to
scale_mod .
Usage
center_mod(
model,
binary.inputs = "0/1",
center.response = FALSE,
data = NULL,
apply.weighted.contrasts = getOption("jtools-weighted.contrasts", FALSE),
...
)
Arguments
model
A regression model of type lm, glm, or
svyglm ; others may work as well but have not been
tested.
binary.inputs
Options for binary variables. Default is 0/1;
0/1 keeps original scale; -0.5,0.5 rescales 0 as -0.5 and 1
as 0.5; center subtracts the mean; and full treats them
like other continuous variables.
center.response
Should the response variable also be centered?
Default is FALSE.
data
If you provide the data used to fit the model here, that data
frame is used to re-fit the model instead of the stats::model.frame()
of the model. This is particularly useful if you have variable
transformations or polynomial terms specified in the formula.
apply.weighted.contrasts
Factor variables cannot be scaled, but you
can set the contrasts such that the intercept in a regression model will
reflect the true mean (assuming all other variables are centered). If set
to TRUE, the argument will apply weighted effects coding to all factors.
This is similar to the R default effects coding, but weights according to
how many observations are at each level. An adapted version of
contr.wec() from the wec package is used to do this. See
that package's documentation and/or Grotenhuis et al. (2016) for more
info.
...
Arguments passed on to gscale() .
Details
This function will mean-center all continuous variables in a
regression model for ease of interpretation, especially for those models
that have
interaction terms. The mean for svyglm objects is calculated using
svymean, so reflects the survey-weighted mean. The weight variables
in svyglm are not centered, nor are they in other lm family
models.
This function re-estimates the model, so for large models one should expect a runtime equal to the first run.
Value
The functions returns a lm or glm object, inheriting
from whichever class was supplied.
Author(s)
Jacob Long jacob.long@sc.edu
References
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40(3), 373-400.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analyses for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
See Also
sim_slopes performs a simple slopes analysis.
interact_plot creates attractive, user-configurable plots of
interaction models.
standardization, scaling, and centering tools
center(),
gscale(),
scale_mod(),
standardize()
Examples
fit <- lm(formula = Murder ~ Income * Illiteracy,
data = as.data.frame(state.x77))
fit_center <- center_mod(fit)
# With weights
fitw <- lm(formula = Murder ~ Income * Illiteracy,
data = as.data.frame(state.x77),
weights = Population)
fitw_center <- center_mod(fitw)
# With svyglm
if (requireNamespace("survey")) {
library(survey)
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
data = apistrat, fpc =~ fpc)
regmodel <- svyglm(api00 ~ ell * meals, design = dstrat)
regmodel_center <- center_mod(regmodel)
}
Plot simple effects in regression models
Description
effect_plot() plots regression paths. The plotting is done with
ggplot2 rather than base graphics, which some similar functions use.
Usage
effect_plot(
model,
pred,
pred.values = NULL,
centered = "all",
plot.points = FALSE,
interval = FALSE,
data = NULL,
at = NULL,
int.type = c("confidence", "prediction"),
int.width = 0.95,
outcome.scale = "response",
robust = FALSE,
cluster = NULL,
vcov = NULL,
set.offset = 1,
x.label = NULL,
y.label = NULL,
pred.labels = NULL,
main.title = NULL,
colors = "black",
line.colors = colors,
line.thickness = 1.1,
point.size = 1.5,
point.alpha = 0.6,
jitter = 0,
rug = FALSE,
rug.sides = "lb",
force.cat = FALSE,
cat.geom = c("point", "line", "bar"),
cat.interval.geom = c("errorbar", "linerange"),
cat.pred.point.size = 3.5,
partial.residuals = FALSE,
color.class = colors,
facet.by = NULL,
...
)
Arguments
model
A regression model. The function is tested with lm,
glm, svyglm ,
merMod ,
rq , brmsfit,
stanreg models.
Models from other classes may work as well but are not officially
supported. The model should include the interaction of interest.
pred
The name of the predictor variable involved
in the interaction. This can be a bare name or string. Note that it
is evaluated using rlang, so programmers can use the !! syntax
to pass variables instead of the verbatim names.
pred.values
Values of pred to use instead of the equi-spaced
series by default (for continuous variables) or all unique values (for
non-continuous variables).
centered
A vector of quoted variable names that are to be
mean-centered. If "all", all non-focal predictors are centered. You
may instead pass a character vector of variables to center. User can
also use "none" to base all predictions on variables set at 0.
The response variable, pred, weights, and offset variables are never
centered.
plot.points
Logical. If TRUE, plots the actual data points as
a scatterplot on top of the interaction lines. The color of the dots will
be based on their moderator value.
interval
Logical. If TRUE, plots confidence/prediction
intervals around the line using geom_ribbon .
data
Optional, default is NULL. You may provide the data used to
fit the model. This can be a better way to get mean values for centering
and can be crucial for models with variable transformations in the formula
(e.g., log(x)) or polynomial terms (e.g., poly(x, 2)). You will
see a warning if the function detects problems that would likely be
solved by providing the data with this argument and the function will
attempt to retrieve the original data from the global environment.
at
If you want to manually set the values of other variables in the model, do so by providing a named list where the names are the variables and the list values are vectors of the values. This can be useful especially when you are exploring interactions or other conditional predictions.
int.type
Type of interval to plot. Options are "confidence" or "prediction". Default is confidence interval.
int.width
How large should the interval be, relative to the standard error? The default, .95, corresponds to roughly 1.96 standard errors and a .05 alpha level for values outside the range. In other words, for a confidence interval, .95 is analogous to a 95% confidence interval.
outcome.scale
For nonlinear models (i.e., GLMs), should the outcome
variable be plotted on the link scale (e.g., log odds for logit models) or
the original scale (e.g., predicted probabilities for logit models)? The
default is "response", which is the original scale. For the link
scale, which will show straight lines rather than curves, use
"link".
robust
Should robust standard errors be used to find confidence
intervals for supported models? Default is FALSE, but you should specify
the type of sandwich standard errors if you'd like to use them (i.e.,
"HC0", "HC1", and so on). If TRUE, defaults to "HC3" standard
errors.
cluster
For clustered standard errors, provide the column name of the cluster variable in the input data frame (as a string). Alternately, provide a vector of clusters.
vcov
Optional. You may supply the variance-covariance matrix of the coefficients yourself. This is useful if you are using some method for robust standard error calculation not supported by the sandwich package.
set.offset
For models with an offset (e.g., Poisson models), sets an offset for the predicted values. All predicted values will have the same offset. By default, this is set to 1, which makes the predicted values a proportion. See details for more about offset support.
x.label
A character object specifying the desired x-axis label. If
NULL, the variable name is used.
y.label
A character object specifying the desired x-axis label. If
NULL, the variable name is used.
pred.labels
A character vector of labels for categorical predictors.
If NULL, the default, the factor labels are used.
main.title
A character object that will be used as an overall title
for the plot. If NULL, no main title is used.
colors
See jtools_colors for details on the types of arguments
accepted. Default is "black". This affects the coloration of the line
as well as confidence intervals and points unless you give a different
argument to line.color.
line.colors
See jtools_colors for details on the types of arguments
accepted. Default is colors. This affects the coloration of the line
as well as confidence intervals, but not the points.
line.thickness
How thick should the plotted lines be? Default is 1.1; ggplot's default is 1.
point.size
What size should be used for observed data when
plot.points is TRUE? Default is 1.5.
point.alpha
What should the alpha aesthetic for plotted points of
observed data be? Default is 0.6, and it can range from 0 (transparent) to
1 (opaque).
jitter
How much should plot.points observed values be "jittered"
via ggplot2::position_jitter() ? When there are many points near each
other, jittering moves them a small amount to keep them from
totally overlapping. In some cases, though, it can add confusion since
it may make points appear to be outside the boundaries of observed
values or cause other visual issues. Default is 0, but try various
small values (e.g., 0.1) and increase as needed if your points are
overlapping too much. If the argument is a vector with two values,
then the first is assumed to be the jitter for width and the second
for the height.
rug
Show a rug plot in the margins? This uses ggplot2::geom_rug()
to show the distribution of the predictor (top/bottom) and/or
response variable (left/right) in the original data. Default is
FALSE.
rug.sides
On which sides should rug plots appear? Default is "lb", meaning both left and bottom. "t" and/or "b" show the distribution of the predictor while "l" and/or "r" show the distribution of the response.
force.cat
Force the continuous pred to be treated as categorical?
default is FALSE, but this can be useful for things like dummy 0/1
variables.
cat.geom
If pred is categorical (or force.cat is TRUE),
what type of plot should this be? There are several options
here since the best way to visualize categorical interactions varies by
context. Here are the options:
-
"point": The default. Simply plot the point estimates. You may want to usepoint.shape = TRUEwith this and you should also considerinterval = TRUEto visualize uncertainty. -
"line": This connects observations across levels of thepredvariable with a line. This is a good option when thepredvariable is ordinal (ordered). You may still considerpoint.shape = TRUEandinterval = TRUEis still a good idea. -
"bar": A bar chart. Some call this a "dynamite plot." Many applied researchers advise against this type of plot because it does not represent the distribution of the observed data or the uncertainty of the predictions very well. It is best to at least use theinterval = TRUEargument with this geom.
cat.interval.geom
For categorical by categorical interactions.
One of "errorbar" or "linerange". If the former,
ggplot2::geom_errorbar() is used. If the latter,
ggplot2::geom_linerange() is used.
cat.pred.point.size
(for categorical pred)
If TRUE and geom is "point" or "line",
sets the size of the predicted points. Default is 3.5.
Note the distinction from point.size, which refers to the
observed data points.
partial.residuals
Instead of plotting the observed data, you may plot
the partial residuals (controlling for the effects of variables besides
pred).
color.class
Deprecated. Now known as colors.
facet.by
A variable in the data by which you want to plot the
effects separately. This will cause the plot to include multiple panels,
basically a separate plot for each unique value of the variable in
facet.by. This will be most useful when plotting effects from multilevel
models (e.g., as fit by lme4's models) with a random slope for the
pred variable. You should generally only want to use this if you expect
the different panels to look meaningfully different. Default is NULL.
...
extra arguments passed to make_predictions()
Details
This function provides a means for plotting effects for the
purpose of exploring regression estimates. You must have the
package ggplot2 installed to benefit from these plotting functions.
By default, all numeric predictors other than the one specified in the
pred argument are mean-centered, which usually produces more
intuitive plots. This only affects the y-axis in linear models, but
may be especially important/influential in non-linear/generalized linear
models.
This function supports nonlinear and generalized linear models and by
default will plot them on
their original scale (outcome.scale = "response").
While mixed effects models from lme4 are supported, only the fixed
effects are plotted. lme4 does not provide confidence intervals,
so they are not supported with this function either.
Note: to use transformed predictors, e.g., log(x), or polynomials,
e.g., poly(x, 2), provide the raw variable name (x) to the pred =
argument. You will need to input the data frame used to fit the model with
the data = argument.
Value
The functions returns a ggplot object, which can be treated
like
a user-created plot and expanded upon as such.
Author(s)
Jacob Long jacob.long@sc.edu
See Also
interact_plot from the interactions package plots interaction
effects,
producing plots like this function but with separate lines for different
levels of a moderator. cat_plot from interactions does the same
for categorical by categorical interactions.
Examples
# Using a fitted lm model
states <- as.data.frame(state.x77)
states$HSGrad <- states$`HS Grad`
fit <- lm(Income ~ HSGrad + Murder,
data = states)
effect_plot(model = fit, pred = Murder)
# Using polynomial predictor, plus intervals
fit <- lm(accel ~ poly(mag,3) + dist, data = attenu)
effect_plot(fit, pred = mag, interval = TRUE,
int.type = "confidence", int.width = .8, data = attenu) # note data arg.
# With svyglm
if (requireNamespace("survey")) {
library(survey)
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
data = apistrat, fpc = ~fpc)
regmodel <- svyglm(api00 ~ ell + meals, design = dstrat)
effect_plot(regmodel, pred = ell, interval = TRUE)
}
# With lme4
## Not run:
library(lme4)
data(VerbAgg)
mv <- glmer(r2 ~ Anger + mode + (1 | item), data = VerbAgg,
family = binomial,
control = glmerControl("bobyqa"))
effect_plot(mv, pred = Anger)
## End(Not run)
Export regression summaries to tables
Description
This function allows users to use the features of
summ() (e.g., standardization, robust standard errors)
in the context of shareable HTML, LaTeX, and
Microsoft Word tables. It relies heavily on huxtable::huxreg()
to do the table formatting. This is particularly useful for putting
the results of multiple models into a single table.
Usage
export_summs(
...,
error_format = "({std.error})",
error_pos = c("below", "right", "same"),
ci_level = 0.95,
statistics = NULL,
model.names = NULL,
coefs = NULL,
to.file = NULL,
file.name = NULL
)
Arguments
...
At minimum, a regression object(s). See details for more arguments.
error_format
Which of standard error, confidence intervals, test
statistics, or p values should be used to express uncertainty of estimates
for regression coefficients? See details for more info.
Default: "({std.error})"
error_pos
Where should the error statistic defined in
error_style be placed relative to the coefficient estimate?
Default: "below"
ci_level
If reporting confidence intervals, what should the
confidence level be? By default, it is .95 if
confidence intervals are requested in error_format.
statistics
Which model summary statistics should be included?
See huxreg for more on usage. The default
for this function depends on the model type. See details for more on
the defaults by model type.
model.names
If you want to give your model(s) names at the top of each column, provide them here as a character vector. Otherwise, they will just be labeled by number. Default: NULL
coefs
If you want to include only a subset of the coefficients in the table, specify them here in a character vector. If you want the table to show different names for the coefficients, give a named vector where the names are the preferred coefficient names. See details for more.
to.file
Export the table to a Microsoft Word, PDF, or HTML document?
This functionality relies on huxtable's quick_ functions
(huxtable::quick_docx() , huxtable::quick_pdf() ,
huxtable::quick_html() , huxtable::quick_xlsx() ). Acceptable arguments
are "Word" or "docx" (equivalent), "pdf", "html", or "xlsx". All are
case insensitive. Default is NULL, meaning the table is not saved.
file.name
File name with (optionally) file path to save the
file. Ignored if to.file is FALSE. Default: NULL
Details
There are many optional parameters not documented above. Any
argument that you would want to pass to summ() , for instance,
will be used. Of particular interest may be the robust and scale
arguments. Note that some summ arguments may not have any bearing
on the table output.
The default model summary statistics reporting follows this logic:
summ.lm =
c(N = "nobs", R2 = "r.squared"),summ.glm =
c(N = "nobs", AIC = "AIC", BIC = "BIC", `Pseudo R2` = "pseudo.r.squared"),summ.svyglm =
c(N = "nobs", R2 = "r.squared"),summ.merMod =
c(N = "nobs", AIC = "AIC", BIC = "BIC", `R2 (fixed)` = "r.squared.fixed", `R2 (total)` = "r.squared")summ.rq =
c(N = "nobs", tau = "tau", R1 = "r.1", AIC = "AIC", BIC = "BIC")
Be sure to look at the summ() documentation for more on the calculation
of these and other statistics, especially for mixed models.
If you set statistics = "all", then the statistics argument
passed to huxreg will be NULL, which reports whichever
model statistics are available via glance. If you want no
model summary statistics, set the argument to character(0).
You have a few options for the error_format argument.
You can include anything returned by broom::tidy()
(see also tidy.summ() ). For the most part, you will
be interested in std.error (standard error), statistic
(test statistic, e.g. t-value or z-value), p.value, or
conf.high and conf.low, which correspond to the
upper and lower bounds of the confidence interval for the estimate.
Note that the default ci_level argument is .95, but you
can alter that as desired.
To format the error statistics, simply put the statistics desired in
curly braces wherever you want them in a character string. For example,
if you want the standard error in parentheses, the argument would be
"({std.error})", which is the default. Some other ideas:
-
"({statistic})", which gives you the test statistic in parentheses. -
"({statistic}, p = {p.value})", which gives the test statistic followed by a "p =" p value all in parentheses. Note that you'll have to pay special attention to rounding if you do this to keep cells sufficiently narrow. -
"[{conf.low}, {conf.high}]", which gives the confidence interval in the standard bracket notation. You could also explicitly write the confidence level, e.g.,"CI [{conf.low}, {conf.high}]".
For coefs, the argument is slightly different than what is default
in huxreg. If you provide a named vector of coefficients, then
the table will refer to the selected coefficients by the names of the
vector rather than the coefficient names. For instance, if I want to
include only the coefficients for the hp and mpg but have
the table refer to them as "Horsepower" and "Miles/gallon", I'd provide
the argument like this:
c("Horsepower" = "hp", "Miles/gallon" = "mpg")
You can also pass any argument accepted by the
huxtable::huxreg() function. A few that are likely to be
oft-used are documented above, but visit huxreg's documentation
for more info.
For info on converting the huxtable::huxtable() object to
HTML or LaTeX, see huxtable's documentation.
Value
A huxtable.
See Also
Examples
states <- as.data.frame(state.x77)
fit1 <- lm(Income ~ Frost, data = states)
fit2 <- lm(Income ~ Frost + Illiteracy, data = states)
fit3 <- lm(Income ~ Frost + Illiteracy + Murder, data = states)
if (requireNamespace("huxtable")) {
# Export all 3 regressions with "Model #" labels,
# standardized coefficients, and robust standard errors
export_summs(fit1, fit2, fit3,
model.names = c("Model 1","Model 2","Model 3"),
coefs = c("Frost Days" = "Frost",
"% Illiterate" = "Illiteracy",
"Murder Rate" = "Murder"),
scale = TRUE, robust = TRUE)
}
Get colors for plotting functions
Description
This is a helper function that provides hex color codes for
jtools, interactions, and perhaps other packages.
Usage
get_colors(colors, num.colors = 1, reverse = FALSE, gradient = FALSE)
Arguments
colors
The name of the desired color class or a vector of colors. See details of jtools_colors.
num.colors
How many colors should be returned? Default is 1.
reverse
Should the colors be returned in reverse order, compared to normal? Default is FALSE.
gradient
Return endpoints for a gradient? Default is FALSE. If TRUE,
num.colors is ignored.
Retrieve formulas from model objects
Description
This function is primarily an internal helper function in jtools and
related packages to standardize the different types of formula objects used
by different types of models.
Usage
get_formula(model, ...)
## Default S3 method:
get_formula(model, ...)
## S3 method for class 'brmsfit'
get_formula(model, resp = NULL, dpar = NULL, ...)
## S3 method for class 'panelmodel'
get_formula(model, ...)
Arguments
model
The fitted model object.
...
Ignored.
resp
For brmsfit objects, the response variable for which
the formula is desired. brmsfit objects may have multiple formulas, so
this selects a particular one. If NULL, the first formula is chosen
(unless dpar is specified).
dpar
For brmsfit objects, the distributional variable for which
the formula is desired. If NULL, no distributional parameter is used.
If there are multiple responses with distributional parameters, then
resp should be specified or else the first formula will be used by
default.
Value
A formula object.
Examples
data(mtcars)
fit <- lm(mpg ~ cyl, data = mtcars)
get_formula(fit)
Utility functions for generating model predictions
Description
These functions get information and data from regression models.
Usage
get_offset_name(model)
get_weights(model, data)
get_data(model, formula = NULL, warn = TRUE, ...)
get_response_name(model, ...)
Arguments
model
The model (e.g., lm, glm, merMod, svyglm)
data
For get_weights(), the data used to fit the model.
formula
The formula for model, if desired. Otherwise get_formula()
is called.
warn
For get_data(), should there be a warning when model.frame()
won't work because of variable transformations? Default is TRUE but this
may not be desired when get_data() is used inside of another function or
used multiple times.
...
Arguments passed to get_formula()
Value
-
get_data(): The data used to fit the model. -
get_response_name(): The name of the response variable. -
get_offset_name(): The name of the offset variable. -
get_weights(): A list withweights_name, the name of the weighting variable, andweights, the weights themselves (or all 1 when there are no weights).
Calculate robust standard errors and produce coefficient tables
Description
This function wraps around several sandwich and lmtest functions to calculate robust standard errors and returns them in a useful format.
Usage
get_robust_se(
model,
type = "HC3",
cluster = NULL,
data = model.frame(model),
vcov = NULL
)
Arguments
model
A regression model, preferably of class lm or glm
type
One of "HC3", "const", "HC", "HC0", "HC1",
"HC2", "HC4", "HC4m", "HC5". See sandwich::vcovHC() for some
more details on these choices. Note that some of these do not work for
clustered standard errors (see sandwich::vcovCL() ).
cluster
If you want clustered standard errors, either a string naming
the column in data that represents the clusters or a vector of clusters
that is the same length as the number of rows in data.
data
The data used to fit the model. Default is to just get the
model.frame from model.
vcov
You may provide the variance-covariance matrix yourself and this function will just calculate standard errors, etc. based on that. Default is NULL.
Value
A list with the following:
-
coefs: a coefficient table with the estimates, standard errors, t-statistics, and p-values fromlmtest. -
ses: The standard errors fromcoefs. -
ts: The t-statistics fromcoefs. -
ps: The p-values fromcoefs. -
type: The argument torobust. -
use_cluster:TRUEorFALSEindicator of whether clusters were used. -
cluster: The clusters or name of cluster variable used, if any. -
vcov: The robust variance-covariance matrix.
Scale and/or center data, including survey designs
Description
gscale standardizes variables by dividing them by 2 standard
deviations and mean-centering them by default. It contains options for
handling binary variables separately. gscale() is a fork of
rescale from the arm package—the key feature
difference is that gscale() will perform the same functions for
variables in svydesign objects. gscale() is
also more user-friendly in that it is more flexible in how it accepts input.
Usage
gscale(
data = NULL,
vars = NULL,
binary.inputs = "center",
binary.factors = FALSE,
n.sd = 2,
center.only = FALSE,
scale.only = FALSE,
weights = NULL,
apply.weighted.contrasts = getOption("jtools-weighted.contrasts", FALSE),
x = NULL,
messages = FALSE
)
Arguments
data
A data frame or survey design. Only needed if you would like to
rescale multiple variables at once. If x = NULL, all columns will
be rescaled. Otherwise, x should be a vector of variable names. If
x is a numeric vector, this argument is ignored.
vars
If data is a data.frame or similar, you can scale only
select columns by providing a vector column names to this argument.
binary.inputs
Options for binary variables. Default is center;
0/1 keeps original scale; -0.5/0.5 rescales 0 as -0.5 and 1
as 0.5; center subtracts the mean; and full subtracts the
mean and divides by 2 sd.
binary.factors
Coerce two-level factors to numeric and apply scaling functions to them? Default is FALSE.
n.sd
By how many standard deviations should the variables be divided
by? Default for gscale is 2, like arm's rescale.
1 is the more typical standardization scheme.
center.only
A logical value indicating whether you would like to mean -center the values, but not scale them.
scale.only
A logical value indicating whether you would like to scale the values, but not mean-center them.
weights
A vector of weights equal in length to x. If iterating
over a data frame, the weights will need to be equal in length to all the
columns to avoid errors. You may need to remove missing values before using
the weights.
apply.weighted.contrasts
Factor variables cannot be scaled, but you
can set the contrasts such that the intercept in a regression model will
reflect the true mean (assuming all other variables are centered). If set
to TRUE, the argument will apply weighted effects coding to all factors.
This is similar to the R default effects coding, but weights according to
how many observations are at each level. An adapted version of
contr.wec() from the wec package is used to do this. See
that package's documentation and/or Grotenhuis et al. (2016) for more
info.
x
Deprecated. Pass numeric vectors to data. Pass vectors of column
names to vars.
messages
Print messages when variables are not processed due to being non-numeric or all missing? Default is FALSE.
Details
This function is adapted from the rescale function of
the arm package. It is named gscale() after the
popularizer of this scaling method, Andrew Gelman. By default, it
works just like rescale. But it contains many additional options and
can also accept multiple types of input without breaking a sweat.
Only numeric variables are altered when in a data.frame or survey design. Character variables, factors, etc. are skipped.
For those dealing with survey data, if you provide a survey.design
object you can rest assured that the mean-centering and scaling is performed
with help from the svymean() and
svyvar() functions, respectively. It was among the
primary motivations for creating this function. gscale() will not
center or scale the weights variables defined in the survey design unless
the user specifically requests them in the x = argument.
Author(s)
Jacob Long jacob.long@sc.edu
References
Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27, 2865–2873. http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf
Grotenhuis, M. te, Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017). When size matters: Advantages of weighted effect coding in observational studies. International Journal of Public Health, 62, 163–167. https://doi.org/10.1007/s00038-016-0901-1 ( open access)
See Also
j_summ is a replacement for the summary function for
regression models. On request, it will center and/or standardize variables
before printing its output.
standardization, scaling, and centering tools
center(),
center_mod(),
scale_mod(),
standardize()
Examples
x <- rnorm(10, 2, 1)
x2 <- rbinom(10, 1, .5)
# Basic use
gscale(x)
# Normal standardization
gscale(x, n.sd = 1)
# Scale only
gscale(x, scale.only = TRUE)
# Center only
gscale(x, center.only = TRUE)
# Binary inputs
gscale(x2, binary.inputs = "0/1")
gscale(x2, binary.inputs = "full") # treats it like a continous var
gscale(x2, binary.inputs = "-0.5/0.5") # keep scale, center at zero
gscale(x2, binary.inputs = "center") # mean center it
# Data frame as input
# loops through each numeric column
gscale(data = mtcars, binary.inputs = "-0.5/0.5")
# Specified vars in data frame
gscale(mtcars, vars = c("hp", "wt", "vs"), binary.inputs = "center")
# Weighted inputs
wts <- runif(10, 0, 1)
gscale(x, weights = wts)
# If using a weights column of data frame, give its name
mtcars$weights <- runif(32, 0, 1)
gscale(mtcars, weights = weights) # will skip over mtcars$weights
# If using a weights column of data frame, can still select variables
gscale(mtcars, vars = c("hp", "wt", "vs"), weights = weights)
# Survey designs
if (requireNamespace("survey")) {
library(survey)
data(api)
## Create survey design object
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
data = apistrat, fpc=~fpc)
# Creating test binary variable
dstrat$variables$binary <- rbinom(200, 1, 0.5)
gscale(data = dstrat, binary.inputs = "-0.5/0.5")
gscale(data = dstrat, vars = c("api00","meals","binary"),
binary.inputs = "-0.5/0.5")
}
Deprecated interaction functions
Description
These functions are now part of the interactions
package.
Usage
interact_plot(...)
cat_plot(...)
sim_slopes(...)
johnson_neyman(...)
probe_interaction(...)
Arguments
...
arguments are ignored
Color palettes in jtools functions
Description
jtools combines several options into the colors
argument in plotting functions.
Details
The argument to colors in functions like effect_plot,
plot_coefs, and others is very flexible but may also
cause confusion.
If you provide an argument of length 1, it is assumed that you are naming
a palette. jtools provides 6 color palettes design for qualitative data.
4 of the 6 are based on Paul Tol's suggestions (see references) and are
meant to both optimize your ability to quickly differentiate the colors
and to be distinguishable to colorblind people.
These are called
"Qual1", "Qual2", "Qual3", "CUD", "CUD Bright", and "Rainbow".
Each of the "Qual" schemes comes from Paul Tol.
"Rainbow" is Paul Tol's compromise rainbow color scheme that is fairly
differentiable for colorblind people and when rendered in grayscale.
"CUD Bright" is a brightened and reordered version of Okabe and Ito's
suggestions for 'Color Universal Design' while "CUD" is their exact
scheme (see references). "CUD Bright" is the default for qualitative
scales in jtools functions.
You may also provide any color palette supported by RColorBrewer.
See all of those options at RColorBrewer::brewer.pal() 's documentation.
If you provide one of RColorBrewer's sequential palettes, like "Blues",
jtools automatically requests one more color than needed from
brewer.pal and then drops the lightest color. My experience is that
those scales tend to give one color that is too light to easily
differentiate against a white background.
For gradients, you can use any of the RColorBrewer sequential palette
names and get comparable results on a continuous scale. There are also some
jtools-specific gradient schemes: "blue", "blue2", "green",
"red", "purple", "seagreen".
If you want something a little non-standard, I'd suggest taking a look
at "blue2" or "seagreen".
Lastly, you may provide colors by name. This must be a vector of the
same length as whatever it is the colors will correspond to. The format
must be one understood by ggplot2's manual scale functions. This
basically means it needs to be in hex format (e.g., "#000000") or
one of the many names R understands (e.g., "red"; use colors() to
see all of those options).
References
Paul Tol's site is what is used to derive 4 of the 6 jtools-specific
qualitative palettes: https://personal.sron.nl/~pault/
Okabe and Ito's palette inspired "CUD Bright", though "CUD Bright" is not exactly the same. "CUD" is the same. See https://web.archive.org/web/20190216090108/jfly.iam.u-tokyo.ac.jp/color/ for more.
knitr methods for summ
Description
There's no reason for end users to utilize these functions, but CRAN requires it to be documented.
Usage
## S3 method for class 'summ.lm'
knit_print(x, options = NULL, ...)
## S3 method for class 'summ.glm'
knit_print(x, options = NULL, ...)
## S3 method for class 'summ.svyglm'
knit_print(x, options = NULL, ...)
## S3 method for class 'summ.merMod'
knit_print(x, options = NULL, ...)
## S3 method for class 'summ.rq'
knit_print(x, options = NULL, ...)
Arguments
x
The summ object
options
Chunk options.
...
Ignored.
Make new data for generating predicted data from regression models.
Description
This is a convenience function that helps automate the process
of generating predicted data from regression model from a predictor(s). It
is designed to give you the data frame for the predict method's newdata
argument.
Usage
make_new_data(
model,
pred,
pred.values = NULL,
at = NULL,
data = NULL,
center = TRUE,
set.offset = NULL,
num.preds = 100,
...
)
Arguments
model
The model (e.g., lm, glm, merMod, svyglm)
pred
The name of the focal predictor as a string. This is the variable for which, if you are plotting, you'd likely have along the x-axis (with the dependent variable as the y-axis).
pred.values
The values of pred you want to include. Default is NULL,
which means a sequence of equi-spaced values over the range of a numeric
predictor or each level of a non-numeric predictor.
at
If you want to manually set the values of other variables in the model, do so by providing a named list where the names are the variables and the list values are vectors of the values. This can be useful especially when you are exploring interactions or other conditional predictions.
data
The data frame used in fitting the model. Default is NULL, in
which case the data will be retrieved via model.frame or, if there are
variable transformations in the formula, by looking in the environment
for the data.
center
Set numeric covariates to their mean? Default is TRUE. You
may also just provide a vector of names (as strings) of covariates to
center. Note that for svyglm models, the survey-weighted means are used.
For models with weights, these are weighted means.
set.offset
If the model has an offset, the value to use for the offset variable. Default is NULL, in which case the median of the offset variable is used.
num.preds
The number of predictions to generate. Default is 100.
Ignored if pred.values is not NULL.
...
Extra arguments passed to get_formula()
Details
Please bear in mind that this does not generate the predictions. You will
need to do that with a predict function for your model or another
interface, such as the prediction package's titular function.
Value
A data frame.
Examples
fit <- lm(Income ~ Frost + Illiteracy + Murder, data = as.data.frame(state.x77))
# Basic use
new_data <- make_new_data(fit, pred = "Frost")
# Set covariate to specific value
new_data <- make_new_data(fit, pred = "Frost", at = list(Murder = 5))
# Set covariate to several specific values
new_data <- make_new_data(fit, pred = "Frost", at = list(Murder = c(5, 10, 15)))
Generate predicted data for plotting results of regression models
Description
This is an alternate interface to the underlying tools that
make up effect_plot() as well as interactions::interact_plot() and
interactions::cat_plot() from the interactions package.
make_predictions() creates the data to be plotted and adds information
to the original data to make it more amenable for plotting with the
predicted data.
Usage
make_predictions(model, ...)
## Default S3 method:
make_predictions(
model,
pred,
pred.values = NULL,
at = NULL,
data = NULL,
center = TRUE,
interval = TRUE,
int.type = c("confidence", "prediction"),
int.width = 0.95,
outcome.scale = "response",
robust = FALSE,
cluster = NULL,
vcov = NULL,
set.offset = NULL,
new_data = NULL,
return.orig.data = FALSE,
partial.residuals = FALSE,
...
)
Arguments
model
The model (e.g., lm, glm, merMod, svyglm)
...
Ignored.
pred
The name of the focal predictor as a string. This is the variable for which, if you are plotting, you'd likely have along the x-axis (with the dependent variable as the y-axis).
pred.values
The values of pred you want to include. Default is NULL,
which means a sequence of equi-spaced values over the range of a numeric
predictor or each level of a non-numeric predictor.
at
If you want to manually set the values of other variables in the model, do so by providing a named list where the names are the variables and the list values are vectors of the values. This can be useful especially when you are exploring interactions or other conditional predictions.
data
Optional, default is NULL. You may provide the data used to
fit the model. This can be a better way to get mean values for centering
and can be crucial for models with variable transformations in the formula
(e.g., log(x)) or polynomial terms (e.g., poly(x, 2)). You will
see a warning if the function detects problems that would likely be
solved by providing the data with this argument and the function will
attempt to retrieve the original data from the global environment.
center
Set numeric covariates to their mean? Default is TRUE. You
may also just provide a vector of names (as strings) of covariates to
center. Note that for svyglm models, the survey-weighted means are used.
For models with weights, these are weighted means.
interval
Logical. If TRUE, plots confidence/prediction
intervals around the line using geom_ribbon .
int.type
Type of interval to plot. Options are "confidence" or "prediction". Default is confidence interval.
int.width
How large should the interval be, relative to the standard error? The default, .95, corresponds to roughly 1.96 standard errors and a .05 alpha level for values outside the range. In other words, for a confidence interval, .95 is analogous to a 95% confidence interval.
outcome.scale
For nonlinear models (i.e., GLMs), should the outcome
variable be plotted on the link scale (e.g., log odds for logit models) or
the original scale (e.g., predicted probabilities for logit models)? The
default is "response", which is the original scale. For the link
scale, which will show straight lines rather than curves, use
"link".
robust
Should robust standard errors be used to find confidence
intervals for supported models? Default is FALSE, but you should specify
the type of sandwich standard errors if you'd like to use them (i.e.,
"HC0", "HC1", and so on). If TRUE, defaults to "HC3" standard
errors.
cluster
For clustered standard errors, provide the column name of the cluster variable in the input data frame (as a string). Alternately, provide a vector of clusters.
vcov
Optional. You may supply the variance-covariance matrix of the coefficients yourself. This is useful if you are using some method for robust standard error calculation not supported by the sandwich package.
set.offset
For models with an offset (e.g., Poisson models), sets an offset for the predicted values. All predicted values will have the same offset. By default, this is set to 1, which makes the predicted values a proportion. See details for more about offset support.
new_data
If you would prefer to generate your own hypothetical
(or not hypothetical) data rather than have the function make a call to
make_new_data() , you can provide it.
return.orig.data
Instead of returning a just the predicted data frame, should the original data be returned as well? If so, then a list will be return with both the predicted data (as the first element) and the original data (as the second element). Default is FALSE.
partial.residuals
If return.orig.data is TRUE, should the observed
dependent variable be replaced with the partial residual? This makes a
call to partialize() , where you can find more details.
Print attractive data frames in the console
Description
This function takes data frame input and prints to the console as an ASCII/markdown table for better readability.
Usage
md_table(
x,
format = getOption("md_table_format", "grid"),
digits = getOption("jtools-digits", 2),
sig.digits = TRUE,
row.names = rownames(x),
col.names = colnames(x),
align = NULL
)
Arguments
x
A data frame or matrix.
format
The style, which can be one of the following: "multiline", "grid", "simple" (also "pandoc"), "rmarkdown" (also "markdown"). Default: "markdown"
digits
How many digits to print for numbers. Default: 2
sig.digits
Should each number be printed with digits number of
digits or only when there are at least that many significant digits? Default
is TRUE, meaning only print digits number of significant digits.
row.names
if FALSE, row names are suppressed. A character vector of row names can also be specified here. By default, row names are included if rownames(t) is neither NULL nor identical to 1:nrow(x).
col.names
a character vector of column names to be used in the table
align
Column alignment: a character vector consisting of ‘'l'’ (left), ‘'c'’ (center) and/or ‘'r'’ (right). By default or if ‘align = NULL’, numeric columns are right-aligned, and other columns are left-aligned.
Data about movies
Description
A dataset containing information about films, how popular they were, and the extent to which they feature women.
Usage
movies
Format
A data frame with 841 rows and 24 variables:
- title
The movie's title
- year
The year of the movie's US theatrical release
- release_date
The exact date of the movie's US theatrical release
- runtime
The length of the movie in hours
- genre5
The movie's primary genre per IMDB, fit into one of 5 broad categories
- genre_detailed
The verbatim genre description per IMDB
- rated
The movie's MPA rating (G, PG, PG-13, R, or NC-17) as an ordered factor
- director
The name of the movie's director(s)
- writer
The name of the movie's screenwriter(s)
- actors
A comma-separated string of leading actors in the film
- language
The movie's language(s), per IMDB
- country
The country(ies) in which the movie was produced
- metascore
The movie's score on MetaCritic, ranging from 0 to 100
- imdb_rating
The movie's rating on IMDB, ranging from 0 to 10
- imdb_votes
The number of users who submitted a rating on IMDB
- imdb_id
The unique identifier for the movie at IMDB
- studio
The studio(s) who produced the movie
- bechdel_binary
A logical indicating whether the movie passed the Bechdel test
- bechdel_ordinal
A more granular measure of the bechdel test, indicating not just whether the movie passed or failed but how close it got to passing if it did fail
- us_gross
The movie's US gross in 2013 US dollars
- int_gross
The movie's international gross in 2013 US dollars
- budget
The movie's budget in 2013 US dollars
- men_lines
The proportion of spoken lines that were spoken by male characters
- lines_data
The raw data used to calculate
men_lines; see Source for more information
Source
These data are aggregated from several sources. Metadata is gathered from
IMDB. Other information, particularly about the lines, is collected from
The Pudding. The data
regarding the Bechdel Test, as well as about finances, comes from
FiveThirtyEight and its associated R package (fivethirtyeight and its
dataset, bechdel).
Numbering printing with signed zeroes and trailing zeroes
Description
This function will print exactly the amount of digits requested
as well as signed zeroes when appropriate (e.g, -0.00).
Usage
num_print(x, digits = getOption("jtools-digits", 2), format = "f")
Arguments
x
The number(s) to print
digits
Number of digits past the decimal to print
format
equal to "d" (for integers),
"f", "e", "E", "g", "G", "fg" (for reals). Default is "f"
Adjust observed data for partial residuals plots
Description
This function is designed to facilitate the creation of partial residual plots, in which you can plot observed data alongside model predictions. The difference is instead of the actual observed data, the outcome variable is adjusted for the effects of the covariates.
Usage
partialize(model, ...)
## Default S3 method:
partialize(
model,
vars = NULL,
data = NULL,
at = NULL,
center = TRUE,
scale = c("response", "link"),
set.offset = 1,
...
)
Arguments
model
A regression model.
...
Ignored.
vars
The variable(s) to not adjust for, as a string (or vector of
strings). If I want to show the effect of x adjusting for the effect of
z, then I would make "x" the vars argument.
data
Optionally, provide the data used to fit the model (or some other data frame with the same variables). Otherwise, it will be retrieved from the model or the global environment.
at
If you want to manually set the values of other variables in the model, do so by providing a named list where the names are the variables and the list values are vectors of the values. This can be useful especially when you are exploring interactions or other conditional predictions.
center
Set numeric covariates to their mean? Default is TRUE. You
may also just provide a vector of names (as strings) of covariates to
center. Note that for svyglm models, the survey-weighted means are used.
For models with weights, these are weighted means.
scale
For GLMs, should the outcome variable be returned on the link
scale or response scale? Default is "response".
set.offset
For models with an offset (e.g., Poisson models), sets an offset for the predicted values. All predicted values will have the same offset. By default, this is set to 1, which makes the predicted values a proportion. See details for more about offset support.
Details
The main use for working with partial residuals rather than the observed values is to explore patterns in the model fit with respect to one or more variables while "controlling out" the effects of others. Plotting a predicted line along with observed data may make a very well-fitting model look as if it is a poor fit if a lot of variation is accounted for by variables other than the one on the x-axis.
I advise consulting Fox and Weisberg (available free) for more details
on what partial residuals are. This function is designed to produce
data in a similar format to effects::Effect() when that function has
residuals set to TRUE and is plotted. I wanted a more modular function
to produce the data separately. To be clear, the developers of the effects
package have nothing to do with this function; 'partialize“ is merely
designed to replicate some of that functionality.
Value
data plus the residualized outcome variable.
References
Fox, J., & Weisberg, S. (2018). Visualizing fit and lack of fit in complex regression models with predictor effect plots and partial residuals. Journal of Statistical Software, 87(9), 1–27. https://doi.org/10.18637/jss.v087.i09
Test whether sampling weights are needed
Description
Use the test proposed in Pfeffermann and Sverchkov (1999) to check whether a regression model is specified correctly without weights.
Usage
pf_sv_test(
model,
data = NULL,
weights,
sims = 1000,
digits = getOption("jtools-digits", default = 3)
)
Arguments
model
The fitted model, without weights
data
The data frame with the data fed to the fitted model and the weights
weights
The name of the weights column in model's data frame
or a vector of weights equal in length to the number of observations
included in model.
sims
The number of bootstrap simulations to use in estimating the variance of the residual correlation. Default is 1000, but for publications or when computing power/time is sufficient, a higher number is better.
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 3. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired number.
Details
This is a test described by Pfeffermann and Sverchkov (1999) that is designed to help analysts decide whether they need to use sample weights in their regressions to avoid biased parameter estimation.
It first checks the correlation of the residuals of the model with the weights. It then uses bootstrapping to estimate the variance of the correlation, ending with a t-test of whether the correlation differs from zero. This is done for the squared residuals and cubed residuals as well. If anyone of them are statistically significant (at whatever level you feel appropriate), it is best to do a weighted regression. Note that in large samples, a very small correlation may have a low p-value without a large bias in the unweighted regression.
References
Pfeffermann, D., & Sverchkov, M. (1999). Parametric and semi-parametric estimation of regression models fitted to survey data. Sankhya: The Indian Journal of Statistics, 61. 166-186.
See Also
Other survey tools:
svycor(),
svysd(),
weights_tests(),
wgttest()
Examples
# Note: This is a contrived example to show how the function works,
# not a case with actual sammpling weights from a survey vendor
if (requireNamespace("boot")) {
states <- as.data.frame(state.x77)
set.seed(100)
states$wts <- runif(50, 0, 3)
fit <- lm(Murder ~ Illiteracy + Frost, data = states)
pf_sv_test(model = fit, data = states, weights = wts, sims = 100)
}
Plot Regression Summaries
Description
plot_summs and plot_coefs create regression coefficient
plots with ggplot2.
Usage
plot_summs(
...,
ci_level = 0.95,
model.names = NULL,
coefs = NULL,
omit.coefs = "(Intercept)",
inner_ci_level = NULL,
colors = "CUD Bright",
plot.distributions = FALSE,
rescale.distributions = FALSE,
exp = FALSE,
point.shape = TRUE,
point.size = 5,
line.size = c(0.8, 2),
legend.title = "Model",
groups = NULL,
facet.rows = NULL,
facet.cols = NULL,
facet.label.pos = "top",
color.class = colors,
resp = NULL,
dpar = NULL,
coefs.match = c("exact", "regex")
)
plot_coefs(
...,
ci_level = 0.95,
inner_ci_level = NULL,
model.names = NULL,
coefs = NULL,
omit.coefs = c("(Intercept)", "Intercept"),
colors = "CUD Bright",
plot.distributions = FALSE,
rescale.distributions = FALSE,
exp = FALSE,
point.shape = TRUE,
point.size = 5,
line.size = c(0.8, 2),
legend.title = "Model",
groups = NULL,
facet.rows = NULL,
facet.cols = NULL,
facet.label.pos = "top",
color.class = colors,
resp = NULL,
dpar = NULL,
coefs.match = c("exact", "regex")
)
Arguments
...
regression model(s). You may also include arguments to be passed
to tidy().
ci_level
The desired width of confidence intervals for the coefficients. Default: 0.95
model.names
If plotting multiple models simultaneously, you can provide a vector of names here. If NULL, they will be named sequentially as "Model 1", "Model 2", and so on. Default: NULL
coefs
If you'd like to include only certain coefficients, provide them as a vector. If it is a named vector, then the names will be used in place of the variable names. See details for examples. Default: NULL
omit.coefs
If you'd like to specify some coefficients to not include
in the plot, provide them as a vector. This argument is overridden by
coefs if both are provided. By default, the intercept term is omitted.
To include the intercept term, just set omit.coefs to NULL.
inner_ci_level
Plot a thicker line representing some narrower span
than ci_level. Default is NULL, but good options are .9, .8, or .5.
colors
See jtools_colors for more on your color options. Default: 'CUD Bright'
plot.distributions
Instead of just plotting the ranges, you may plot normal distributions representing the width of each estimate. Note that these are completely theoretical and not based on a bootstrapping or MCMC procedure, even if the source model was fit that way. Default is FALSE.
rescale.distributions
If plot.distributions is TRUE, the default
behavior is to plot each normal density curve on the same scale. If some
of the uncertainty intervals are much wider/narrower than others, that
means the wide ones will have such a low height that you won't be able
to see the curve. If you set this parameter to TRUE, each curve will
have the same maximum height regardless of their width.
exp
If TRUE, all coefficients are exponentiated (e.g., transforms logit coefficents from log odds scale to odds). The reference line is also moved to 1 instead of 0.
point.shape
When using multiple models, should each model's point estimates use a different point shape to visually differentiate each model from the others? Default is TRUE. You may also pass a vector of shapes to specify shapes yourself.
point.size
Change the size of the points. Default is 3.
line.size
Change the thickness of the error bar lines.
Default is c(0.8, 2). The first number is the size for the full
width of the interval, the second number is used for the thicker
inner interval when inner.ci is TRUE.
legend.title
What should the title for the legend be? Default is
"Model", but you can specify it here since it is rather difficult to
change later via ggplot2's typical methods.
groups
If you would like to have facets (i.e., separate panes) for different groups of coefficients, you can specify those groups with a list here. See details for more on how to do this.
facet.rows
The number of rows in the facet grid (the nrow argument
to ggplot2::facet_wrap() ).
facet.cols
The number of columns in the facet grid (the nrow
argument to ggplot2::facet_wrap() ).
facet.label.pos
Where to put the facet labels. One of "top" (the default), "bottom", "left", or "right".
color.class
Deprecated. Now known as colors.
resp
For any models that are brmsfit and have multiple response
variables, specify them with a vector here. If the model list includes
other types of models, you do not need to enter resp for those models.
For instance, if I want to plot a lm object and two brmsfit objects,
you only need to provide a vector of length 2 for resp.
dpar
For any models that are brmsfit and have a distributional
dependent variable, that can be specified here. If NULL, it is assumed you
want coefficients for the location/mean parameter, not the distributional
parameter(s).
coefs.match
This modifies the way the coefs and omit.coefs
arguments are interpreted. The default "exact" which represents the
legacy behavior, will include/exclude coefficients that match exactly
with your inputs to those functions. If "regex", coefs and
omit.coefs are used as the pattern argument for grepl() matching
the coefficient names. Note that using "regex" means you will be unable
to override the default coefficient names via a named vector.
Details
A note on the distinction between plot_summs and plot_coefs:
plot_summs only accepts models supported by summ() and allows users
to take advantage of the standardization and robust standard error features
(among others as may be relevant). plot_coefs supports any models that
have a broom::tidy() method defined in the broom package, but of course
lacks any additional features like robust standard errors. To get a mix
of the two, you can pass summ objects to plot_coefs too.
For coefs, if you provide a named vector of coefficients, then
the plot will refer to the selected coefficients by the names of the
vector rather than the coefficient names. For instance, if I want to
include only the coefficients for the hp and mpg but have
the plot refer to them as "Horsepower" and "Miles/gallon", I'd provide
the argument like this:
c("Horsepower" = "hp", "Miles/gallon" = "mpg")
To use the groups argument, provide a (preferably named) list of
character vectors. If I want separate panes with "Frost" and "Illiteracy"
in one and "Population" and "Area" in the other, I'd make a list like
this:
list(pane_1 = c("Frost", "Illiteracy"), pane_2 = c("Population", "Area"))
Value
A ggplot object.
Examples
states <- as.data.frame(state.x77)
fit1 <- lm(Income ~ Frost + Illiteracy + Murder +
Population + Area + `Life Exp` + `HS Grad`,
data = states, weights = runif(50, 0.1, 3))
fit2 <- lm(Income ~ Frost + Illiteracy + Murder +
Population + Area + `Life Exp` + `HS Grad`,
data = states, weights = runif(50, 0.1, 3))
fit3 <- lm(Income ~ Frost + Illiteracy + Murder +
Population + Area + `Life Exp` + `HS Grad`,
data = states, weights = runif(50, 0.1, 3))
# Plot all 3 regressions with custom predictor labels,
# standardized coefficients, and robust standard errors
plot_summs(fit1, fit2, fit3,
coefs = c("Frost Days" = "Frost", "% Illiterate" = "Illiteracy",
"Murder Rate" = "Murder"),
scale = TRUE, robust = TRUE)
Alternative interface for merMod predictions
Description
This function generates predictions for merMod models, but
with the ability to get standard errors as well.
Usage
predict_merMod(
object,
newdata = NULL,
se.fit = FALSE,
use.re.var = FALSE,
allow.new.levels = FALSE,
type = c("link", "response", "terms"),
na.action = na.pass,
re.form = NULL,
boot = FALSE,
sims = 100,
prog.arg = "none",
...
)
Arguments
object
a fitted model object
newdata
data frame for which to evaluate predictions.
se.fit
Include standard errors with the predictions? Note that these standard errors by default include only fixed effects variance. See details for more info. Default is FALSE.
use.re.var
If se.fit is TRUE, include random effects variance in
standard errors? Default is FALSE.
allow.new.levels
logical if new levels (or NA values) in
newdata are allowed. If FALSE (default), such new values in
newdata will trigger an error; if TRUE, then the prediction
will use the unconditional (population-level) values for data with
previously unobserved levels (or NAs).
type
character string - either "link", the default, or
"response" indicating the type of prediction object returned.
na.action
function determining what should be done
with missing values for fixed effects in newdata.
The default is to predict NA: see na.pass .
re.form
(formula, NULL, or NA) specify which random effects to condition on when predicting. If NULL,
include all random effects; if NA or ~0,
include no random effects.
boot
Use bootstrapping (via lme4::bootMer() ) to estimate
variance for se.fit? Default is FALSE
sims
If boot is TRUE, how many simulations should be run? Default
is 100.
prog.arg
If boot and se.fit are TRUE, a character string -
type of progress bar to display. Default is "none"; the function will look
for a relevant *ProgressBar function, so "txt" will work in general;
"tk" is available if the tcltk package is loaded; or "win" on Windows
systems. Progress bars are disabled (with a message) for parallel operation.
...
When boot and se.fit are TRUE, any additional arguments are
passed to lme4::bootMer().
Details
The developers of lme4 omit an se.fit argument for a
reason, which is that it's not perfectly clear how best to estimate
the variance for these models. This solution is a logical one, but perhaps
not perfect. Bayesian models are one way to do better.
The method used here is based on the one described here: http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#predictions-andor-confidence-or-prediction-intervals-on-predictions
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Scale variables in fitted regression models
Description
scale_mod (previously known as scale_lm) takes fitted regression models
and scales all
predictors by dividing each by 1 or 2 standard deviations (as chosen by the
user).
Usage
scale_mod(model, ...)
## Default S3 method:
scale_mod(
model,
binary.inputs = "0/1",
n.sd = 1,
center = TRUE,
scale.response = FALSE,
center.only = FALSE,
scale.only = FALSE,
data = NULL,
vars = NULL,
apply.weighted.contrasts = getOption("jtools-weighted.contrasts", FALSE),
...
)
Arguments
model
A regression model of type lm, glm,
svyglm , or lme4::merMod. Other model types
may work as well but are not tested.
...
Arguments passed on to gscale() .
binary.inputs
Options for binary variables. Default is "0/1";
"0/1" keeps original scale; "-0.5,0.5" rescales 0 as -0.5
and
1 as 0.5; center subtracts the mean; and full treats them
like other continuous variables.
n.sd
How many standard deviations should you divide by for standardization? Default is 1, though some prefer 2.
center
Default is TRUE. If TRUE, the predictors are
also
mean-centered. For binary predictors, the binary.inputs argument
supersedes this one.
scale.response
Should the response variable also be rescaled? Default
is FALSE.
center.only
Rather than actually scale predictors, just mean-center them.
scale.only
A logical value indicating whether you would like to scale the values, but not mean-center them.
data
If you provide the data used to fit the model here, that data
frame is used to re-fit the model instead of the stats::model.frame()
of the model. This is particularly useful if you have variable
transformations or polynomial terms specified in the formula.
vars
A character vector of variable names that you want to be scaled. If NULL, the default, it is all predictors.
apply.weighted.contrasts
Factor variables cannot be scaled, but you
can set the contrasts such that the intercept in a regression model will
reflect the true mean (assuming all other variables are centered). If set
to TRUE, the argument will apply weighted effects coding to all factors.
This is similar to the R default effects coding, but weights according to
how many observations are at each level. An adapted version of
contr.wec() from the wec package is used to do this. See
that package's documentation and/or Grotenhuis et al. (2016) for more
info.
Details
This function will scale all continuous variables in a regression model for ease of interpretation, especially for those models that have interaction terms. It can also mean-center all of them as well, if requested.
The scaling happens on the input data, not the terms themselves. That means interaction terms are still properly calculated because they are the product of standardized predictors, not a standardized product of predictors.
This function re-estimates the model, so for large models one should expect a runtime equal to the first run.
Value
The functions returns a re-fitted model object, inheriting from whichever class was supplied.
Author(s)
Jacob Long jacob.long@sc.edu
References
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40(3), 373-400.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analyses for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
See Also
sim_slopes performs a simple slopes analysis.
interact_plot creates attractive, user-configurable plots of
interaction models.
standardization, scaling, and centering tools
center(),
center_mod(),
gscale(),
standardize()
Examples
fit <- lm(formula = Murder ~ Income * Illiteracy,
data = as.data.frame(state.x77))
fit_scale <- scale_mod(fit)
fit_scale <- scale_mod(fit, center = TRUE)
# With weights
fitw <- lm(formula = Murder ~ Income * Illiteracy,
data = as.data.frame(state.x77),
weights = Population)
fitw_scale <- scale_mod(fitw)
fitw_scale <- scale_mod(fitw, center = TRUE, binary.input = "0/1")
# With svyglm
if (requireNamespace("survey")) {
library(survey)
data(api)
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
regmodel <- svyglm(api00~ell*meals,design=dstrat)
regmodel_scale <- scale_mod(regmodel)
regmodel_scale <- scale_mod(regmodel, binary.input = "0/1")
}
Set defaults for summ() functions
Description
This function is convenience wrapper for manually setting
options using options() . This gives a handy way to, for instance,
set the arguments to be used in every call to summ() in your
script/session.
To make the settings persist across sessions, you can run this in your
.Rprofile file.
Note that arguments that do not apply (e.g., robust for merMod models)
are silently ignored when those types of models are used.
Usage
set_summ_defaults(
digits = NULL,
model.info = NULL,
model.fit = NULL,
model.coefs = NULL,
pvals = NULL,
robust = NULL,
confint = NULL,
ci.width = NULL,
vifs = NULL,
conf.method = NULL,
table.format = NULL
)
Arguments
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 2. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired
number.
model.info
Toggles printing of basic information on sample size, name of DV, and number of predictors.
model.fit
Toggles printing of model fit statistics.
model.coefs
Toggles printing of model coefficents.
pvals
Show p values? If FALSE, these
are not printed. Default is TRUE.
robust
If not FALSE, reports heteroskedasticity-robust standard
errors instead of conventional SEs. These are also known as Huber-White
standard errors. There are several options provided by
sandwich::vcovHC() : "HC0", "HC1", "HC2", "HC3", "HC4",
"HC4m", "HC5".
Default is FALSE.
This requires the sandwich package to compute the
standard errors.
confint
Show confidence intervals instead of standard errors? Default
is FALSE.
ci.width
A number between 0 and 1 that signifies the width of the
desired confidence interval. Default is .95, which corresponds
to a 95% confidence interval. Ignored if confint = FALSE.
vifs
If TRUE, adds a column to output with variance inflation
factors (VIF). Default is FALSE.
conf.method
Argument passed to lme4::confint.merMod() . Default
is "Wald", but "profile" or "boot" are better when accuracy is a
priority. Be aware that both of the alternate methods are sometimes very
time-consuming.
table.format
A format understood by md_table()
Standardize vectors, data frames, and survey designs
Description
This function is a wrapper around gscale() that is configured
to do a conventional standardization of continuous variables,
mean-centering and dividing by one standard deviation.
Usage
standardize(
data = NULL,
vars = NULL,
binary.inputs = "center",
binary.factors = FALSE,
weights = NULL
)
Arguments
data
A data frame or survey design. Only needed if you would like to
rescale multiple variables at once. If x = NULL, all columns will
be rescaled. Otherwise, x should be a vector of variable names. If
x is a numeric vector, this argument is ignored.
vars
If data is a data.frame or similar, you can scale only
select columns by providing a vector column names to this argument.
binary.inputs
Options for binary variables. Default is center;
0/1 keeps original scale; -0.5/0.5 rescales 0 as -0.5 and 1
as 0.5; center subtracts the mean; and full subtracts the
mean and divides by 2 sd.
binary.factors
Coerce two-level factors to numeric and apply scaling functions to them? Default is FALSE.
weights
A vector of weights equal in length to x. If iterating
over a data frame, the weights will need to be equal in length to all the
columns to avoid errors. You may need to remove missing values before using
the weights.
Details
Some more information can be found in the documentation for
gscale()
Value
A transformed version of the data argument.
See Also
standardization, scaling, and centering tools
center(),
center_mod(),
gscale(),
scale_mod()
Examples
# Standardize just the "qsec" variable in mtcars
standardize(mtcars, vars = "qsec")
Regression summaries with options
Description
To get specific documentation, choose the appropriate link to the type of model that you want to summarize from the details section.
Usage
summ(model, ...)
j_summ(model, ...)
Arguments
...
Other arguments to be passed to the model-specific function.
Details
Generalized linear regression summaries with options
Description
summ() prints output for a regression model in a fashion similar to
summary(), but formatted differently with more options.
Usage
## S3 method for class 'glm'
summ(
model,
scale = FALSE,
confint = getOption("summ-confint", FALSE),
ci.width = getOption("summ-ci.width", 0.95),
robust = getOption("summ-robust", FALSE),
cluster = NULL,
vifs = getOption("summ-vifs", FALSE),
digits = getOption("jtools-digits", default = 2),
exp = FALSE,
pvals = getOption("summ-pvals", TRUE),
n.sd = 1,
center = FALSE,
transform.response = FALSE,
scale.only = FALSE,
data = NULL,
model.info = getOption("summ-model.info", TRUE),
model.fit = getOption("summ-model.fit", TRUE),
model.coefs = getOption("summ-model.coefs", TRUE),
which.cols = NULL,
vcov = NULL,
...
)
Arguments
model
A glm object.
scale
If TRUE, reports standardized regression
coefficients by scaling and mean-centering input data (the latter can be
changed via the scale.only argument). Default is FALSE.
confint
Show confidence intervals instead of standard errors? Default
is FALSE.
ci.width
A number between 0 and 1 that signifies the width of the
desired confidence interval. Default is .95, which corresponds
to a 95% confidence interval. Ignored if confint = FALSE.
robust
If not FALSE, reports heteroskedasticity-robust standard
errors instead of conventional SEs. These are also known as Huber-White
standard errors. There are several options provided by
sandwich::vcovHC() : "HC0", "HC1", "HC2", "HC3", "HC4",
"HC4m", "HC5".
Default is FALSE.
This requires the sandwich package to compute the
standard errors.
cluster
For clustered standard errors, provide the column name of
the cluster variable in the input data frame (as a string). Alternately,
provide a vector of clusters. Note that you must set robust to either
"HC1", "HC2", or "HC3" in order to have clustered standard errors ("HC4"
and "HC5" are not supported.
vifs
If TRUE, adds a column to output with variance inflation
factors (VIF). Default is FALSE.
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 2. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired
number.
exp
If TRUE, reports exponentiated coefficients with
confidence intervals for exponential models like logit and Poisson models.
This quantity is known as an odds ratio for binary outcomes and incidence
rate ratio for count models.
pvals
Show p values? If FALSE, these
are not printed. Default is TRUE.
n.sd
If scale = TRUE, how many standard deviations should
predictors be divided by? Default is 1, though some suggest 2.
center
If you want coefficients for mean-centered variables but don't
want to standardize, set this to TRUE. Note that setting this to
false does not affect whether scale mean-centers variables. Use
scale.only for that.
transform.response
Should scaling/centering apply to response
variable? Default is FALSE.
scale.only
If you want to scale but not center, set this to TRUE.
Note that for legacy reasons, setting scale = TRUE and center = FALSE
will not achieve the same effect. Default is FALSE.
data
If you provide the data used to fit the model here, that data
frame is used to re-fit the model (if scale is TRUE)
instead of the stats::model.frame()
of the model. This is particularly useful if you have variable
transformations or polynomial terms specified in the formula.
model.info
Toggles printing of basic information on sample size, name of DV, and number of predictors.
model.fit
Toggles printing of model fit statistics.
model.coefs
Toggles printing of model coefficents.
which.cols
Developmental feature. By providing columns by name, you can add/remove/reorder requested columns in the output. Not fully supported, for now.
vcov
You may provide your own variance-covariance matrix for the
regression coefficients if you want to calculate standard errors in
some way not accommodated by the robust and cluster options.
...
Among other things, arguments are passed to scale_mod() or
center_mod() when center or scale is TRUE.
Details
By default, this function will print the following items to the console:
The sample size
The name of the outcome variable
The chi-squared test, (Pseudo-)R-squared value and AIC/BIC.
A table with regression coefficients, standard errors, z values, and p values.
There are several options available for robust. The heavy
lifting is done by sandwich::vcovHC() , where those are better
described.
Put simply, you may choose from "HC0" to "HC5". Based on the
recommendation of the developers of sandwich, the default is set to
"HC3". Stata's default is "HC1", so that choice may be better
if the goal is to replicate Stata's output. Any option that is understood by
vcovHC() will be accepted. Cluster-robust standard errors are
computed
if cluster is set to the name of the input data's cluster variable
or is a vector of clusters.
The scale and center options are performed via
refitting
the model with scale_mod() and center_mod() ,
respectively. Each of those in turn uses gscale() for the
mean-centering and scaling.
Value
If saved, users can access most of the items that are returned in the output (and without rounding).
coeftable
The outputted table of variables and coefficients
model
The model for which statistics are displayed. This would be
most useful in cases in which scale = TRUE.
Much other information can be accessed as attributes.
Author(s)
Jacob Long jacob.long@sc.edu
References
King, G., & Roberts, M. E. (2015). How robust standard errors expose methodological problems they do not fix, and what to do about it. Political Analysis, 23(2), 159–179. doi:10.1093/pan/mpu015
Lumley, T., Diehr, P., Emerson, S., & Chen, L. (2002). The Importance of the Normality Assumption in Large Public Health Data Sets. Annual Review of Public Health, 23, 151–169. doi:10.1146/annurev.publhealth.23.100901.140546
See Also
scale_mod() can simply perform the standardization if
preferred.
gscale() does the heavy lifting for mean-centering and scaling
behind the scenes.
Other summ:
summ.lm(),
summ.merMod(),
summ.rq(),
summ.svyglm()
Examples
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson)
# Summarize with standardized coefficients
summ(glm.D93, scale = TRUE)
Linear regression summaries with options
Description
summ() prints output for a regression model in a fashion similar to
summary(), but formatted differently with more options.
Usage
## S3 method for class 'lm'
summ(
model,
scale = FALSE,
confint = getOption("summ-confint", FALSE),
ci.width = getOption("summ-ci.width", 0.95),
robust = getOption("summ-robust", FALSE),
cluster = NULL,
vifs = getOption("summ-vifs", FALSE),
digits = getOption("jtools-digits", 2),
pvals = getOption("summ-pvals", TRUE),
n.sd = 1,
center = FALSE,
transform.response = FALSE,
scale.only = FALSE,
data = NULL,
part.corr = FALSE,
model.info = getOption("summ-model.info", TRUE),
model.fit = getOption("summ-model.fit", TRUE),
model.coefs = getOption("summ-model.coefs", TRUE),
which.cols = NULL,
vcov = NULL,
...
)
Arguments
model
A lm object.
scale
If TRUE, reports standardized regression
coefficients by scaling and mean-centering input data (the latter can be
changed via the scale.only argument). Default is FALSE.
confint
Show confidence intervals instead of standard errors? Default
is FALSE.
ci.width
A number between 0 and 1 that signifies the width of the
desired confidence interval. Default is .95, which corresponds
to a 95% confidence interval. Ignored if confint = FALSE.
robust
If not FALSE, reports heteroskedasticity-robust standard
errors instead of conventional SEs. These are also known as Huber-White
standard errors. There are several options provided by
sandwich::vcovHC() : "HC0", "HC1", "HC2", "HC3", "HC4",
"HC4m", "HC5".
Default is FALSE.
This requires the sandwich package to compute the
standard errors.
cluster
For clustered standard errors, provide the column name of
the cluster variable in the input data frame (as a string). Alternately,
provide a vector of clusters. Note that you must set robust to either
"HC1", "HC2", or "HC3" in order to have clustered standard errors ("HC4"
and "HC5" are not supported.
vifs
If TRUE, adds a column to output with variance inflation
factors (VIF). Default is FALSE.
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 2. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired
number.
pvals
Show p values? If FALSE, these
are not printed. Default is TRUE.
n.sd
If scale = TRUE, how many standard deviations should
predictors be divided by? Default is 1, though some suggest 2.
center
If you want coefficients for mean-centered variables but don't
want to standardize, set this to TRUE. Note that setting this to
false does not affect whether scale mean-centers variables. Use
scale.only for that.
transform.response
Should scaling/centering apply to response
variable? Default is FALSE.
scale.only
If you want to scale but not center, set this to TRUE.
Note that for legacy reasons, setting scale = TRUE and center = FALSE
will not achieve the same effect. Default is FALSE.
data
If you provide the data used to fit the model here, that data
frame is used to re-fit the model (if scale is TRUE)
instead of the stats::model.frame()
of the model. This is particularly useful if you have variable
transformations or polynomial terms specified in the formula.
part.corr
Print partial (labeled "partial.r") and
semipartial (labeled "part.r") correlations with the table?
Default is FALSE. See details about these quantities when robust
standard errors are used.
model.info
Toggles printing of basic information on sample size, name of DV, and number of predictors.
model.fit
Toggles printing of model fit statistics.
model.coefs
Toggles printing of model coefficents.
which.cols
Developmental feature. By providing columns by name, you can add/remove/reorder requested columns in the output. Not fully supported, for now.
vcov
You may provide your own variance-covariance matrix for the
regression coefficients if you want to calculate standard errors in
some way not accommodated by the robust and cluster options.
...
Among other things, arguments are passed to scale_mod() or
center_mod() when center or scale is TRUE.
Details
By default, this function will print the following items to the console:
The sample size
The name of the outcome variable
The R-squared value plus adjusted R-squared
A table with regression coefficients, standard errors, t-values, and p values.
There are several options available for robust. The heavy
lifting is done by sandwich::vcovHC() , where those are better
described.
Put simply, you may choose from "HC0" to "HC5". Based on the
recommendation of the developers of sandwich, the default is set to
"HC3". Stata's default is "HC1", so that choice may be better
if the goal is to replicate Stata's output. Any option that is understood
by vcovHC() will be accepted. Cluster-robust standard errors are
computed if cluster is set to the name of the input data's cluster
variable or is a vector of clusters.
The scale and center options are performed via
refitting
the model with scale_mod() and center_mod() ,
respectively. Each of those in turn uses gscale() for the
mean-centering and scaling.
If using part.corr = TRUE, then you will get these two common
effect size metrics on the far right two columns of the output table.
However, it should be noted that these do not go hand in hand with
robust standard error estimators. The standard error of the coefficient
doesn't change the point estimate, just the uncertainty. However,
this function uses t-statistics in its calculation of the
partial and semipartial correlation. This provides what amounts to a
heteroskedasticity-adjusted set of estimates, but I am unaware of any
statistical publication that validates this type of use. Please
use these as a heuristic when used alongside robust standard errors; do
not report the "robust" partial and semipartial correlations in
publications.
Value
If saved, users can access most of the items that are returned in the output (and without rounding).
coeftable
The outputted table of variables and coefficients
model
The model for which statistics are displayed. This would be
most useful in cases in which scale = TRUE.
Much other information can be accessed as attributes.
Author(s)
Jacob Long jacob.long@sc.edu
References
King, G., & Roberts, M. E. (2015). How robust standard errors expose methodological problems they do not fix, and what to do about it. Political Analysis, 23(2), 159–179. doi:10.1093/pan/mpu015
Lumley, T., Diehr, P., Emerson, S., & Chen, L. (2002). The Importance of the Normality Assumption in Large Public Health Data Sets. Annual Review of Public Health, 23, 151–169. doi:10.1146/annurev.publhealth.23.100901.140546
See Also
scale_mod() can simply perform the standardization if
preferred.
gscale() does the heavy lifting for mean-centering and scaling
behind the scenes.
Other summ:
summ.glm(),
summ.merMod(),
summ.rq(),
summ.svyglm()
Examples
# Create lm object
fit <- lm(Income ~ Frost + Illiteracy + Murder,
data = as.data.frame(state.x77))
# Print the output with standardized coefficients and 3 digits
summ(fit, scale = TRUE, digits = 3)
Mixed effects regression summaries with options
Description
summ() prints output for a regression model in a fashion similar to
summary(), but formatted differently with more options.
Usage
## S3 method for class 'merMod'
summ(
model,
scale = FALSE,
confint = getOption("summ-confint", FALSE),
ci.width = getOption("summ-ci.width", 0.95),
conf.method = getOption("summ-conf.method", c("Wald", "profile", "boot")),
digits = getOption("jtools-digits", default = 2),
r.squared = TRUE,
pvals = getOption("summ-pvals", NULL),
n.sd = 1,
center = FALSE,
transform.response = FALSE,
scale.only = FALSE,
data = NULL,
exp = FALSE,
t.df = NULL,
model.info = getOption("summ-model.info", TRUE),
model.fit = getOption("summ-model.fit", TRUE),
model.coefs = getOption("summ-model.coefs", TRUE),
re.variance = getOption("summ-re.variance", c("sd", "var")),
which.cols = NULL,
re.table = getOption("summ-re.table", TRUE),
groups.table = getOption("summ-groups.table", TRUE),
...
)
Arguments
model
A merMod object.
scale
If TRUE, reports standardized regression
coefficients by scaling and mean-centering input data (the latter can be
changed via the scale.only argument). Default is FALSE.
confint
Show confidence intervals instead of standard errors? Default
is FALSE.
ci.width
A number between 0 and 1 that signifies the width of the
desired confidence interval. Default is .95, which corresponds
to a 95% confidence interval. Ignored if confint = FALSE.
conf.method
Argument passed to lme4::confint.merMod() . Default
is "Wald", but "profile" or "boot" are better when accuracy is a
priority. Be aware that both of the alternate methods are sometimes very
time-consuming.
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 2. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired
number.
r.squared
Calculate an r-squared model fit statistic? Default is
TRUE, but if it has errors or takes a long time to calculate you
may want to consider setting to FALSE.
pvals
Show p values? If FALSE, these
are not printed. Default is TRUE, except for merMod objects (see
details).
n.sd
If scale = TRUE, how many standard deviations should
predictors be divided by? Default is 1, though some suggest 2.
center
If you want coefficients for mean-centered variables but don't
want to standardize, set this to TRUE. Note that setting this to
false does not affect whether scale mean-centers variables. Use
scale.only for that.
transform.response
Should scaling/centering apply to response
variable? Default is FALSE.
scale.only
If you want to scale but not center, set this to TRUE.
Note that for legacy reasons, setting scale = TRUE and center = FALSE
will not achieve the same effect. Default is FALSE.
data
If you provide the data used to fit the model here, that data
frame is used to re-fit the model (if scale is TRUE)
instead of the stats::model.frame()
of the model. This is particularly useful if you have variable
transformations or polynomial terms specified in the formula.
exp
If TRUE, reports exponentiated coefficients with
confidence intervals for exponential models like logit and Poisson models.
This quantity is known as an odds ratio for binary outcomes and incidence
rate ratio for count models.
t.df
For lmerMod models only. User may set the degrees of
freedom used in conducting t-tests. See details for options.
model.info
Toggles printing of basic information on sample size, name of DV, and number of predictors.
model.fit
Toggles printing of model fit statistics.
model.coefs
Toggles printing of model coefficents.
re.variance
Should random effects variances be expressed in
standard deviations or variances? Default, to be consistent with previous
versions of jtools, is "sd". Use "var" to get the variance instead.
which.cols
Developmental feature. By providing columns by name, you can add/remove/reorder requested columns in the output. Not fully supported, for now.
re.table
Show table summarizing variance of random effects? Default is TRUE.
groups.table
Show table summarizing the grouping variables? Default is TRUE.
...
Among other things, arguments are passed to scale_mod() or
center_mod() when center or scale is TRUE.
Details
By default, this function will print the following items to the console:
The sample size
The name of the outcome variable
The (Pseudo-)R-squared value and AIC/BIC.
A table with regression coefficients, standard errors, and t-values.
The scale and center options are performed via refitting
the model with scale_mod() and center_mod() ,
respectively. Each of those in turn uses gscale() for the
mean-centering and scaling.
merMod models are a bit different than the others. The lme4
package developers have, for instance, made a decision not to report or
compute p values for lmer() models. There are good reasons for this,
most notably that the t-values produced are not "accurate" in the sense of
the Type I error rate. For certain large, balanced samples with many
groups, this is no big deal. What's
a "big" or "small" sample? How much balance is necessary? What type of
random effects structure is okay? Good luck getting a statistician to
give you any clear guidelines on this.
Some simulation studies have been done on fewer than 100 observations, so
for sure if your sample is around 100 or fewer you should not interpret
the t-values. A large number of groups is also crucial for avoiding bias
using t-values. If groups are nested or crossed in a linear model,
it is best to just get the pbkrtest package.
By default, this function follows lme4's lead and does not report
the p values for lmer() models. If the user has pbkrtest
installed, however, p values are reported using the Kenward-Roger
d.f. approximation unless pvals = FALSE or t.df is
set to something other than NULL. In publications,
you should cite the
Kenward & Roger (1997) piece as well as either this package or
pbkrtest package to explain how the p values were calculated.
See pvalues from the lme4 for more details.
If you're looking for a simple test with no extra packages installed,
it is better to use the confidence
intervals and check to see if they exclude zero than use the t-test.
For users of glmer(), see some of the advice there as well. While
lme4 and by association summ() does as well, they are
still imperfect.
You have some options to customize the output in this regard with the
t.df argument. If NULL, the default, the
degrees of freedom used depends on whether the user has
lmerTest or pbkrtest installed. If lmerTest is installed,
the degrees of freedom for each coefficient are calculated using the
Satterthwaite method and the p values calculated accordingly.
If only pbkrtest is installed or t.df is "k-r", the Kenward-Roger
approximation of the standard errors and degrees of freedom for each
coefficient is used. Note that Kenward-Roger standard errors can take
longer to calculate and may cause R to crash with models fit to large
(roughly greater than 5000 rows) datasets.
If neither is installed and the user sets
pvals = TRUE, then the residual degrees of freedom
is used. If t.df = "residual", then the residual d.f. is used
without a message. If the user prefers to use some other method to
determine the d.f., then any number provided as the argument will be
used.
About pseudo-R^2
There is no one way to calculate R^2 for mixed models or nonlinear models. Many caution against interpreting or even using such approximations outside of OLS regression. With that said, this package reports one version for your benefit, though you should of course understand that it is not an unambiguous measure of model fit.
This package calculates R^2 for mixed models using an adapted version
of rsquared() from the piecewiseSEM
package. This is an implementation of the Nakagawa & Schielzeth (2013)
procedure with refinements by Johnson (2014). If you choose to report
the pseudo-R^2 in a publication, you should cite Nakagawa & Schielzeth
to explain how the calculation was done.
Value
If saved, users can access most of the items that are returned in the output (and without rounding).
coeftable
The outputted table of variables and coefficients
rcoeftable
The secondary table with the grouping variables and random coefficients.
gvars
The tertiary table with the grouping variables, numbers of groups, and ICCs.
model
The model for which statistics are displayed. This would be
most useful in cases in which scale = TRUE.
Much other information can be accessed as attributes.
Author(s)
Jacob Long jacob.long@sc.edu
References
Johnson, P. C. D. (2014). Extension of Nakagawa & Schielzeth's $R^2_GLMM$ to random slopes models. Methods in Ecology and Evolution, 5, 944–946. doi:10.1111/2041-210X.12225
Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983. doi:10.2307/2533558
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82. doi:10.18637/jss.v082.i13
Luke, S. G. (2017). Evaluating significance in linear mixed-effects models in R. Behavior Research Methods, 49, 1494–1502. doi:10.3758/s13428-016-0809-y
Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining $R^2$ from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4, 133–142. doi:10.1111/j.2041-210x.2012.00261.x
See Also
scale_mod() can simply perform the standardization if
preferred.
gscale() does the heavy lifting for mean-centering and scaling
behind the scenes.
pbkrtest::get_ddf_Lb() gets the Kenward-Roger degrees of
freedom if you have pbkrtest installed.
A tweaked version of piecewiseSEM::rsquared() is used to
generate the pseudo-R-squared estimates for linear models.
Other summ:
summ.glm(),
summ.lm(),
summ.rq(),
summ.svyglm()
Examples
if (requireNamespace("lme4")) {
library(lme4, quietly = TRUE)
data(sleepstudy)
mv <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
summ(mv) # Note lack of p values if you don't have lmerTest/pbkrtest
# Without lmerTest/pbkrtest, you'll get message about Type 1 errors
summ(mv, pvals = TRUE)
# To suppress message, manually specify t.df argument
summ(mv, t.df = "residual")
# Confidence intervals may be better alternative to p values
summ(mv, confint = TRUE)
# Use conf.method to get profile intervals (may be slow to run)
# summ(mv, confint = TRUE, conf.method = "profile")
}
Quantile regression summaries with options
Description
summ() prints output for a regression model in a fashion
similar to summary(), but formatted differently with more options.
Usage
## S3 method for class 'rq'
summ(
model,
scale = FALSE,
confint = getOption("summ-confint", FALSE),
ci.width = getOption("summ-ci.width", 0.95),
se = c("nid", "rank", "iid", "ker", "boot"),
boot.sims = 1000,
boot.method = "xy",
vifs = getOption("summ-vifs", FALSE),
digits = getOption("jtools-digits", 2),
pvals = getOption("summ-pvals", TRUE),
n.sd = 1,
center = FALSE,
transform.response = FALSE,
data = NULL,
model.info = getOption("summ-model.info", TRUE),
model.fit = getOption("summ-model.fit", TRUE),
model.coefs = getOption("summ-model.coefs", TRUE),
which.cols = NULL,
...
)
Arguments
model
A rq model. At this time, rqs models (multiple tau
parameters) are not supported.
scale
If TRUE, reports standardized regression
coefficients by scaling and mean-centering input data (the latter can be
changed via the scale.only argument). Default is FALSE.
confint
Show confidence intervals instead of standard errors? Default
is FALSE.
ci.width
A number between 0 and 1 that signifies the width of the
desired confidence interval. Default is .95, which corresponds
to a 95% confidence interval. Ignored if confint = FALSE.
se
One of "nid", "rank", "iid", "ker", or "boot". "nid" is default.
See quantreg::summary.rq() documentation for more about these options.
boot.sims
If se = "boot", the number of bootstrap replications to
perform. This is passed as the R argument to boot.rq
boot.method
If se = "boot", the type of bootstrapping method to
use. Default is "xy", but see quantreg::boot.rq() for more options.
vifs
If TRUE, adds a column to output with variance inflation
factors (VIF). Default is FALSE.
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 2. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired
number.
pvals
Show p values? If FALSE, these
are not printed. Default is TRUE.
n.sd
If scale = TRUE, how many standard deviations should
predictors be divided by? Default is 1, though some suggest 2.
center
If you want coefficients for mean-centered variables but don't
want to standardize, set this to TRUE. Note that setting this to
false does not affect whether scale mean-centers variables. Use
scale.only for that.
transform.response
Should scaling/centering apply to response
variable? Default is FALSE.
data
If you provide the data used to fit the model here, that data
frame is used to re-fit the model (if scale is TRUE)
instead of the stats::model.frame()
of the model. This is particularly useful if you have variable
transformations or polynomial terms specified in the formula.
model.info
Toggles printing of basic information on sample size, name of DV, and number of predictors.
model.fit
Toggles printing of model fit statistics.
model.coefs
Toggles printing of model coefficents.
which.cols
Developmental feature. By providing columns by name, you can add/remove/reorder requested columns in the output. Not fully supported, for now.
...
Among other things, arguments are passed to scale_mod() or
center_mod() when center or scale is TRUE.
Details
This method implements most of the things I think most users would
asking summary.rq for. hs, U, and gamma are ignored.
Note that when using se = "rank", there are no standard errors,
test statistics, or p values calculated.
About the R1 fit statistic: Described in Koenker & Machado (1999), this offers an interpretation similar to R-squared in OLS regression. While you could calculate R-squared for these models, it goes against the underlying theoretical rationale for them. Koenker himself is not a big fan of R1 either, but it's something. See Koenker & Machado (1999) for more info.
References
Koenker, R., & Machado, J. A. F. (1999). Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association, 94, 1296–1310. https://doi.org/10.1080/01621459.1999.10473882
See Also
Other summ:
summ.glm(),
summ.lm(),
summ.merMod(),
summ.svyglm()
Examples
if (requireNamespace("quantreg")) {
library(quantreg)
data(engel)
fitrq <- rq(income ~ foodexp, data = engel, tau = 0.5)
summ(fitrq)
}
Complex survey regression summaries with options
Description
summ() prints output for a regression model in a fashion similar to
summary(), but formatted differently with more options.
Usage
## S3 method for class 'svyglm'
summ(
model,
scale = FALSE,
confint = getOption("summ-confint", FALSE),
ci.width = getOption("summ-ci.width", 0.95),
digits = getOption("jtools-digits", default = 2),
pvals = getOption("summ-pvals", TRUE),
n.sd = 1,
center = FALSE,
transform.response = FALSE,
scale.only = FALSE,
exp = FALSE,
vifs = getOption("summ-vifs", FALSE),
model.info = getOption("summ-model.info", TRUE),
model.fit = getOption("summ-model.fit", TRUE),
model.coefs = getOption("summ-model.coefs", TRUE),
which.cols = NULL,
...
)
Arguments
model
A svyglm object.
scale
If TRUE, reports standardized regression
coefficients by scaling and mean-centering input data (the latter can be
changed via the scale.only argument). Default is FALSE.
confint
Show confidence intervals instead of standard errors? Default
is FALSE.
ci.width
A number between 0 and 1 that signifies the width of the
desired confidence interval. Default is .95, which corresponds
to a 95% confidence interval. Ignored if confint = FALSE.
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 2. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired
number.
pvals
Show p values? If FALSE, these
are not printed. Default is TRUE.
n.sd
If scale = TRUE, how many standard deviations should
predictors be divided by? Default is 1, though some suggest 2.
center
If you want coefficients for mean-centered variables but don't
want to standardize, set this to TRUE. Note that setting this to
false does not affect whether scale mean-centers variables. Use
scale.only for that.
transform.response
Should scaling/centering apply to response
variable? Default is FALSE.
scale.only
If you want to scale but not center, set this to TRUE.
Note that for legacy reasons, setting scale = TRUE and center = FALSE
will not achieve the same effect. Default is FALSE.
exp
If TRUE, reports exponentiated coefficients with
confidence intervals for exponential models like logit and Poisson models.
This quantity is known as an odds ratio for binary outcomes and incidence
rate ratio for count models.
vifs
If TRUE, adds a column to output with variance inflation
factors (VIF). Default is FALSE.
model.info
Toggles printing of basic information on sample size, name of DV, and number of predictors.
model.fit
Toggles printing of model fit statistics.
model.coefs
Toggles printing of model coefficents.
which.cols
Developmental feature. By providing columns by name, you can add/remove/reorder requested columns in the output. Not fully supported, for now.
...
Among other things, arguments are passed to scale_mod() or
center_mod() when center or scale is TRUE.
Details
By default, this function will print the following items to the console:
The sample size
The name of the outcome variable
The (Pseudo-)R-squared value and AIC.
A table with regression coefficients, standard errors, t values, and p values.
The scale and center options are performed via refitting
the model with scale_mod() and center_mod() ,
respectively. Each of those in turn uses gscale() for the
mean-centering and scaling. These functions can handle svyglm objects
correctly by calling svymean() and svyvar() to compute means
and
standard deviations. Weights are not altered. The fact that the model is
refit means the runtime will be similar to the original time it took to fit
the model.
Value
If saved, users can access most of the items that are returned in the output (and without rounding).
coeftable
The outputted table of variables and coefficients
model
The model for which statistics are displayed. This would be
most useful in cases in which scale = TRUE.
Much other information can be accessed as attributes.
Author(s)
Jacob Long jacob.long@sc.edu
See Also
scale_mod() can simply perform the standardization if
preferred.
gscale() does the heavy lifting for mean-centering and scaling
behind the scenes.
Other summ:
summ.glm(),
summ.lm(),
summ.merMod(),
summ.rq()
Examples
if (requireNamespace("survey")) {
library(survey)
data(api)
dstrat <- svydesign(id = ~1, strata =~ stype, weights =~ pw,
data = apistrat, fpc =~ fpc)
regmodel <- svyglm(api00 ~ ell * meals, design = dstrat)
summ(regmodel)
}
Calculate Pearson correlations with complex survey data
Description
svycor extends the survey package by calculating correlations
with syntax similar to the original package, which for reasons unknown lacks
such a function.
Usage
svycor(
formula,
design,
na.rm = FALSE,
digits = getOption("jtools-digits", default = 2),
sig.stats = FALSE,
bootn = 1000,
mean1 = TRUE,
...
)
Arguments
formula
A formula (e.g., ~var1+var2) specifying the terms to correlate.
design
The survey.design or svyrep.design object.
na.rm
Logical. Should cases with missing values be dropped?
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 2. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired number.
sig.stats
Logical. Perform non-parametric bootstrapping
(using wtd.cor ) to generate standard errors and
associated t- and p-values. See details for some considerations when doing
null hypothesis testing with complex survey correlations.
bootn
If sig.stats is TRUE, this defines the number of
bootstraps to be run to generate the standard errors and p-values. For
large values and large datasets, this can contribute considerably to
processing time.
mean1
If sig.stats is TRUE, it is important to know whether the
sampling weights should have a mean of 1. That is, should the standard
errors be calculated as if the number of rows in your dataset is the total
number of observations (TRUE) or as if the sum of the weights in your
dataset is the total number of observations (FALSE)?
...
Additional arguments passed to svyvar() .
Details
This function extends the survey package by calculating the
correlations for user-specified variables in survey design and returning a
correlation matrix.
Using the wtd.cor function, this function also
returns standard errors and p-values for the correlation terms using a
sample-weighted bootstrapping procedure. While correlations do not require
distributional assumptions, hypothesis testing (i.e., r > 0) does.
The appropriate way to calculate standard errors and use them to define a
probability is not straightforward in this scenario since the weighting
causes heteroskedasticity, thereby violating
an assumption inherent in the commonly used methods for converting Pearson's
correlations into t-values. The method provided here is defensible, but if
reporting in scientific publications the method should be spelled out.
Value
If significance tests are not requested, there is one returned value:
cors
The correlation matrix (without rounding)
If significance tests are requested, the following are also returned:
p.values
A matrix of p values
t.values
A matrix of t values
std.err
A matrix of standard errors
Note
This function was designed in part on the procedure recommended by Thomas Lumley, the author of the survey package, on Stack Overflow. However, he has not reviewed or endorsed this implementation. All defects are attributed to the author.
Author(s)
Jacob Long jacob.long@sc.edu
See Also
Other survey package extensions:
svysd()
Other survey tools:
pf_sv_test(),
svysd(),
weights_tests(),
wgttest()
Examples
if (requireNamespace("survey")) {
library(survey)
data(api)
# Create survey design object
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
data = apistrat, fpc = ~fpc)
# Print correlation matrix
svycor(~api00 + api99 + dnum, design = dstrat)
# Save the results, extract correlation matrix
out <- svycor(~api00 + api99 + dnum, design = dstrat)
out$cors
}
Calculate standard deviations with complex survey data
Description
svysd extends the survey package by calculating standard
deviations with syntax similar to the original package, which provides
only a svyvar() function.
Usage
svysd(
formula,
design,
na.rm = FALSE,
digits = getOption("jtools-digits", default = 3),
...
)
Arguments
formula
A formula (e.g., ~var1+var2) specifying the term(s) of interest.
design
The survey.design or svyrep.design object.
na.rm
Logical. Should cases with missing values be dropped?
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 3. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired number.
...
Additional arguments passed to svyvar() .
Details
An alternative is to simply do sqrt(svyvar(~term, design = design)).
However, if printing and sharing the output, this may be misleading since
the output will say "variance."
Note
This function was designed independent of the survey package and is neither endorsed nor known to its authors.
See Also
Other survey package extensions:
svycor()
Other survey tools:
pf_sv_test(),
svycor(),
weights_tests(),
wgttest()
Examples
if (requireNamespace("survey")) {
library(survey)
data(api)
# Create survey design object
dstrat <- svydesign(id = ~1,strata = ~stype, weights = ~pw, data = apistrat,
fpc=~fpc)
# Print the standard deviation of some variables
svysd(~api00+ell+meals, design = dstrat)
}
Format ggplot2 figures in APA style
Description
theme_apa() is designed to work like any other complete theme from
ggplot . To the extent possible, it aligns with
the (vague) APA figure guidelines.
Usage
theme_apa(
legend.pos = "right",
legend.use.title = FALSE,
legend.font.size = 12,
x.font.size = 12,
y.font.size = 12,
facet.title.size = 12,
remove.y.gridlines = TRUE,
remove.x.gridlines = TRUE
)
Arguments
legend.pos
One of "right", "left", "top", "bottom",
"topleft", "topright", "topmiddle", "bottomleft",
"bottomright", or "bottommiddle".
Positions the legend, which will layer on top of any geoms, on the plane.
legend.use.title
Logical. Specify whether to include a legend title.
Defaults to FALSE.
legend.font.size
Integer indicating the font size of the labels in the legend. Default and APA-recommended is 12, but if there are many labels it may be necessary to choose a smaller size.
x.font.size
Font size of x-axis label.
y.font.size
Font size of x-axis label.
facet.title.size
Font size of facet labels.
remove.y.gridlines
Should the coordinate grid on the y-axis (horizontal lines) be removed? Default is TRUE.
remove.x.gridlines
Should the coordinate grid on the x-axis (vertical lines) be removed? Default is TRUE.
Details
This function applies a theme to ggplot2 figures with a
style that is roughly in line with APA guidelines. Users may need to
perform further operations for their specific use cases.
There are some things to keep in mind about APA style figures:
Main titles should be written in the word processor or typesetter rather than on the plot image itself.
In some cases, users can forgo a legend in favor of describing the figure in a caption (also written in the word processor/typesetter).
Legends are typically embedded on the coordinate plane of the figure rather than next to it, as is default in
ggplot2.Use of color is generally discouraged since most of the applications for which APA figures are needed involve eventual publication in non-color print media.
There are no hard and fast rules on font size, though APA recommends choosing between 8 and 14-point. Fonts in figures should be sans serif.
Because APA style calls for positioning legends on the plane itself, this
function includes options for choosing a position–top left, top right, bottom
left, bottom right–to place the legend. ggplot2 provides no obvious
way to automatically choose a position that overlaps least with the geoms (the
plotted data), so users will need to choose one.
Facetting is supported, but APA guidelines are considerably less clear for such situations.
This theme was created with inspiration from Rudolf Cardinal's
code,
which required updating for newer versions of ggplot2 and
adaptations for APA style.
Author(s)
Jacob Long jacob.long@sc.edu
References
American Psychological Association. (2010). Publication manual of the American Psychological Association, Sixth Edition. Washington, DC: American Psychological Association.
Nicol, A.A.M. & Pexman, P.M. (2010). Displaying your findings: A practical guide for creating figures, posters, and presentations, Sixth Edition. Washington, D.C.: American Psychological Association.
See Also
Examples
# Create plot with ggplot2
library(ggplot2)
plot <- ggplot(mpg, aes(cty, hwy)) +
geom_jitter()
# Add APA theme with defaults
plot + theme_apa()
A nice, flexible ggplot2 theme
Description
theme_nice is designed to work like any other complete theme from
ggplot . It has a nice appearance.
Usage
theme_nice(
legend.pos = "right",
style = c("white", "light", "dark_blue", "dark_gray"),
base_size = 11,
base_family = "",
base_line_size = base_size/22,
base_rect_size = base_size/22
)
Arguments
legend.pos
One of "right", "left", "top", "bottom" (outside
the plotting area), "topleft", "topright", "topmiddle",
"bottomleft", "bottomright", or "bottommiddle" (inside the plotting
area).
style
One of "white", "light", "dark_blue", or "dark_gray".
"white" sets the background to white, "light" to light gray,
"dark_gray" to dark gray, "dark_blue" to dark blue.
base_size
base font size, given in pts.
base_family
base font family
base_line_size
base size for line elements
base_rect_size
base size for rect elements
Author(s)
Jacob Long jacob.long@sc.edu
Examples
# Create plot with ggplot2
library(ggplot2)
plot <- ggplot(mpg, aes(cty, hwy)) +
geom_jitter() + theme_nice()
Broom extensions for summ objects
Description
These are functions used for compatibility with broom's tidying
functions to facilitate use with huxreg, thereby making
export_summs works.
Usage
## S3 method for class 'summ'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'summ.merMod'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'summ.lm'
glance(x, ...)
## S3 method for class 'summ.glm'
glance(x, ...)
## S3 method for class 'summ.svyglm'
glance(x, ...)
## S3 method for class 'summ.merMod'
glance(x, ...)
## S3 method for class 'summ.rq'
glance(x, ...)
Arguments
x
The summ object.
conf.int
Include confidence intervals? Default is FALSE.
conf.level
How wide confidence intervals should be, if requested. Default is .95.
...
Other arguments (usually ignored)
Value
A data.frame with columns matching those appropriate for the model
type per glance documentation.
See Also
Test whether sampling weights are needed
Description
Use the tests proposed in Pfeffermann and Sverchkov (1999) and DuMouchel and Duncan (1983) to check whether a regression model is specified correctly without weights.
Usage
weights_tests(
model,
weights,
data,
model_output = TRUE,
test = NULL,
sims = 1000,
digits = getOption("jtools-digits", default = 2)
)
Arguments
model
The fitted model, without weights
weights
The name of the weights column in model's data frame
or a vector of weights equal in length to the number of observations
included in model.
data
The data frame with the data fed to the fitted model and the weights
model_output
Should a summary of the model with weights as predictor be printed? Default is TRUE, but you may not want it if you are trying to declutter a document.
test
Which type of test should be used in the ANOVA? The default,
NULL, chooses based on the model type ("F" for linear models).
This argument is passed to anova.
sims
The number of bootstrap simulations to use in estimating the variance of the residual correlation. Default is 1000, but for publications or when computing power/time is sufficient, a higher number is better.
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 3. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired number.
Details
This function is a wrapper for the two tests implemented in this package that
test whether your regression model is correctly specified. The first is
wgttest , an R adaptation of the Stata macro of the same name.
This test can otherwise be referred to as the DuMouchel-Duncan test. The
other test is the Pfeffermann-Sverchkov test, which can be accessed directly
with pf_sv_test .
For more details on each, visit the documentation on the respective functions. This function just runs each of them for you.
References
DuMouchel, W. H. & Duncan, D.J. (1983). Using sample survey weights in multiple regression analyses of stratified samples. Journal of the American Statistical Association, 78. 535-543.
Nordberg, L. (1989). Generalized linear modeling of sample survey data. Journal of Official Statistics; Stockholm, 5, 223-239.
Pfeffermann, D., & Sverchkov, M. (1999). Parametric and semi-parametric estimation of regression models fitted to survey data. Sankhya: The Indian Journal of Statistics, 61. 166-186.
See Also
Other survey tools:
pf_sv_test(),
svycor(),
svysd(),
wgttest()
Examples
# Note: This is a contrived example to show how the function works,
# not a case with actual sammpling weights from a survey vendor
if (requireNamespace("boot")) {
states <- as.data.frame(state.x77)
set.seed(100)
states$wts <- runif(50, 0, 3)
fit <- lm(Murder ~ Illiteracy + Frost, data = states)
weights_tests(model = fit, data = states, weights = wts, sims = 100)
}
Test whether sampling weights are needed
Description
Use the DuMouchel-Duncan (1983) test to assess the need for sampling weights in your linear regression analysis.
Usage
wgttest(
model,
weights,
data = NULL,
model_output = FALSE,
test = NULL,
digits = getOption("jtools-digits", default = 3)
)
Arguments
model
The unweighted linear model (must be lm,
glm, see details for other types) you want to check.
weights
The name of the weights column in model's data frame
or a vector of weights equal in length to the number of observations
included in model.
data
The data frame with the data fed to the fitted model and the weights
model_output
Should a summary of the model with weights as predictor be printed? Default is FALSE since the output can be very long for complex models.
test
Which type of test should be used in the ANOVA? The default,
NULL, chooses based on the model type ("F" for linear models).
This argument is passed to anova.
digits
An integer specifying the number of digits past the decimal to
report in the output. Default is 3. You can change the default number of
digits for all jtools functions with
options("jtools-digits" = digits) where digits is the desired number.
Details
This is designed to be similar to the wgttest macro for Stata
(http://fmwww.bc.edu/repec/bocode/w/wgttest.html). This method,
advocated for by DuMouchel and Duncan (1983), is fairly straightforward. To
decide whether weights are needed, the weights are added to the linear model
as a predictor and interaction with each other predictor. Then, an omnibus
test of significance is performed to compare the weights-added model to the
original; if insignificant, weights are not significantly related to the
result and you can use the more efficient estimation from unweighted OLS.
It can be helpful to look at the created model using
model_output = TRUE
to see which variables might be the ones affected by inclusion of weights.
This test can support most GLMs in addition to LMs, a use validated by
Nordberg (1989). This, to my knowledge, is different from the Stata macro.
It does not work for mixed models (e.g., lmer or lme) though
it could plausibly be
implemented. However, there is no scholarly consensus how to properly
incorporate weights into mixed models. There are other types of models that
may work, but have not been tested. The function is designed to be
compatible with as many model types as possible, but the user should be
careful to make sure s/he understands whether this type of test is
appropriate for the model being considered. DuMouchel and Duncan (1983) were
only thinking about linear regression when the test was conceived.
Nordberg (1989) validated its use with generalized linear models, but to
this author's knowledge it has not been tested with other model types.
References
DuMouchel, W. H. & Duncan, D.J. (1983). Using sample survey weights in multiple regression analyses of stratified samples. Journal of the American Statistical Association, 78. 535-543.
Nordberg, L. (1989). Generalized linear modeling of sample survey data. Journal of Official Statistics; Stockholm, 5, 223–239.
Winship, C. & Radbill, L. (1994). Sampling weights and regression analysis. Sociological Methods and Research, 23, 230-257.
See Also
Other survey tools:
pf_sv_test(),
svycor(),
svysd(),
weights_tests()
Examples
# First, let's create some fake sampling weights
wts <- runif(50, 0, 5)
# Create model
fit <- lm(Income ~ Frost + Illiteracy + Murder,
data = as.data.frame(state.x77))
# See if the weights change the model
wgttest(fit, weights = wts)
# With a GLM
wts <- runif(100, 0, 2)
x <- rnorm(100)
y <- rbinom(100, 1, .5)
fit <- glm(y ~ x, family = binomial)
wgttest(fit, wts)
## Can specify test manually
wgttest(fit, weights = wts, test = "Rao")
# Quasi family is treated differently than likelihood-based
## Dobson (1990) Page 93: Randomized Controlled Trial (plus some extra values):
counts <- c(18,17,15,20,10,20,25,13,12,18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,18)
treatment <- gl(3,6)
glm.D93 <- glm(counts ~ outcome + treatment, family = quasipoisson)
wts <- runif(18, 0, 3)
wgttest(glm.D93, weights = wts)
cat, message, warning, and stop wrapped to fit the console's
width.
Description
These are convenience functions that format printed output to fit the width of the user's console.
Usage
wrap_str(..., sep = "")
cat_wrap(..., brk = "")
warn_wrap(..., brk = "\n", class = NULL, call. = FALSE)
stop_wrap(
...,
brk = "\n",
trace = rlang::trace_back(bottom = rlang::caller_env()),
class = NULL,
call = rlang::caller_env(),
call. = FALSE
)
msg_wrap(..., class = NULL, brk = "\n")
Arguments
...
Objects to print. For stop_wrap(), warn_wrap(), and
msg_wrap(), any named objects are instead diverted to the ... argument
of rlang::abort() , rlang::warn() , and rlang::inform() ,
respectively.
sep
Separator between ..., Default: ”
brk
What should the last character of the message/warning/error be?
Default is "\n", meaning the console output ends with a new line.
class
Subclass of the condition.
call.
Here for legacy reasons. It is ignored.
trace
A trace object created by trace_back() .
call
The actual calling environment to report in the error message.
By default, rlang::caller_env().
Details
The point of these functions is to allow you to print
output/messages/warnings/errors to the console without having to figure out
where to place newline characters. These functions get the width of the
console from the "width" option, which in most editors adjusts dynamically
as you resize.
So instead of writing a warning like this:
warning("I have to give you this very important message that may be too\n",
"wide for your screen")
You can do it like this:
warn_wrap("I have to give you this very important message that may be
too wide for your screen")
And the function will automatically insert line breaks to fit the console. As a note, it will also ignore any newlines you insert. This means you can make your own fit your editor's screen and indent in the middle of a string without that formatting being carried over into the output.
Weighted standard deviation calculation
Description
This function calculates standard deviations with weights and
is a counterpart to the built-in weighted.mean function.
Usage
wtd.sd(x, weights)
Arguments
x
A vector of values for which you want the standard deviation
weights
A vector of weights equal in length to x