shinymodels: Interactive Assessments of Models
Description
Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points.
Author(s)
Maintainer: Simon Couch simon.couch@posit.co (ORCID)
Authors:
Max Kuhn max@posit.co (ORCID)
Shisham Adhikari shadhikari@ucdavis.edu
Julia Silge julia.silge@posit.co (ORCID)
Other contributors:
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/tidymodels/shinymodels/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs
A value or the magrittr placeholder.
rhs
A function call using the magrittr semantics.
Value
The result of calling rhs(lhs).
Iterative optimization of neural network
Description
This object has the results when a neural network was tuned using Bayesian optimization and a validation set.
Details
The code used to produce this object:
data(ames)
ames <-
ames %>%
select(Sale_Price, Neighborhood, Longitude, Latitude, Year_Built) %>%
mutate(Sale_Price = log10(ames$Sale_Price))
set.seed(1)
ames_rs <- validation_split(ames)
ames_rec <-
recipe(Sale_Price ~ ., data = ames) %>%
step_dummy(all_nominal_predictors()) %>%
step_zv(all_predictors()) %>%
step_normalize(all_predictors())
mlp_spec <-
mlp(hidden_units = tune(),
penalty = tune(),
epochs = tune()) %>%
set_mode("regression")
set.seed(1)
ames_mlp_itr <-
mlp_spec %>%
tune_bayes(
ames_rec,
resamples = ames_rs,
initial = 5,
iter = 4,
control = control_bayes(save_pred = TRUE)
)
Value
An object with primary class iteration_results.
Resampled bagged tree results
Description
This object has the results when a bagged regression tree was resampled using 10-fold cross-validation.
Details
The code used to produce this object:
library(tidymodels)
library(baguette)
tidymodels_prefer()
# ------------------------------------------------------------------------------
ctrl_rs <- control_resamples(save_pred = TRUE)
# ------------------------------------------------------------------------------
set.seed(1)
cars_rs <- vfold_cv(mtcars)
cars_bag_vfld <-
bag_tree() %>%
set_engine("rpart", times = 5) %>%
set_mode("regression") %>%
fit_resamples(
mpg ~ .,
resamples = cars_rs,
control = ctrl_rs
)
Value
An object with primary class resample_results.
A CART classification tree tuned via racing
Description
This object has the results when a CART classification tree model was tuned over the cost-complexity parameter using racing.
Details
To reduce the object size, a smaller subset of the data were used.
The code used to produce this object:
library(tidymodels)
library(finetune)
tidymodels_prefer()
ctrl_rc <- control_race(save_pred = TRUE)
# ------------------------------------------------------------------------------
data(cells)
set.seed(1)
cells <-
cells %>%
select(-case) %>%
sample_n(200)
# ------------------------------------------------------------------------------
set.seed(2)
cell_rs <- vfold_cv(cells)
# ------------------------------------------------------------------------------
set.seed(3)
cell_race <-
decision_tree(cost_complexity = tune()) %>%
set_mode("classification") %>%
tune_race_anova(
class ~ .,
resamples = cell_rs,
grid = tibble(cost_complexity = 10^seq(-2, -1, by = 0.2)),
control = ctrl_rc
)
Value
An object with primary class tune_race.
Gets the config and translates it to a sentence with the parameter values
Description
This function takes the result of organize_data(), the predictions across all models, and the names of the tuning parameters, and returns a sentence with the default parameter values.
Usage
display_selected(x, performance, predictions, tuning_param, input)
Arguments
x
The organize_data() result.
performance
The dataframe with performance metrics for each candidate model.
predictions
The dataframe with predictions across all models.
tuning_param
The names of the tuning parameters.
input
The DT::datatable object.
Value
A sentence.
Explore model results
Description
explore() launches a Shiny application to interact with results from some
tidymodels functions.
To investigate model fit(s), explore() can be used on objects produced by
The application starts in a new window and allows users to see how predicted values align with the true, observed data. There are 2-3 tabs in the application (depending on the object):
- Tuning Parameters enables users to choose a specific set of tuning parameters. These results are shown in the Plots tab. The default configuration is based on the optimal value of the first performance metric used during the creation of the object.
- Plots shows various panels that can visualize how well the model fits. Specific points can be highlighted by clicking on them (as long as the hover_only = FALSE option was used). To reset the highlighted points, double-click on the graph background.
- About gives information on the application as well as links to get help or file bug reports/feature requests.
To quit the Shiny application, use the Esc key.
Usage
## Default S3 method:
explore(x, ...)
## S3 method for class 'tune_results'
explore(x, hover_cols = NULL, hover_only = FALSE, ...)
Arguments
x
An object with class tune_results.
...
Other parameters not currently used.
hover_cols
The columns to display while hovering in the Shiny app. This argument can be:
- A dplyr selector (such as dplyr::starts_with()) or a set of selectors if they are enclosed in c().
- A character vector.
hover_only
A logical to determine if interactive highlighting of points is enabled (the default) or not. This can be helpful for very large data sets.
Details
For resampling methods that produce more than one hold-out prediction per row (e.g. the bootstrap, repeated V-fold cross-validation), the predicted values shown in the plots are averages of the predictions for that specific row.
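The averaging described above can be sketched in base R. This is a minimal illustration and not the package's internal code; the data frame `preds` and its values are made up, and the `.row` column is an assumed identifier for the original data row.

```r
# Hypothetical hold-out predictions: rows 1 and 2 were each predicted twice
# (e.g. in two different bootstrap resamples), row 3 only once.
preds <- data.frame(
  .row  = c(1, 1, 2, 2, 3),
  .pred = c(10, 12, 20, 24, 30)
)

# Average the predictions within each original row.
avg_pred <- tapply(preds$.pred, preds$.row, mean)
avg_pred
```

Each original row ends up with a single averaged prediction, which is the value that would be plotted.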
The ggplot2 theme used in the Shiny application corresponds to the current
theme in the R session. Run ggplot2::theme_set() to change the theme for
the plots in the Shiny application.
For classification models, there is a toggle on the bottom left of
the application to choose between "Unscaled (i.e. linear)" and
"Logit scaled" probability scaling. The first option plots the raw
probabilities while the logit scaling uses scales::logit_trans() to rescale
the axis. This can be helpful when a model with a linear predictor is used
(e.g. logistic or multinomial regression) since it can show linear effects
from a feature more easily.
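To make the logit scaling concrete, here is a small base R sketch. It is only an illustration under assumptions: the constant `eps` plays the role of a padding value such as prob_eps, and clamping the probabilities this way is an assumed approach, not necessarily what the application does internally.

```r
# Raw class probabilities, including the boundary values 0 and 1.
prob <- c(0, 0.1, 0.5, 0.9, 1)

# Hypothetical padding constant (similar in spirit to `prob_eps`).
eps <- 0.001

# Pull values away from 0 and 1 so that the logit stays finite.
prob_clamped <- pmin(pmax(prob, eps), 1 - eps)

# Logit transform, log(p / (1 - p)); qlogis() is the base R equivalent.
logit <- qlogis(prob_clamped)
logit
```

Without the clamping step, probabilities of exactly 0 or 1 would map to -Inf and Inf and could not be drawn on the logit axis.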
When using the application, there may be warnings printed in the console about "event tied a source ID ... not registered". These can be ignored.
When racing results are explored, the Shiny application will only allow tuning parameter combinations that were fully resampled. As a result, parameter combinations that were discarded during the race cannot be selected.
Value
A shiny application.
Examples
data(ames_mlp_itr)
if (interactive()) {
explore(ames_mlp_itr, hover_cols = dplyr::contains("tude"))
}
Returns the name of predictions column for the first level variable
Description
This function takes prediction data, the event level, and the outcome name as arguments and returns the predictions column for the first level variable.
Usage
first_class_prob_name(dat, event_level, y_name)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
event_level
A single character value for the level corresponding to the event.
y_name
The y/response variable for the model.
Value
A symbol.
Returns the first level of a classification model
Description
This function takes the data, event_level, and y_name as arguments and
returns the first level in classification data.
Usage
first_level(dat, event_level = c("first", "second"), y_name)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
event_level
A single character value for the level corresponding to the event.
y_name
The y/response variable for the model.
Value
A string.
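The idea of choosing an event level from a factor outcome can be sketched as follows. The helper `pick_event_level()` is hypothetical and written only for this illustration; it is not the package's implementation of first_level().

```r
# Hypothetical helper: return the factor level that is treated as the event.
pick_event_level <- function(outcome, event_level = c("first", "second")) {
  # match.arg() picks "first" when the argument is left at its default.
  event_level <- match.arg(event_level)
  lvls <- levels(outcome)
  if (event_level == "first") lvls[1] else lvls[2]
}

outcome <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))
pick_event_level(outcome)            # "yes"
pick_event_level(outcome, "second")  # "no"
```

The result depends on the order of the factor levels, not on the order in which the values appear in the data.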
Returns the hover columns to be displayed in interactive plots
Description
This function takes the .hover argument and returns output that can
be used as a text aesthetic in a ggplot2::ggplot() object to customize the tooltip.
Usage
format_hover(x, ...)
Arguments
x
A data frame with columns to be displayed in the hover.
...
Arguments passed to format() for the column(s) selected to be shown
in the hover/tooltip.
Value
A character vector.
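As a rough sketch of how hover columns could be turned into tooltip text, the hypothetical helper below formats each column and joins the pieces row by row. The helper name, the "name: value" layout, and the use of an HTML line break as separator are assumptions made for the illustration, not the behavior of format_hover().

```r
# Hypothetical helper: build one tooltip string per row from the hover columns.
make_hover_text <- function(x, ...) {
  # Format each column (extra arguments go to format()) and prefix its name.
  parts <- lapply(names(x), function(nm) {
    paste0(nm, ": ", format(x[[nm]], ...))
  })
  # Join the per-column pieces row by row with an HTML line break.
  do.call(paste, c(parts, sep = "<br>"))
}

hover <- data.frame(
  Longitude = c(-93.61975, -93.6),
  Latitude  = c(42.05403, 42.1)
)
hover_text <- make_hover_text(hover, digits = 3)
hover_text
```

The result is a character vector with one element per row, which is the shape a text aesthetic expects.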
Extract data from objects to use in a shiny app
Description
This function joins the result of tune::fit_resamples() to the original
dataset to give a list that can be an input for the Shiny app.
Usage
organize_data(x, hover_cols = NULL, ...)
## Default S3 method:
organize_data(x, hover_cols = NULL, ...)
## S3 method for class 'tune_results'
organize_data(x, hover_cols = NULL, ...)
Arguments
x
The tune::fit_resamples() result.
hover_cols
The columns to display while hovering.
...
Other parameters not currently used.
Details
The default configuration is based on the optimal value of the first metric.
Value
A list whose elements are a data frame and character vectors. The data frame includes
an outcome variable .outcome, a prediction variable .pred, model
configuration variable .config, and hovering columns .hover.
Calculates and reformats performance metrics for each candidate model
Description
This function takes the result of organize_data() to calculate and reformat performance metrics for each candidate model.
Usage
performance_object(x)
Arguments
x
The organize_data() result.
Value
A dataframe.
Visualizing the confusion matrix for a classification model
Description
This function plots the confusion matrix for a classification model.
Usage
plot_multiclass_conf_mat(dat)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
Value
A plotly::ggplotly() object.
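The quantity behind such a plot is a cross-tabulation of predicted and observed classes. The base R sketch below shows only that tabulation, using made-up data; it does not reproduce the plotting code of the package.

```r
# Hypothetical observed classes and hard class predictions.
lvls <- c("a", "b", "c")
observed  <- factor(c("a", "a", "b", "b", "c", "c"), levels = lvls)
predicted <- factor(c("a", "b", "b", "b", "c", "a"), levels = lvls)

# Cross-tabulate: rows are predictions, columns are the true classes.
conf_mat <- table(Prediction = predicted, Truth = observed)
conf_mat

# Overall accuracy is the share of counts on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

Off-diagonal cells show which classes are confused with each other, which is what the plotted matrix makes visible.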
Visualizing predicted probability vs. true class for a multi-class classification model
Description
This function plots the predicted probabilities against the observed class based on tidymodels results for a multi-class classification model.
Usage
plot_multiclass_obs_pred(dat, y_name, prob_bins = 0.05)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
prob_bins
The desired binwidth for histogram.
Value
A plotly::ggplotly() object.
Visualizing the PR curve for a classification model
Description
This function plots the full precision recall curve.
Usage
plot_multiclass_pr(dat, y_name)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
Value
A plotly::ggplotly() object.
Visualizing the predicted probabilities vs. a factor variable for a classification model
Description
This function plots the predicted probabilities against a factor column based on tidymodels results for a multi-class classification model.
Usage
plot_multiclass_pred_factorcol(
dat,
y_name,
factorcol,
alpha = 1,
size = 1,
prob_scaling = FALSE,
prob_eps = 0.001,
source = NULL
)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
factorcol
The factor column to plot against the predicted probabilities.
alpha
The opacity for the geom points.
size
The size for the geom points.
prob_scaling
The boolean to turn on or off the logit scale for probability.
prob_eps
A small numerical constant to prevent division by zero.
source
A character string of length 1 that matches the source argument in event_data().
Value
A plotly::ggplotly() object.
Visualizing the predicted probabilities vs. a numeric column for a classification model
Description
This function plots the predicted probabilities against a numeric column based on tidymodels results for a multi-class classification model.
Usage
plot_multiclass_pred_numcol(
dat,
y_name,
numcol,
alpha = 1,
size = 1,
prob_scaling = FALSE,
prob_eps = 0.001,
source = NULL
)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
numcol
The numerical column to plot against the predicted probabilities.
alpha
The opacity for the geom points.
size
The size for the geom points.
prob_scaling
The boolean to turn on or off the logit scale for probability.
prob_eps
A small numerical constant to prevent division by zero.
source
A character string of length 1 that matches the source argument in event_data().
Value
A plotly::ggplotly() object.
Visualizing the ROC curve for a classification model
Description
This function plots the ROC curve for a classification model.
Usage
plot_multiclass_roc(dat, y_name)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
Value
A plotly::ggplotly() object.
Visualizing observed vs. predicted values for a regression model
Description
This function plots the predicted values against the observed values based on tidymodels results for a regression model.
Usage
plot_numeric_obs_pred(dat, y_name, alpha = 1, size = 1, source = NULL)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
alpha
The opacity for the geom points.
size
The size for the geom points.
source
A character string of length 1 that matches the source argument in event_data().
Value
A plotly::ggplotly() object.
Visualizing residuals vs. a factor column for a regression model
Description
This function plots the residuals against a factor column based on tidymodels results for a regression model.
Usage
plot_numeric_res_factorcol(
dat,
y_name,
factorcol,
alpha = 1,
size = 1,
source = NULL
)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
factorcol
The factor column to plot against the residuals.
alpha
The opacity for the geom points.
size
The size for the geom points.
source
A character string of length 1 that matches the source argument in event_data().
Value
A plotly::ggplotly() object.
Visualizing residuals vs. a numeric column for a regression model
Description
This function plots the residuals against a numeric column based on tidymodels results for a regression model.
Usage
plot_numeric_res_numcol(
dat,
y_name,
numcol,
alpha = 1,
size = 1,
source = NULL
)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
numcol
The numerical column to plot against the residuals.
alpha
The opacity for the geom points.
size
The size for the geom points.
source
A character string of length 1 that matches the source argument in event_data().
Value
A plotly::ggplotly() object.
Visualizing residuals vs. predicted values for a regression model
Description
This function plots the residuals against the predicted values based on tidymodels results for a regression model.
Usage
plot_numeric_res_pred(dat, y_name, size = 1, source = NULL)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
size
The size for the geom points.
source
A character string of length 1 that matches the source argument in event_data().
Value
A plotly::ggplotly() object.
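The residual plots are built on a simple quantity, sketched here in base R with made-up data. The sign convention (observed minus predicted) and the `.resid` column name are assumptions for this illustration.

```r
# Hypothetical observed outcomes and model predictions.
dat <- data.frame(
  .outcome = c(5.0, 5.2, 4.8, 5.5),
  .pred    = c(4.9, 5.3, 5.0, 5.1)
)

# Assumed convention: residual = observed - predicted.
dat$.resid <- dat$.outcome - dat$.pred

# Sort by absolute residual; points far from zero are the poorly predicted ones.
dat[order(-abs(dat$.resid)), ]
```

A pattern in the residuals against the predictions or against a predictor column suggests lack of fit, while isolated large residuals point to individual poorly predicted observations.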
Visualizing the confusion matrix for a classification model
Description
This function plots the confusion matrix for a classification model.
Usage
plot_twoclass_conf_mat(dat)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
Value
A plotly::ggplotly() object.
Visualizing predicted probability vs. true class for a two-class classification model
Description
This function plots the predicted probabilities against the observed class based on tidymodels results for a two-class classification model.
Usage
plot_twoclass_obs_pred(dat, y_name, event_level = "first", prob_bins = 0.05)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
event_level
A single character value for the level corresponding to the event.
prob_bins
The desired binwidth for histogram.
Value
A plotly::ggplotly() object.
Visualizing the PR curve for a classification model
Description
This function plots the full precision recall curve.
Usage
plot_twoclass_pr(dat, y_name, event_level = "first")
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
event_level
A single character value for the level corresponding to the event.
Value
A plotly::ggplotly() object.
Visualizing the predicted probabilities vs. a factor variable for a classification model
Description
This function plots the predicted probabilities against a factor column based on tidymodels results for a two-class classification model.
Usage
plot_twoclass_pred_factorcol(
dat,
y_name,
factorcol,
alpha = 1,
size = 1,
prob_scaling = FALSE,
event_level = "first",
prob_eps = 0.001,
source = NULL
)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
factorcol
The factor column to plot against the predicted probabilities.
alpha
The opacity for the geom points.
size
The size for the geom points.
prob_scaling
The boolean to turn on or off the logit scale for probability.
event_level
A single character value for the level corresponding to the event.
prob_eps
A small numerical constant to prevent division by zero.
source
A character string of length 1 that matches the source argument in event_data().
Value
A plotly::ggplotly() object.
Visualizing the predicted probabilities vs. a numeric column for a classification model
Description
This function plots the predicted probabilities against a numeric column based on tidymodels results for a two-class classification model.
Usage
plot_twoclass_pred_numcol(
dat,
y_name,
numcol,
alpha = 1,
size = 1,
prob_scaling = FALSE,
event_level = "first",
prob_eps = 0.001,
source = NULL
)
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
numcol
The numerical column to plot against the predicted probabilities.
alpha
The opacity for the geom points.
size
The size for the geom points.
prob_scaling
The boolean to turn on or off the logit scale for probability.
event_level
A single character value for the level corresponding to the event.
prob_eps
A small numerical constant to prevent division by zero.
source
A character string of length 1 that matches the source argument in event_data().
Value
A plotly::ggplotly() object.
Visualizing the ROC curve for a classification model
Description
This function plots the ROC curve for a classification model.
Usage
plot_twoclass_roc(dat, y_name, event_level = "first")
Arguments
dat
The predictions data frame in the organize_data() result. The following
variables are required: .outcome, .pred, .color, and .hover.
y_name
The y/response variable for the model.
event_level
A single character value for the level corresponding to the event.
Value
A plotly::ggplotly() object.
Returns the class, app type, y name, and the number of rows of an object of shiny_data class
Description
This is a print method for objects of class shiny_data.
Usage
## S3 method for class 'shiny_data'
print(x, ...)
Arguments
x
An object of class shiny_data.
...
Other parameters not currently used.
Value
x invisibly.
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- generics
Tuned flexible discriminant analysis results
Description
This object has the results when a flexible discriminant analysis model was tuned over the interaction degree parameters.
Details
To reduce the object size, five bootstraps were used for resampling and missing data were removed.
The code used to produce this object:
library(tidymodels)
library(discrim)
tidymodels_prefer()
# ------------------------------------------------------------------------------
ctrl_gr <- control_grid(save_pred = TRUE)
# ------------------------------------------------------------------------------
data(scat)
scat <- scat[complete.cases(scat), ]
# ------------------------------------------------------------------------------
set.seed(1)
scat_rs <- bootstraps(scat, times = 5)
scat_fda_bt <-
discrim_flexible(prod_degree = tune()) %>%
tune_grid(
Species ~ .,
resamples = scat_rs,
control = ctrl_gr
)
Value
An object with primary class tune_results.
Internal function to run shiny application on an object of shiny_data class
Description
This function takes the organize_data() result and launches a Shiny app.
Usage
shiny_models(x, hover_cols = NULL, hover_only = NULL, ...)
## Default S3 method:
shiny_models(x, hover_cols = NULL, hover_only = NULL, ...)
## S3 method for class 'multi_cls_shiny_data'
shiny_models(x, hover_cols = NULL, hover_only = FALSE, ...)
## S3 method for class 'reg_shiny_data'
shiny_models(x, hover_cols = NULL, hover_only = FALSE, ...)
## S3 method for class 'two_cls_shiny_data'
shiny_models(x, hover_cols = NULL, hover_only = FALSE, ...)
Arguments
x
The organize_data() result.
hover_cols
The columns to display while hovering in the Shiny app. This argument can be:
- A dplyr selector (such as dplyr::starts_with()) or a set of selectors if they are enclosed in c().
- A character vector.
hover_only
A logical to determine if interactive highlighting of points is enabled (the default) or not. This can be helpful for very large data sets.
...
Other parameters not currently used.
Value
A shiny application.
Test set results for logistic regression
Description
This object has the results when a logistic regression model is fit to the training set and is evaluated on the test set.
Details
The code used to produce this object:
library(tidymodels)
tidymodels_prefer()
# ------------------------------------------------------------------------------
set.seed(1)
data(two_class_dat)
# ------------------------------------------------------------------------------
two_class_split <- initial_split(two_class_dat)
# ------------------------------------------------------------------------------
glm_spec <- logistic_reg()
two_class_final <-
glm_spec %>%
last_fit(
Class ~ .,
split = two_class_split
)
Value
An object with primary class last_fit.