Visualizing the probabilistic predictions of a VotingClassifier

Finally, we use DecisionBoundaryDisplay to plot the predicted probabilities. With a diverging colormap (such as "RdBu"), darker colors correspond to predict_proba values close to either 0 or 1, and white corresponds to a predict_proba of 0.5.

from itertools import product

from sklearn.inspection import DecisionBoundaryDisplay

fig, axarr = plt.subplots(2, 2, sharex="col", sharey="row", figsize=(10, 8))
for idx, clf, title in zip(
    product([0, 1], [0, 1]),
    [clf1, clf2, clf3, eclf],
    [
        "Splines with\nconstant extrapolation",
        "Splines with\nperiodic extrapolation",
        "RBF Nystroem",
        "Soft Voting",
    ],
):
    disp = DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="predict_proba",
        plot_method="pcolormesh",
        cmap="RdBu",
        alpha=0.8,
        ax=axarr[idx[0], idx[1]],
    )
    axarr[idx[0], idx[1]].scatter(
        X["Feature #0"],
        X["Feature #1"],
        c=y,
        **common_scatter_plot_params,
    )
    axarr[idx[0], idx[1]].set_title(title)
    fig.colorbar(disp.surface_, ax=axarr[idx[0], idx[1]], label="Probability estimate")
plt.show()

[Figure: predicted probabilities in four panels: Splines with constant extrapolation, Splines with periodic extrapolation, RBF Nystroem, Soft Voting]

As a sanity check, we can verify for a given sample that the probability predicted by the VotingClassifier is indeed the weighted average of the individual classifiers’ soft-predictions.

In the case of binary classification such as in the present example, the predict_proba arrays contain the probability of belonging to class 0 (here in red) as the first entry, and the probability of belonging to class 1 (here in blue) as the second entry.

test_sample = pd.DataFrame({"Feature #0": [-0.5], "Feature #1": [1.5]})
predict_probas = [est.predict_proba(test_sample).ravel() for est in eclf.estimators_]
for (est_name, _), est_probas in zip(eclf.estimators, predict_probas):
    print(f"{est_name}'s predicted probabilities: {est_probas}")

constant splines model's predicted probabilities: [0.11272662 0.88727338]
periodic splines model's predicted probabilities: [0.99726573 0.00273427]
nystroem model's predicted probabilities: [0.3185838 0.6814162]

print(
    "Weighted average of soft-predictions: "
    f"{np.dot(weights, predict_probas) / np.sum(weights)}"
)

Weighted average of soft-predictions: [0.3630784 0.6369216]

We can see that the manually calculated probabilities above are identical to those produced by the VotingClassifier:

print(
    "Predicted probability of VotingClassifier: "
    f"{eclf.predict_proba(test_sample).ravel()}"
)

Predicted probability of VotingClassifier: [0.3630784 0.6369216]
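
The check above can also be reproduced by hand from the printed values alone. The sketch below is plain Python with no scikit-learn dependency. The probabilities are copied from the output above; the weights `[2, 1, 3]` are an assumption, since the `weights` variable is defined outside this section, and `weighted_average` is a hypothetical helper name used only for illustration.

```python
# Probabilities copied from the printed output above: (class 0, class 1).
probas = [
    [0.11272662, 0.88727338],  # constant splines model
    [0.99726573, 0.00273427],  # periodic splines model
    [0.3185838, 0.6814162],  # nystroem model
]
# Assumed weights: the real values are defined outside this section.
weights = [2, 1, 3]


def weighted_average(probas, weights):
    """Return the per-class weighted mean of the rows of `probas`."""
    total = sum(weights)
    n_classes = len(probas[0])
    return [
        sum(w * row[k] for w, row in zip(weights, probas)) / total
        for k in range(n_classes)
    ]


avg = weighted_average(probas, weights)
print([round(p, 7) for p in avg])
```

Under the assumed weights, the result agrees with the printed weighted average to the displayed precision, and the two entries still sum to 1 because every input row does.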

To convert soft predictions into hard predictions when weights are provided, the weighted average of the predicted probabilities is computed for each class. The final class label is then the one with the highest average probability, which corresponds to the default threshold of predict_proba=0.5 in the case of binary classification.

print(
    "Class with the highest weighted average of soft-predictions: "
    f"{np.argmax(np.dot(weights, predict_probas) / np.sum(weights))}"
)

Class with the highest weighted average of soft-predictions: 1

This is equivalent to the output of VotingClassifier’s predict method:

print(f"Predicted class of VotingClassifier: {eclf.predict(test_sample).ravel()}")

Predicted class of VotingClassifier: [1]

Soft votes can be thresholded as with any other probabilistic classifier. This allows you to set the probability threshold at which the positive class is predicted, instead of simply selecting the class with the highest predicted probability.

from sklearn.model_selection import FixedThresholdClassifier

eclf_other_threshold = FixedThresholdClassifier(
    eclf, threshold=0.7, response_method="predict_proba"
).fit(X, y)
print(
    "Predicted class of thresholded VotingClassifier: "
    f"{eclf_other_threshold.predict(test_sample)}"
)

Predicted class of thresholded VotingClassifier: [0]
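
To see why the prediction flips for this sample, it can help to apply both decision rules by hand to the averaged probabilities printed earlier. This is a minimal sketch in plain Python, not the internals of FixedThresholdClassifier. The helper `predict_with_threshold` is a hypothetical name, and the sketch assumes that class 1 is the positive class and that a probability equal to the threshold counts as positive.

```python
# Averaged probabilities of the VotingClassifier for the test sample,
# copied from the printed output above: (class 0, class 1).
avg_proba = [0.3630784, 0.6369216]


def predict_with_threshold(proba, threshold):
    """Predict class 1 when its probability reaches `threshold`, else class 0."""
    return 1 if proba[1] >= threshold else 0


# Default rule: pick the most probable class (argmax).
argmax_label = max(range(len(avg_proba)), key=lambda k: avg_proba[k])

print(argmax_label)  # class chosen by argmax
print(predict_with_threshold(avg_proba, 0.5))  # same class in the binary case
print(predict_with_threshold(avg_proba, 0.7))  # stricter threshold
```

The class-1 probability of about 0.637 clears the default 0.5 threshold but not the stricter 0.7 one, which is consistent with the two predictions printed above.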
