Visualizing the probabilistic predictions of a VotingClassifier
Plot the class probabilities predicted on a toy dataset by three
different classifiers and averaged by the VotingClassifier.
First, three linear classifiers are initialized. Two are spline models with
interaction terms, one using constant extrapolation and the other using periodic
extrapolation. The third classifier uses a Nystroem kernel approximation
with the default "rbf" kernel.
In the first part of this example, these three classifiers are used to
demonstrate soft voting with a weighted average using VotingClassifier.
We set weights=[2, 1, 3], meaning the constant extrapolation spline
model's predictions are weighted twice as much as the periodic spline model's,
and the Nystroem model's predictions are weighted three times as much as the
periodic spline model's.
The second part demonstrates how soft predictions can be converted into hard predictions.
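With voting="soft", the ensemble's class probabilities are simply the weighted
average of the members' predicted probabilities. A minimal sketch of that
arithmetic, using made-up probabilities for a single sample:

import numpy as np

# Hypothetical positive-class probabilities from the three models
# (illustration only; not part of the original example).
p_constant, p_periodic, p_nystroem = 0.9, 0.6, 0.8
p_soft = np.average([p_constant, p_periodic, p_nystroem], weights=[2, 1, 3])
print(p_soft)  # (2 * 0.9 + 1 * 0.6 + 3 * 0.8) / 6 = 0.8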
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
We first generate a noisy XOR dataset, which is a binary classification task.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import ListedColormap

n_samples = 500
rng = np.random.default_rng(0)
feature_names = ["Feature #0", "Feature #1"]
common_scatter_plot_params = dict(
    cmap=ListedColormap(["tab:red", "tab:blue"]),
    edgecolor="white",
    linewidth=1,
)

# Uniformly distributed points in [-1, 1] x [-1, 1].
xor = pd.DataFrame(
    np.random.RandomState(0).uniform(low=-1, high=1, size=(n_samples, 2)),
    columns=feature_names,
)
# The class is the XOR of the (noisy) signs of the two features.
noise = rng.normal(loc=0, scale=0.1, size=(n_samples, 2))
target_xor = np.logical_xor(
    xor["Feature #0"] + noise[:, 0] > 0, xor["Feature #1"] + noise[:, 1] > 0
)

X = xor[feature_names]
y = target_xor.astype(np.int32)

fig, ax = plt.subplots()
ax.scatter(X["Feature #0"], X["Feature #1"], c=y, **common_scatter_plot_params)
ax.set_title("The XOR dataset")
plt.show()
Because the XOR dataset is not linearly separable, tree-based models would often be preferred. However, appropriate feature engineering combined with a linear model can yield effective results, with the added benefit of producing better-calibrated probabilities for samples located in the noisy transition regions, as the sketch below suggests.
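A quick way to see this (a sketch, not part of the original example): logistic
regression on the raw features scores near chance on XOR, while adding the
single interaction feature Feature #0 * Feature #1 makes the problem close to
linearly separable.

from sklearn.linear_model import LogisticRegression

raw = LogisticRegression().fit(X, y)

# Augment the features with the pairwise product, whose sign tracks the XOR.
with_interaction = X.copy()
with_interaction["Feature #0 * #1"] = X["Feature #0"] * X["Feature #1"]
engineered = LogisticRegression().fit(with_interaction, y)

print(f"raw features accuracy:     {raw.score(X, y):.2f}")  # roughly 0.5
print(f"with interaction feature:  {engineered.score(with_interaction, y):.2f}")  # close to 1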
We define and fit the models on the whole dataset.
from sklearn.ensemble import VotingClassifier
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, SplineTransformer, StandardScaler

# Spline features with constant extrapolation, plus interaction terms.
clf1 = make_pipeline(
    SplineTransformer(degree=2, n_knots=2),
    PolynomialFeatures(interaction_only=True),
    LogisticRegression(C=10),
)
# Spline features with periodic extrapolation, plus interaction terms.
clf2 = make_pipeline(
    SplineTransformer(
        degree=2,
        n_knots=4,
        extrapolation="periodic",
        include_bias=True,
    ),
    PolynomialFeatures(interaction_only=True),
    LogisticRegression(C=10),
)
# Nystroem approximation of the default "rbf" kernel.
clf3 = make_pipeline(
    StandardScaler(),
    Nystroem(gamma=2, random_state=0),
    LogisticRegression(C=10),
)

weights = [2, 1, 3]
eclf = VotingClassifier(
    estimators=[
        ("constant splines model", clf1),
        ("periodic splines model", clf2),
        ("nystroem model", clf3),
    ],
    voting="soft",
    weights=weights,
)

clf1.fit(X, y)
clf2.fit(X, y)
clf3.fit(X, y)
eclf.fit(X, y)
VotingClassifier(estimators=[('constant splines model',
Pipeline(steps=[('splinetransformer',
SplineTransformer(degree=2,
n_knots=2)),
('polynomialfeatures',
PolynomialFeatures(interaction_only=True)),
('logisticregression',
LogisticRegression(C=10))])),
('periodic splines model',
Pipeline(steps=[('splinetransformer',
SplineTransformer(degree=2,
extrapolation='periodic',
n_knots=4)),
('polynomialfeatures',
PolynomialFeatures(interaction_only=True)),
('logisticregression',
LogisticRegression(C=10))])),
('nystroem model',
Pipeline(steps=[('standardscaler',
StandardScaler()),
('nystroem',
Nystroem(gamma=2,
random_state=0)),
('logisticregression',
LogisticRegression(C=10))]))],
                 voting='soft', weights=[2, 1, 3])
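As a sanity check (not part of the original example), the soft-voting
probabilities should equal the weighted average of the individual pipelines'
predict_proba outputs; this holds here because every step is deterministic
given the fixed random_state:

import numpy as np

# Weighted average of the three members' probabilities, weights=[2, 1, 3].
manual_average = np.average(
    [clf.predict_proba(X) for clf in (clf1, clf2, clf3)],
    axis=0,
    weights=weights,
)
assert np.allclose(manual_average, eclf.predict_proba(X))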