IsolationForest example#
An example using IsolationForest for anomaly detection.
The Isolation Forest is an ensemble of "Isolation Trees" that "isolate" observations by recursive random partitioning, which can be represented by a tree structure. The number of splittings required to isolate a sample is lower for outliers and higher for inliers.
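The claim about split counts can be checked with a small sketch. The helper `isolation_depth` and the toy data below are hypothetical and written only for illustration; this is not scikit-learn's implementation, which additionally subsamples the data for each tree and normalizes path lengths into an anomaly score.

```python
import numpy as np


def isolation_depth(X, point, rng, max_depth=50):
    """Count random axis-aligned splits needed to isolate `point` within X.

    `point` is assumed to be one of the rows of X.
    """
    depth = 0
    current = X
    while len(current) > 1 and depth < max_depth:
        feature = rng.randint(current.shape[1])  # pick a random feature
        low, high = current[:, feature].min(), current[:, feature].max()
        if low == high:
            break  # degenerate case: no split possible on this feature
        threshold = rng.uniform(low, high)  # pick a random split value
        # Keep only the samples that fall on the same side as `point`.
        if point[feature] < threshold:
            current = current[current[:, feature] < threshold]
        else:
            current = current[current[:, feature] >= threshold]
        depth += 1
    return depth


rng = np.random.RandomState(0)
inliers = 0.3 * rng.randn(200, 2)  # tight cluster around the origin
outlier = np.array([[4.0, 4.0]])  # a single point far from the cluster
X_demo = np.vstack([inliers, outlier])

# Average the depth over many random partitionings, as an ensemble would.
outlier_depth = np.mean([isolation_depth(X_demo, X_demo[-1], rng) for _ in range(200)])
inlier_depth = np.mean([isolation_depth(X_demo, X_demo[0], rng) for _ in range(200)])
print(f"mean depth, outlier: {outlier_depth:.1f}; inlier: {inlier_depth:.1f}")
```

On data like this, the isolated point typically needs only a few splits, while a point inside the cluster needs many more.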
In the present example we demo two ways to visualize the decision boundary of an Isolation Forest trained on a toy dataset.
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
Data generation#
We generate two clusters (each one containing n_samples) by randomly sampling the standard normal distribution as returned by numpy.random.randn. One of them is spherical and the other one is slightly deformed.
For consistency with the IsolationForest notation, the inliers (i.e. the Gaussian clusters) are assigned a ground truth label 1, whereas the outliers (created with numpy.random.uniform) are assigned the label -1.
import numpy as np

from sklearn.model_selection import train_test_split

n_samples, n_outliers = 120, 40
rng = np.random.RandomState(0)
covariance = np.array([[0.5, -0.1], [0.7, 0.4]])
cluster_1 = 0.4 * rng.randn(n_samples, 2) @ covariance + np.array([2, 2])  # general
cluster_2 = 0.3 * rng.randn(n_samples, 2) + np.array([-2, -2])  # spherical
outliers = rng.uniform(low=-4, high=4, size=(n_outliers, 2))

X = np.concatenate([cluster_1, cluster_2, outliers])
y = np.concatenate(
    [np.ones((2 * n_samples), dtype=int), -np.ones((n_outliers), dtype=int)]
)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
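The split above passes stratify=y, which is meant to keep the inlier/outlier ratio the same in the train and test sets. A minimal sketch of that effect follows. The names `X_demo` and `y_demo` are hypothetical, and the placeholder features carry no meaning; only the label counts (240 inliers, 40 outliers) mirror the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels with the same 240 inlier / 40 outlier counts as the example.
y_demo = np.concatenate([np.ones(240, dtype=int), -np.ones(40, dtype=int)])
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder features

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=42
)

# Fraction of outliers overall and in each split.
frac_all = np.mean(y_demo == -1)
frac_train = np.mean(y_tr == -1)
frac_test = np.mean(y_te == -1)
print(f"outlier fraction: all={frac_all:.3f}, train={frac_train:.3f}, test={frac_test:.3f}")
```

Without stratification, a small test set could by chance contain very few outliers, which would make any later evaluation noisier.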
We can visualize the resulting clusters:
import matplotlib.pyplot as plt

scatter = plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k")
handles, labels = scatter.legend_elements()
plt.axis("square")
plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class")
plt.title("Gaussian inliers with \nuniformly distributed outliers")
plt.show()
Training of the model#
from sklearn.ensemble import IsolationForest

clf = IsolationForest(max_samples=100, random_state=0)
clf.fit(X_train)
IsolationForest(max_samples=100, random_state=0)
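Once fitted, the model can be queried on held-out data. The sketch below is an illustrative addition rather than part of the original example. It regenerates the same toy data so that it runs on its own, then uses predict (which returns 1 for inliers and -1 for outliers) and decision_function (where negative values indicate outliers).

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Regenerate the toy dataset from the data generation step.
n_samples, n_outliers = 120, 40
rng = np.random.RandomState(0)
covariance = np.array([[0.5, -0.1], [0.7, 0.4]])
cluster_1 = 0.4 * rng.randn(n_samples, 2) @ covariance + np.array([2, 2])
cluster_2 = 0.3 * rng.randn(n_samples, 2) + np.array([-2, -2])
outliers = rng.uniform(low=-4, high=4, size=(n_outliers, 2))
X = np.concatenate([cluster_1, cluster_2, outliers])
y = np.concatenate(
    [np.ones(2 * n_samples, dtype=int), -np.ones(n_outliers, dtype=int)]
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

clf = IsolationForest(max_samples=100, random_state=0).fit(X_train)

pred = clf.predict(X_test)  # 1 for predicted inliers, -1 for predicted outliers
scores = clf.decision_function(X_test)  # negative scores correspond to outliers

print("predicted outliers in test set:", int(np.sum(pred == -1)))
print("mean score of true outliers:", scores[y_test == -1].mean())
print("mean score of true inliers:", scores[y_test == 1].mean())
```

Note that the labels y_train are never passed to fit: the model is trained without supervision, and the ground truth labels serve only to inspect the result afterwards.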