skmultilearn.adapt.MLkNN(k=10, s=1.0, ignore_first_neighbours=0)
kNN classification method adapted for multi-label classification
MLkNN uses k-NearestNeighbors to find the nearest examples to a test instance and applies Bayesian inference to select the labels to assign.
| Parameters: | k (int) – number of neighbours of each input instance to take into account; s (float) – the smoothing parameter; ignore_first_neighbours (int) – ability to ignore first N neighbours, useful for comparing with other classification software |
|---|---|
knn_
the nearest neighbors single-label classifier used underneath

| Type: | an instance of sklearn.neighbors.NearestNeighbors |
|---|---|
Note
If you don’t know what ignore_first_neighbours does, the default is safe.
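The effect of the smoothing parameter s can be sketched in a few lines of plain numpy. This is a simplified illustration of the idea from the original paper, not the library's implementation: MLkNN estimates the smoothed prior probability that each label is present as (s + label count) / (s·2 + n_samples).

```python
import numpy as np

# Toy binary indicator matrix: 4 training samples, 3 labels
y_train = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
])

s = 1.0  # smoothing parameter, as in MLkNN(s=1.0)
m = y_train.shape[0]

# Smoothed prior probability that each label is present:
# P(H1) = (s + count) / (s * 2 + m)  -- Laplace-style smoothing
prior = (s + y_train.sum(axis=0)) / (s * 2 + m)
print(prior)  # label 0 appears in 3/4 samples -> (1 + 3) / (2 + 4) ≈ 0.667
```

With s = 0 the prior degenerates to the raw label frequency; larger s pulls all priors toward 0.5, which protects rare labels from zero-probability estimates.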
References
If you use this classifier please cite the original paper introducing the method:
@article{zhang2007ml,
  title={ML-KNN: A lazy learning approach to multi-label learning},
  author={Zhang, Min-Ling and Zhou, Zhi-Hua},
  journal={Pattern Recognition},
  volume={40},
  number={7},
  pages={2038--2048},
  year={2007},
  publisher={Elsevier}
}
Examples
Here’s a very simple example of using MLkNN with a fixed number of neighbors:
from skmultilearn.adapt import MLkNN

classifier = MLkNN(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)
You can also use GridSearchCV to find an optimal set of parameters:
from skmultilearn.adapt import MLkNN
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1, 3), 's': [0.5, 0.7, 1.0]}
score = 'f1_macro'

clf = GridSearchCV(MLkNN(), parameters, scoring=score)
clf.fit(X, y)

print(clf.best_params_, clf.best_score_)
# output: {'k': 1, 's': 0.5} 0.78988303374297597
fit(X, y)
Fit classifier with training data

| Parameters: | X (numpy.ndarray or scipy.sparse matrix) – input features of shape (n_samples, n_features); y (numpy.ndarray or scipy.sparse matrix of {0, 1}) – binary indicator matrix with label assignments of shape (n_samples, n_labels) |
|---|---|
| Returns: | fitted instance of self |
| Return type: | self |
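A minimal sketch of assembling inputs in the shape fit expects, using made-up toy data: a dense feature matrix and a sparse binary indicator matrix with one row per sample and one column per label.

```python
import numpy as np
from scipy import sparse

# Toy feature matrix: 4 samples, 2 features
X_train = np.array([[0.0, 1.0],
                    [1.0, 1.0],
                    [1.0, 0.0],
                    [0.0, 0.0]])

# Binary indicator label matrix of shape (n_samples, n_labels):
# row i has a 1 in column j iff sample i carries label j
y_train = sparse.lil_matrix((4, 3), dtype=int)
y_train[0, [0, 2]] = 1
y_train[1, 2] = 1
y_train[2, [0, 1]] = 1

print(y_train.toarray())
```

Both dense arrays and scipy sparse matrices are accepted; sparse indicator matrices are the natural choice when the label space is large.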
get_params(deep=True)
Get parameters of this classifier and its sub-objects

Introspection of the classifier for search models such as cross-validation and grid search.

| Parameters: | deep (bool) – if True, parameters of sub-objects are also introspected and appended to the output dictionary |
|---|---|
| Returns: | out – dictionary of all parameters and their values. If deep=True, the dictionary also holds the parameters of the parameters |
|---|---|
| Return type: | dict |
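The deep=True behaviour follows the standard scikit-learn estimator contract. The toy estimator below is hypothetical, for illustration only; it shows how a nested sub-object's parameters are reported under the component__parameter naming scheme that grid search relies on.

```python
from sklearn.base import BaseEstimator
from sklearn.neighbors import NearestNeighbors

class ToyWrapper(BaseEstimator):
    """Hypothetical estimator wrapping a NearestNeighbors sub-object."""
    def __init__(self, k=10, s=1.0, nn=None):
        self.k = k
        self.s = s
        self.nn = nn

est = ToyWrapper(k=3, nn=NearestNeighbors(n_neighbors=3))
params = est.get_params(deep=True)

# deep=True also exposes the sub-object's parameters as 'nn__<name>'
print(params['k'], params['s'], params['nn__n_neighbors'])
```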
predict(X)
Predict labels for X

| Parameters: | X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features) |
|---|---|
| Returns: | binary indicator matrix with label assignments of shape (n_samples, n_labels) |
|---|---|
| Return type: | scipy.sparse matrix of int |
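The returned indicator matrix can be unpacked with standard scipy operations. The toy matrix below stands in for real predictions and shows how to read off each sample's assigned labels:

```python
import numpy as np
from scipy import sparse

# Toy prediction matrix: 3 test samples, 4 labels
predictions = sparse.csr_matrix(np.array([
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]))

# Column indices of the 1-entries in each row = labels assigned to that sample
assigned = [predictions[i].nonzero()[1].tolist()
            for i in range(predictions.shape[0])]
print(assigned)  # [[1, 3], [0], []]
```

Note that a row may be all zeros: a sample can legitimately receive no labels at all.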
predict_proba(X)
Predict probabilities of label assignments for X

| Parameters: | X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features) |
|---|---|
| Returns: | matrix with label assignment probabilities of shape (n_samples, n_labels) |
|---|---|
| Return type: | scipy.sparse matrix of float |
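If hard assignments are needed from the probability matrix, you can threshold it yourself, e.g. at 0.5. The values below are made up for illustration:

```python
import numpy as np

# Toy probability matrix: 2 samples, 3 labels
probabilities = np.array([[0.9, 0.2, 0.6],
                          [0.1, 0.4, 0.8]])

# Threshold at 0.5 to obtain a binary indicator matrix
assignments = (probabilities > 0.5).astype(int)
print(assignments)  # [[1 0 1], [0 0 1]]
```

Choosing a different threshold trades precision against recall per label, which can be useful when label frequencies are very skewed.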
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric since it requires that each sample’s entire label set be predicted correctly.

| Parameters: | X (array-like, shape (n_samples, n_features)) – test samples; y (array-like, shape (n_samples, n_labels)) – true labels for X; sample_weight (array-like, shape (n_samples,), optional) – sample weights |
|---|---|
| Returns: | score – mean accuracy of self.predict(X) with respect to y |
|---|---|
| Return type: | float |
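Subset accuracy counts a sample as correct only when its entire predicted label set matches the true one. A plain numpy equivalent of the metric, on toy arrays rather than the library call:

```python
import numpy as np

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],   # exact match
                   [0, 1, 1],   # one label wrong -> whole sample scores 0
                   [1, 1, 0]])  # exact match

# A row contributes 1 only if every label in it matches
subset_accuracy = np.all(y_true == y_pred, axis=1).mean()
print(subset_accuracy)  # 2 of 3 rows match exactly -> 0.666...
```

This is why score can look pessimistic on multi-label problems: a single wrong label zeroes out the whole sample, so per-label metrics such as f1_macro (as in the GridSearchCV example above) are often preferred.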