skmultilearn.adapt.MLkNN(k=10, s=1.0, ignore_first_neighbours=0)
kNN classification method adapted for multi-label classification
MLkNN uses k-NearestNeighbors to find the nearest examples to a test instance and applies Bayesian inference to select the labels to assign.
| Parameters: | k (int) – number of neighbours of each input instance to take into account; s (float) – the smoothing parameter; ignore_first_neighbours (int) – ability to ignore first N neighbours, useful for comparing with other classification software |
|---|---|
knn_
the nearest neighbors single-label classifier used underneath

| Type: | an instance of sklearn.neighbors.NearestNeighbors |
|---|---|
Note
If you don’t know what ignore_first_neighbours does, the default is safe.
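The effect of the smoothing parameter s can be sketched in a few lines of plain numpy. This is a simplified illustration of the idea from the original paper, not the library's implementation: MLkNN estimates the smoothed prior probability that each label is present as (s + label count) / (s·2 + n_samples).

```python
import numpy as np

# Toy binary indicator matrix: 4 training samples, 3 labels
y_train = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
])

s = 1.0  # smoothing parameter, as in MLkNN(s=1.0)
m = y_train.shape[0]

# Smoothed prior probability that each label is present:
# P(H1) = (s + count) / (s * 2 + m)  -- Laplace-style smoothing
prior = (s + y_train.sum(axis=0)) / (s * 2 + m)
print(prior)  # label 0 appears in 3/4 samples -> (1 + 3) / (2 + 4) ≈ 0.667
```

With s = 0 the prior degenerates to the raw label frequency; larger s pulls all priors toward 0.5, which protects rare labels from zero-probability estimates.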
References
If you use this classifier please cite the original paper introducing the method:
@article{zhang2007ml,
  title={ML-KNN: A lazy learning approach to multi-label learning},
  author={Zhang, Min-Ling and Zhou, Zhi-Hua},
  journal={Pattern Recognition},
  volume={40},
  number={7},
  pages={2038--2048},
  year={2007},
  publisher={Elsevier}
}
Examples
Here’s a very simple example of using MLkNN with a fixed number of neighbors:
from skmultilearn.adapt import MLkNN

classifier = MLkNN(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)
You can also use GridSearchCV to find an optimal set of parameters:
from skmultilearn.adapt import MLkNN
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1, 3), 's': [0.5, 0.7, 1.0]}
score = 'f1_macro'

clf = GridSearchCV(MLkNN(), parameters, scoring=score)
clf.fit(X, y)

print(clf.best_params_, clf.best_score_)
# output: {'k': 1, 's': 0.5} 0.78988303374297597
fit(X, y)
Fit classifier with training data

| Parameters: | X (numpy.ndarray or scipy.sparse matrix) – input features of shape (n_samples, n_features); y (numpy.ndarray or scipy.sparse matrix of {0, 1}) – binary indicator matrix with label assignments of shape (n_samples, n_labels) |
|---|---|
| Returns: | fitted instance of self |
| Return type: | self |
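A minimal sketch of assembling inputs in the shape fit expects, using made-up toy data: a dense feature matrix and a sparse binary indicator matrix with one row per sample and one column per label.

```python
import numpy as np
from scipy import sparse

# Toy feature matrix: 4 samples, 2 features
X_train = np.array([[0.0, 1.0],
                    [1.0, 1.0],
                    [1.0, 0.0],
                    [0.0, 0.0]])

# Binary indicator label matrix of shape (n_samples, n_labels):
# row i has a 1 in column j iff sample i carries label j
y_train = sparse.lil_matrix((4, 3), dtype=int)
y_train[0, [0, 2]] = 1
y_train[1, 2] = 1
y_train[2, [0, 1]] = 1

print(y_train.toarray())
```

Both dense arrays and scipy sparse matrices are accepted; sparse indicator matrices are the natural choice when the label space is large.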
get_params(deep=True)
Get parameters of this classifier and its sub-objects

Introspection of the classifier for search models such as cross-validation and grid search.

| Parameters: | deep (bool) – if True, parameters of sub-objects are also introspected and appended to the output dictionary |
|---|---|
| Returns: | out – dictionary of all parameters and their values. If deep=True, the dictionary also holds the parameters of the parameters |
|---|---|
| Return type: | dict |
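The deep=True behaviour follows the standard scikit-learn estimator contract. The toy estimator below is hypothetical, for illustration only; it shows how a nested sub-object's parameters are reported under the component__parameter naming scheme that grid search relies on.

```python
from sklearn.base import BaseEstimator
from sklearn.neighbors import NearestNeighbors

class ToyWrapper(BaseEstimator):
    """Hypothetical estimator wrapping a NearestNeighbors sub-object."""
    def __init__(self, k=10, s=1.0, nn=None):
        self.k = k
        self.s = s
        self.nn = nn

est = ToyWrapper(k=3, nn=NearestNeighbors(n_neighbors=3))
params = est.get_params(deep=True)

# deep=True also exposes the sub-object's parameters as 'nn__<name>'
print(params['k'], params['s'], params['nn__n_neighbors'])
```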
predict(X)
Predict labels for X

| Parameters: | X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features) |
|---|---|
| Returns: | binary indicator matrix with label assignments of shape (n_samples, n_labels) |
|---|---|
| Return type: | scipy.sparse matrix of int |
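The returned indicator matrix can be unpacked with standard scipy operations. The toy matrix below stands in for real predictions and shows how to read off each sample's assigned labels:

```python
import numpy as np
from scipy import sparse

# Toy prediction matrix: 3 test samples, 4 labels
predictions = sparse.csr_matrix(np.array([
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]))

# Column indices of the 1-entries in each row = labels assigned to that sample
assigned = [predictions[i].nonzero()[1].tolist()
            for i in range(predictions.shape[0])]
print(assigned)  # [[1, 3], [0], []]
```

Note that a row may be all zeros: a sample can legitimately receive no labels at all.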
predict_proba(X)
Predict probabilities of label assignments for X

| Parameters: | X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features) |
|---|---|
| Returns: | matrix with label assignment probabilities of shape (n_samples, n_labels) |
|---|---|
| Return type: | scipy.sparse matrix of float |
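If hard assignments are needed from the probability matrix, you can threshold it yourself, e.g. at 0.5. The values below are made up for illustration:

```python
import numpy as np

# Toy probability matrix: 2 samples, 3 labels
probabilities = np.array([[0.9, 0.2, 0.6],
                          [0.1, 0.4, 0.8]])

# Threshold at 0.5 to obtain a binary indicator matrix
assignments = (probabilities > 0.5).astype(int)
print(assignments)  # [[1 0 1], [0 0 1]]
```

Choosing a different threshold trades precision against recall per label, which can be useful when label frequencies are very skewed.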
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric since it requires that each sample’s entire label set be predicted correctly.

| Parameters: | X (array-like, shape (n_samples, n_features)) – test samples; y (array-like, shape (n_samples, n_labels)) – true labels for X; sample_weight (array-like, shape (n_samples,), optional) – sample weights |
|---|---|
| Returns: | score – mean accuracy of self.predict(X) with respect to y |
|---|---|
| Return type: | float |
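Subset accuracy counts a sample as correct only when its entire predicted label set matches the true one. A plain numpy equivalent of the metric, on toy arrays rather than the library call:

```python
import numpy as np

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],   # exact match
                   [0, 1, 1],   # one label wrong -> whole sample scores 0
                   [1, 1, 0]])  # exact match

# A row contributes 1 only if every label in it matches
subset_accuracy = np.all(y_true == y_pred, axis=1).mean()
print(subset_accuracy)  # 2 of 3 rows match exactly -> 0.666...
```

This is why score can look pessimistic on multi-label problems: a single wrong label zeroes out the whole sample, so per-label metrics such as f1_macro (as in the GridSearchCV example above) are often preferred.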