skmultilearn.problem_transform.ClassifierChain(classifier=None, require_dense=None, order=None)
Bases: skmultilearn.base.problem_transformation.ProblemTransformationBase
Constructs a Bayesian conditioned chain of per-label classifiers.
This class implements Classifier Chains, the problem transformation method of Jesse Read et al. For L labels it trains L classifiers, ordered in a chain according to the Bayesian chain rule.
The first classifier is trained on the input space alone; each subsequent classifier is trained on the input space extended with the labels handled by all previous classifiers in the chain.
By default the chain follows the label order of the training set: the label in column 0 first, then column 1, and so on.
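The chaining idea described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: `nearest_neighbour`, `fit_chain` and `predict_chain` are hypothetical names, and the toy 1-nearest-neighbour learner merely stands in for whatever base classifier would be passed as `classifier`. The sketch assumes that true label values extend the features at training time and that predicted values take their place at prediction time.

```python
def nearest_neighbour(train_X, train_y):
    """Toy 1-NN binary learner, a stand-in for any base classifier."""
    def predict_one(x):
        distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
        return train_y[distances.index(min(distances))]
    return predict_one


def fit_chain(X, Y, order=None):
    """Train one model per label, following `order` (default: column order)."""
    n_labels = len(Y[0])
    order = list(range(n_labels)) if order is None else list(order)
    models = []
    for position, label in enumerate(order):
        earlier = order[:position]
        # Inputs are extended with the true values of labels earlier in the chain.
        augmented = [list(x) + [y[prev] for prev in earlier] for x, y in zip(X, Y)]
        targets = [y[label] for y in Y]
        models.append(nearest_neighbour(augmented, targets))
    return order, models


def predict_chain(chain, X):
    """Predict labels link by link, feeding each prediction to the next link."""
    order, models = chain
    results = []
    for x in X:
        row = [0] * len(order)
        features = list(x)
        for label, model in zip(order, models):
            row[label] = model(features)
            features.append(row[label])
        results.append(row)
    return results


# Tiny made-up dataset: one feature, two labels.
X_demo = [[0.0], [1.0], [2.0], [3.0]]
Y_demo = [[0, 0], [0, 1], [1, 0], [1, 1]]
default_chain = fit_chain(X_demo, Y_demo)                 # chain order 0, 1
reversed_chain = fit_chain(X_demo, Y_demo, order=[1, 0])  # chain order 1, 0
```

Passing a different permutation as `order` changes which labels each link can condition on, which is the role the `order` argument plays in the signature above.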
| Parameters: |
|
|---|
classifiers_ list of classifiers trained per partition, set in fit()
| Type: | List[BaseEstimator] of shape n_labels |
|---|
References
If used, please cite the scikit-multilearn library and the relevant paper:
    @inproceedings{read2009classifier,
      title={Classifier chains for multi-label classification},
      author={Read, Jesse and Pfahringer, Bernhard and Holmes, Geoff and Frank, Eibe},
      booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
      pages={254--269},
      year={2009},
      organization={Springer}
    }
Examples
An example use case for Classifier Chains
with an sklearn.svm.SVC base classifier which supports sparse input:
    from skmultilearn.problem_transform import ClassifierChain
    from sklearn.svm import SVC

    # initialize Classifier Chain multi-label classifier
    # with an SVM classifier
    # SVC in scikit-learn accepts a sparse X matrix but needs a dense y,
    # hence require_dense = [False, True]
    classifier = ClassifierChain(
        classifier = SVC(),
        require_dense = [False, True]
    )

    # train
    classifier.fit(X_train, y_train)

    # predict
    predictions = classifier.predict(X_test)
Another way to use this classifier is to pick the best of several single-label base classifiers
for the chain by cross-validated grid search. In the example below, the model
with the highest accuracy is selected from either a sklearn.naive_bayes.MultinomialNB or a
sklearn.svm.SVC base classifier, along with the best parameters for that base classifier.
    from skmultilearn.problem_transform import ClassifierChain
    from sklearn.model_selection import GridSearchCV
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import SVC

    # one grid per candidate base classifier, each with its own parameters
    parameters = [
        {
            'classifier': [MultinomialNB()],
            'classifier__alpha': [0.7, 1.0],
        },
        {
            'classifier': [SVC()],
            'classifier__kernel': ['rbf', 'linear'],
        },
    ]

    clf = GridSearchCV(ClassifierChain(), parameters, scoring='accuracy')
    clf.fit(x, y)

    print(clf.best_params_, clf.best_score_)

    # result
    # {'classifier': MultinomialNB(alpha=0.7, class_prior=None, fit_prior=True), 'classifier__alpha': 0.7} 0.16
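When reading a score such as the one above, it may help to recall that scikit-learn's accuracy on multi-label indicator targets is subset accuracy: a sample counts as correct only if every one of its labels matches. The pure-Python sketch below contrasts that with a per-label score. The function names and the data are made up for illustration and are not part of scikit-multilearn.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label row is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual label cells predicted correctly."""
    correct = sum(ti == pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return correct / total


# Made-up indicator rows: 4 samples, 3 labels, two single-cell mistakes.
y_true_demo = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred_demo = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

exact_match = subset_accuracy(y_true_demo, y_pred_demo)
cell_match = per_label_accuracy(y_true_demo, y_pred_demo)
```

Because one wrong label invalidates the whole row, subset accuracy is usually much lower than the per-label figure on the same predictions.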
fit(X, y, order=None)
Fits classifier to training data
| Parameters: |
|
|---|---|
| Returns: | fitted instance of self |
| Return type: | self |
Notes
Input matrices passed as NumPy arrays are converted to sparse format internally.
predict(X)
Predict labels for X
| Parameters: | X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix |
|---|---|
| Returns: | binary indicator matrix with label assignments |
| Return type: | scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels) |
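A binary indicator matrix has one row per sample and one column per label, with 1 marking an assigned label. As a minimal sketch, the hypothetical helper below turns such rows into lists of label indices. It works on plain nested lists, so a scipy.sparse result would first need converting to a dense form (for example with `.toarray()`).

```python
def label_indices(indicator_rows):
    """Return, for each sample, the column indices of its assigned labels."""
    return [[j for j, flag in enumerate(row) if flag] for row in indicator_rows]


# Made-up predictions for 3 samples and 3 labels.
indicator_demo = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
assigned = label_indices(indicator_demo)
```

A row of all zeros means that no label was assigned to that sample.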
predict_proba(X)
Predict probabilities of label assignments for X
| Parameters: | X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix |
|---|---|
| Returns: | matrix with label assignment probabilities |
| Return type: | scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels) |
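One common way to turn label probabilities into label assignments is to apply a cut-off per label. The sketch below assumes a cut-off of 0.5 and plain nested lists. The helper name is hypothetical, and the result is not guaranteed to match what predict() returns.

```python
def threshold_probabilities(proba_rows, threshold=0.5):
    """Mark a label as assigned when its probability reaches the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in proba_rows]


# Made-up probabilities for 2 samples and 3 labels.
proba_demo = [[0.9, 0.2, 0.5], [0.1, 0.7, 0.4]]
assignments = threshold_probabilities(proba_demo)
strict_assignments = threshold_probabilities(proba_demo, threshold=0.8)
```

Raising the threshold trades recall for precision, so the cut-off is worth tuning per problem rather than fixing at 0.5.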