class skmultilearn.problem_transform.LabelPowerset(classifier=None, require_dense=None)
Bases: skmultilearn.base.problem_transformation.ProblemTransformationBase
Transform a multi-label problem into a multi-class problem
Label Powerset is a problem transformation approach to multi-label classification that transforms a multi-label problem into a multi-class problem with a single multi-class classifier trained on all unique label combinations found in the training data.
The method maps each label combination to a unique combination id and performs multi-class classification, using the base classifier as the multi-class classifier and the combination ids as classes.
| Parameters: | classifier (BaseEstimator) – scikit-learn compatible base classifier
require_dense ([bool, bool], optional) – whether the base classifier requires dense representations for the input feature and label matrices in fit/predict |
|---|---|
unique_combinations_ – mapping from a label combination (as a string) to its label combination id, filled by transform() via fit()
| Type: | Dict[str, int] |
|---|---|
reverse_combinations_ – list ordered by label combination id, giving the list of label indexes for each combination, filled by transform() via fit()
| Type: | List[List[int]] |
|---|---|
Notes
Note
n_classes in this document denotes the number of unique label combinations present in the training y passed to fit(); in practice it is equal to len(self.unique_combinations_).
Examples
An example use case for Label Powerset with an sklearn.ensemble.RandomForestClassifier base classifier
which supports sparse input:
from skmultilearn.problem_transform import LabelPowerset
from sklearn.ensemble import RandomForestClassifier

# initialize LabelPowerset multi-label classifier with a RandomForest
classifier = LabelPowerset(
    classifier = RandomForestClassifier(n_estimators=100),
    require_dense = [False, True]
)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)
Another way to use this classifier is to select the best scenario from a set of multi-class classifiers used with Label Powerset; this can be done using cross-validation grid search. In the example below, the model with the highest accuracy is selected from either a sklearn.ensemble.RandomForestClassifier or sklearn.naive_bayes.MultinomialNB base classifier, along with the best parameters for that base classifier.
from skmultilearn.problem_transform import LabelPowerset
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier

parameters = [
    {
        'classifier': [MultinomialNB()],
        'classifier__alpha': [0.7, 1.0],
    },
    {
        'classifier': [RandomForestClassifier()],
        'classifier__criterion': ['gini', 'entropy'],
        'classifier__n_estimators': [10, 20, 50],
    },
]

clf = GridSearchCV(LabelPowerset(), parameters, scoring='accuracy')
clf.fit(x, y)

print(clf.best_params_, clf.best_score_)

# result
# {
#     'classifier': RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
#         max_depth=None, max_features='auto', max_leaf_nodes=None,
#         min_impurity_decrease=0.0, min_impurity_split=None,
#         min_samples_leaf=1, min_samples_split=2,
#         min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=1,
#         oob_score=False, random_state=None, verbose=0,
#         warm_start=False),
#     'classifier__criterion': 'gini', 'classifier__n_estimators': 50
# } 0.16
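For a closer look at the transformation itself, a minimal sketch that fits on a toy dataset and inspects the attributes documented above (the toy X and y arrays and the sklearn.naive_bayes.GaussianNB base classifier are assumptions made for this illustration):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from skmultilearn.problem_transform import LabelPowerset

# toy data: 4 samples, 2 features, 3 labels (dense numpy input, converted internally)
X = np.array([[1.0, 2.0], [2.0, 1.0], [0.5, 0.5], [3.0, 3.0]])
y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 1],
              [1, 1, 0]])

lp = LabelPowerset(classifier=GaussianNB(), require_dense=[True, True])
lp.fit(X, y)

# three distinct label combinations occur in y, so the internal
# multi-class problem has three classes (n_classes == 3)
print(len(lp.unique_combinations_))   # 3
print(lp.reverse_combinations_)       # e.g. [[0], [1, 2], [0, 1]] – label indexes per combination id

The method sketches below reuse this fitted lp together with the toy X and y.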
fit(X, y) – Fits classifier to training data
| Parameters: | X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix
y (array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)) – binary indicator matrix with label assignments |
|---|---|
| Returns: | fitted instance of self |
| Return type: | self |
Notes
Note
Input matrices are converted to sparse format internally if a numpy representation is passed
inverse_transform(y) – Transforms multi-class assignment to multi-label
Transforms the multi-class assignment produced by transform() (one class per label combination) back into a multi-label binary indicator matrix.
| Parameters: | y (numpy.ndarray of {0, ... , n_classes-1}, shape=(n_samples,)) – vector of label combination ids (multi-class assignments) |
|---|---|
| Returns: | binary indicator matrix with label assignments |
| Return type: | scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels) |
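A short usage sketch for inverse_transform, reusing the lp classifier fitted in the earlier sketch (the combination ids below are chosen arbitrarily for illustration):

import numpy as np

# map label combination ids back to a binary indicator matrix
combination_ids = np.array([0, 2, 1])
label_matrix = lp.inverse_transform(combination_ids)
print(label_matrix.toarray())   # one row of {0, 1} label assignments per combination id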
predict(X) – Predict labels for X
| Parameters: | X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix |
|---|---|
| Returns: | binary indicator matrix with label assignments |
| Return type: | scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels) |
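As the return type above indicates, predictions come back as a sparse indicator matrix; a brief sketch, again reusing the fitted lp and the toy X from the earlier sketch:

# predict on the toy data and view the sparse result as a dense array
predictions = lp.predict(X)
print(predictions.toarray())   # {0, 1} indicator matrix, shape (n_samples, n_labels)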
predict_proba(X) – Predict probabilities of label assignments for X
| Parameters: | X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix |
|---|---|
| Returns: | matrix with label assignment probabilities |
| Return type: | scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels) |
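A brief sketch of predict_proba, again assuming the fitted lp and toy X from the earlier sketch:

# per-label assignment probabilities, returned as a sparse matrix
probabilities = lp.predict_proba(X)
print(probabilities.toarray())   # shape (n_samples, n_labels), values in [0.0, 1.0]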
transform(y) – Transform multi-label output space to multi-class
Transforms a multi-label problem into a single-label multi-class problem where each label combination is a separate class.
| Parameters: | y (array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)) – binary indicator matrix with label assignments |
|---|---|
| Returns: | a multi-class output space vector |
| Return type: | numpy.ndarray of {0, ... , n_classes-1}, shape=(n_samples,) |
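A short sketch of transform on the toy y from the earlier sketch (the exact id values depend on the order in which combinations are first encountered, so the output shown is only indicative):

# each distinct row of y gets its own combination id; identical rows share an id
class_vector = lp.transform(y)
print(class_vector)   # e.g. array([0, 0, 1, 2])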