skmultilearn.cluster.base module


class skmultilearn.cluster.base.GraphBuilderBase

Bases: object

An abstract base class for a graph building class used in Label Space clustering.

Inherit from it in your own class as described in the developer guide (../developer.ipynb).

transform(y)

Abstract method that builds the graph edge map for a label space clusterer.

Implement it in your own class as described in the developer guide (../developer.ipynb).

Raises:	`NotImplementedError` – this is an abstract method
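As a rough illustration of this contract, the sketch below defines a stand-in for `GraphBuilderBase` so that it runs without the library installed, plus a hypothetical subclass that overrides `transform`. The names `GraphBuilderBaseStandIn` and `PairCountGraphBuilder` and the list-of-lists label matrix are assumptions made for this example only; real code would inherit from `skmultilearn.cluster.base.GraphBuilderBase`.

```python
class GraphBuilderBaseStandIn:
    """Stand-in (assumption) mirroring the documented contract: transform(y) is abstract."""

    def transform(self, y):
        raise NotImplementedError("transform() must be implemented by a subclass")


class PairCountGraphBuilder(GraphBuilderBaseStandIn):
    """Hypothetical builder: counts samples in which two distinct labels co-occur."""

    def transform(self, y):
        edge_map = {}
        for row in y:
            # Indexes of the labels assigned to this sample.
            active = [i for i, value in enumerate(row) if value]
            for pos, i in enumerate(active):
                for j in active[pos + 1:]:
                    edge_map[(i, j)] = edge_map.get((i, j), 0.0) + 1.0
        return edge_map


# Three samples, three labels.
y_example = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
pair_counts = PairCountGraphBuilder().transform(y_example)
```

Calling `transform` on the stand-in base class itself raises `NotImplementedError`, which matches the behaviour documented above.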

class skmultilearn.cluster.base.LabelCooccurrenceGraphBuilder(weighted=None, include_self_edges=None, normalize_self_edges=None)

Bases: skmultilearn.cluster.base.GraphBuilderBase

Base class providing the API and common functions for all label co-occurrence-based multi-label classifiers.

This graph builder constructs a Label Graph based on the output matrix where two label nodes are connected when at least one sample is labeled with both of them. If the graph is weighted, the weight of an edge between two label nodes is the number of samples labeled with these two labels. Self-edge weights contain the number of samples with a given label.

Parameters:
	weighted (bool) – decide whether to generate a weighted or unweighted graph.
	include_self_edges (bool) – decide whether to include self-edges, i.e. label 1 - label 1, in the co-occurrence graph.
	normalize_self_edges (bool) – if including self-edges, divide the (i, i) edge by 2.0; requires include_self_edges=True.
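To make the effect of the three parameters concrete, here is a pure-Python sketch of the behaviour described above. It is not the library's implementation: the function name `cooccurrence_edge_map`, the list-of-lists input, the choice to store each unordered pair once as `(i, j)` with `i <= j`, and the weight of 1.0 for unweighted edges are all assumptions.

```python
def cooccurrence_edge_map(y, weighted=True, include_self_edges=False,
                          normalize_self_edges=False):
    """Sketch of a label co-occurrence edge map (hypothetical helper)."""
    if normalize_self_edges and not include_self_edges:
        raise ValueError("normalize_self_edges requires include_self_edges=True")

    edge_map = {}
    for row in y:
        active = [i for i, value in enumerate(row) if value]
        for pos, i in enumerate(active):
            # active[pos:] starts at i itself, so (i, i) self-edges are counted too.
            for j in active[pos:]:
                edge_map[(i, j)] = edge_map.get((i, j), 0.0) + 1.0

    if not weighted:
        # Assumption: an unweighted graph records only that an edge exists.
        edge_map = {edge: 1.0 for edge in edge_map}

    for (i, j) in list(edge_map):
        if i != j:
            continue
        if not include_self_edges:
            del edge_map[(i, j)]
        elif normalize_self_edges:
            edge_map[(i, j)] /= 2.0
    return edge_map


y_example = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
no_self = cooccurrence_edge_map(y_example, weighted=True)
with_self = cooccurrence_edge_map(y_example, weighted=True, include_self_edges=True)
halved = cooccurrence_edge_map(y_example, weighted=True, include_self_edges=True,
                               normalize_self_edges=True)
```

In this example every label appears in two samples, so each self-edge in `with_self` has weight 2.0 and each self-edge in `halved` has weight 1.0, while the edge between labels 0 and 1 keeps weight 2.0 in all three maps.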

References

If you use this graph builder please cite the clustering paper:

@Article{datadriven,
 author = {Szymański, Piotr and Kajdanowicz, Tomasz and Kersting, Kristian},
 title = {How Is a Data-Driven Approach Better than Random Choice in
 Label Space Division for Multi-Label Classification?},
 journal = {Entropy},
 volume = {18},
 year = {2016},
 number = {8},
 article_number = {282},
 url = {http://www.mdpi.com/1099-4300/18/8/282},
 issn = {1099-4300},
 doi = {10.3390/e18080282}
}

Examples

A full example of building a modularity-based label space division based on the Label Co-occurrence Graph and classifying with a separate classifier chain per subspace.

from skmultilearn.cluster import LabelCooccurrenceGraphBuilder, NetworkXLabelGraphClusterer
from skmultilearn.ensemble import LabelSpacePartitioningClassifier
from skmultilearn.problem_transform import ClassifierChain
from sklearn.naive_bayes import GaussianNB

# Build a weighted co-occurrence graph without self-edges.
graph_builder = LabelCooccurrenceGraphBuilder(weighted=True, include_self_edges=False, normalize_self_edges=False)

# Divide the label space with the modularity-based 'louvain' method.
clusterer = NetworkXLabelGraphClusterer(graph_builder, method='louvain')

# Train a separate classifier chain per label subspace.
classifier = LabelSpacePartitioningClassifier(
    classifier=ClassifierChain(classifier=GaussianNB()),
    clusterer=clusterer
)

# X_train, y_train and X_test are assumed to be loaded beforehand.
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

For more use cases see the label relations exploration guide.

transform(y)

Generate adjacency matrix from label matrix

This function generates a weighted or unweighted co-occurrence Label Graph adjacency matrix in dictionary-of-keys format, based on the input binary label vectors.

Parameters:	y (numpy.ndarray or scipy.sparse) – dense or sparse binary matrix with shape `(n_samples, n_labels)`
Returns:	weight map with a tuple of label indexes as keys and the number of samples in which the two labels co-occurred as values
Return type:	Dict[(int, int), float]
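The dictionary-of-keys return format can be easier to read once expanded into a square matrix. The helper below is a small sketch of that conversion; `edge_map_to_adjacency` is a hypothetical name, the example edge map is made up for illustration, and mirroring every entry across the diagonal assumes the label graph is undirected.

```python
def edge_map_to_adjacency(edge_map, n_labels):
    """Expand a {(i, j): weight} edge map into a dense symmetric matrix (sketch)."""
    adjacency = [[0.0] * n_labels for _ in range(n_labels)]
    for (i, j), weight in edge_map.items():
        # Assumption: the graph is undirected, so each edge is mirrored.
        adjacency[i][j] = weight
        adjacency[j][i] = weight
    return adjacency


# Made-up edge map in the documented Dict[(int, int), float] shape.
example_edge_map = {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 1.0}
adjacency = edge_map_to_adjacency(example_edge_map, n_labels=3)
```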

class skmultilearn.cluster.base.LabelGraphClustererBase(graph_builder)

Bases: object

An abstract base class for Label Graph clustering

Inherit from it in your own class as described in the developer guide (../developer.ipynb).

fit_predict(X, y)

Abstract method for clustering the label space.

Implement it in your own class as described in the developer guide (../developer.ipynb).

Raises:	`NotImplementedError` – this is an abstract method
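One way to picture a graph-based clusterer is a class that asks its graph builder for an edge map and then groups labels that are connected. The sketch below does this with connected components and a small union-find. Everything in it is an assumption for illustration: the class names, the list-of-lists label matrix, and in particular the return format (a list of label-index lists), which this page does not specify.

```python
class PairGraphBuilder:
    """Hypothetical minimal builder: an edge for every pair of co-occurring labels."""

    def transform(self, y):
        edges = {}
        for row in y:
            active = [i for i, value in enumerate(row) if value]
            for pos, i in enumerate(active):
                for j in active[pos + 1:]:
                    edges[(i, j)] = 1.0
        return edges


class ConnectedComponentsClusterer:
    """Hypothetical clusterer: one cluster per connected component of the label graph."""

    def __init__(self, graph_builder):
        self.graph_builder = graph_builder

    def fit_predict(self, X, y):
        n_labels = len(y[0])
        edge_map = self.graph_builder.transform(y)
        parent = list(range(n_labels))

        def find(i):
            # Follow parent links to the root, shortening the path on the way.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for (i, j) in edge_map:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

        clusters = {}
        for label in range(n_labels):
            clusters.setdefault(find(label), []).append(label)
        return sorted(clusters.values())


# Labels 0 and 1 always appear together, as do labels 2 and 3.
y_example = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
# X is unused by this sketch, so None is passed in its place.
partition = ConnectedComponentsClusterer(PairGraphBuilder()).fit_predict(None, y_example)
```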

class skmultilearn.cluster.base.LabelSpaceClustererBase

Bases: sklearn.base.BaseEstimator

An abstract base class for Label Space clustering

Inherit from it in your own class as described in the developer guide (../developer.ipynb).

fit_predict(X, y)

Abstract method for clustering the label space.

Implement it in your own class as described in the developer guide (../developer.ipynb).

Raises:	`NotImplementedError` – this is an abstract method
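Since this base class takes no graph builder, a clusterer built on it is free to divide the label space by any rule. As a deliberately simple sketch, the hypothetical class below groups consecutive label indexes into fixed-size chunks. The class name, the `chunk_size` parameter, the list-of-lists label matrix, and the list-of-lists return format are assumptions; a real implementation would inherit from `LabelSpaceClustererBase`.

```python
class FixedSizeChunkClusterer:
    """Hypothetical clusterer: consecutive label indexes grouped into fixed-size chunks."""

    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size

    def fit_predict(self, X, y):
        # Only the number of labels is needed; X and the label values are ignored.
        n_labels = len(y[0])
        return [
            list(range(start, min(start + self.chunk_size, n_labels)))
            for start in range(0, n_labels, self.chunk_size)
        ]


# Two samples, five labels.
y_example = [[1, 0, 1, 0, 1], [0, 1, 1, 0, 0]]
partition = FixedSizeChunkClusterer(chunk_size=2).fit_predict(None, y_example)
```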

