8.1.2.7. sklearn.cluster.spectral_clustering

sklearn.cluster.spectral_clustering(affinity, k=8, n_components=None, mode=None, random_state=None, n_init=10)

Apply k-means to a projection of the samples onto eigenvectors of the normalized Laplacian.
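The procedure summarized above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not scikit-learn's code. The helper name `spectral_clustering_sketch`, its `seed` argument, and the k-means++-seeded Lloyd iterations are choices made here for illustration. The affinity is assumed to be dense, symmetric, and free of isolated nodes. The samples are embedded with the eigenvectors of the symmetric normalized Laplacian that have the smallest eigenvalues, rescaled by D^{-1/2}, and the best of `n_init` k-means runs by inertia is kept.

```python
import numpy as np


def spectral_clustering_sketch(affinity, k=8, n_components=None, n_init=10, seed=None):
    """Hypothetical helper: k-means on a normalized-Laplacian embedding.

    Illustration only, not scikit-learn's implementation. `affinity` is a
    dense, symmetric, non-negative (n_samples, n_samples) array in which
    every node has positive degree.
    """
    if n_components is None:
        n_components = k  # same default as the documented parameter
    A = np.asarray(affinity, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh sorts eigenvalues in ascending order; keep the smallest ones
    _, vecs = np.linalg.eigh(L)
    # Rescale by D^{-1/2} to get eigenvectors of the random-walk Laplacian
    X = vecs[:, :n_components] * d_inv_sqrt[:, None]

    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: favour points far from the centers chosen so far
        centers = X[[rng.integers(len(X))]]
        while len(centers) < k:
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
        # Lloyd iterations: assign to nearest center, then move centers to means
        for _ in range(100):
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X - centers[labels]) ** 2).sum()
        # Keep the run with the lowest inertia, as n_init describes
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


# Usage: two dense blocks of 5 samples joined by weak (0.01) links
A = np.full((10, 10), 0.01)
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)
labels = spectral_clustering_sketch(A, k=2, seed=0)
# samples 0-4 share one label, samples 5-9 the other
```

The k-means step is written out only to keep the sketch self-contained; any k-means implementation with restarts would play the same role.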

In practice, Spectral Clustering is very useful when the structure of the individual clusters is highly non-convex, or more generally when a measure of the center and spread of a cluster is not a suitable description of the complete cluster, for instance when the clusters are nested circles in the 2D plane.

If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.
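For a graph with weighted adjacency matrix W, the normalized cut of a partition into clusters C_1 to C_k is usually defined as the sum over clusters of cut(C_j, rest) / vol(C_j). Here cut(C_j, rest) is the total weight of the edges leaving C_j and vol(C_j) is the sum of the degrees of its nodes. The helper below, `normalized_cut`, is a hypothetical name used only to evaluate this quantity for a given labelling. It is not part of scikit-learn.

```python
import numpy as np


def normalized_cut(adjacency, labels):
    """Hypothetical helper: normalized cut of a labelling of a weighted graph.

    Ncut = sum over clusters C of cut(C, rest) / vol(C), where cut(C, rest) is
    the total weight of edges leaving C and vol(C) sums the degrees in C.
    """
    W = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    degrees = W.sum(axis=1)
    total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        cut = W[np.ix_(inside, ~inside)].sum()  # weight crossing the boundary
        total += cut / degrees[inside].sum()
    return total


# Two triangles joined by the single edge 2-3 (each side has volume 7)
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(normalized_cut(W, [0, 0, 0, 1, 1, 1]))  # 1/7 + 1/7 = 2/7, about 0.2857
print(normalized_cut(W, [0, 0, 1, 1, 1, 1]))  # 2/4 + 2/10, about 0.7
```

The natural split between the two triangles scores lower than the unbalanced one, which is the behaviour the normalized cut is designed to reward.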

Parameters :

affinity: array-like or sparse matrix, shape: (n_samples, n_samples) :

The affinity matrix describing the relationship of the samples to embed. Must be symmetric.

Possible examples:
  • adjacency matrix of a graph,
  • heat kernel of the pairwise distance matrix of the samples,
  • symmetric k-nearest neighbours connectivity matrix of the samples.

k: integer, optional, default: 8 :

Number of clusters to extract.

n_components: integer, optional, default is k :

Number of eigenvectors to use for the spectral embedding.

mode: {None, ‘arpack’ or ‘amg’} :

The eigenvalue decomposition strategy to use. AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.

random_state: int seed, RandomState instance, or None (default) :

A pseudo-random number generator used to initialize the lobpcg eigenvector decomposition when mode == ‘amg’, and by the k-means initialization.

n_init: int, optional, default: 10 :

Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
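The last two affinity examples listed under the affinity parameter, a heat kernel of the pairwise distances and a symmetric k-nearest neighbours connectivity matrix, can be built from raw samples with a few lines of NumPy. This is a sketch with assumptions. The function names are hypothetical. The Gaussian form exp(-d²/(2σ²)) is one common choice of heat kernel. The `sigma` and `n_neighbors` values are illustrative. Symmetrizing the k-nearest neighbours graph with an element-wise maximum is one convention; averaging C and its transpose is another.

```python
import numpy as np


def heat_kernel_affinity(X, sigma=1.0):
    """Heat (Gaussian) kernel of the pairwise Euclidean distances.

    `sigma` is an illustrative bandwidth; it must be tuned to the data.
    """
    X = np.asarray(X, dtype=float)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def symmetric_knn_connectivity(X, n_neighbors=3):
    """0/1 k-nearest neighbours graph, symmetrized with an element-wise max."""
    X = np.asarray(X, dtype=float)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq_dist, np.inf)  # a point is not its own neighbour
    nearest = np.argsort(sq_dist, axis=1)[:, :n_neighbors]
    C = np.zeros_like(sq_dist)
    C[np.arange(len(X))[:, None], nearest] = 1.0
    # "j is a neighbour of i" does not imply the converse; keep either direction
    return np.maximum(C, C.T)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
H = heat_kernel_affinity(X, sigma=0.5)
K = symmetric_knn_connectivity(X, n_neighbors=3)
```

Both results are symmetric, as the affinity parameter requires. The k-nearest neighbours version is sparse by construction, which suits the sparse eigensolvers offered through mode.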

Returns :

labels: array of integers, shape: n_samples :

The labels of the clusters.

centers: array of integers, shape: k :

The indices of the cluster centers
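The ‘arpack’ value of the mode parameter refers to the ARPACK eigensolver, which SciPy exposes as `scipy.sparse.linalg.eigsh`. The sketch below illustrates how such a step can look for a sparse affinity. It is not the code scikit-learn runs, and the helper name is hypothetical. Instead of asking ARPACK for the smallest eigenvalues of the normalized Laplacian, it asks for the largest algebraic eigenvalues of M = D^{-1/2} A D^{-1/2}. These are exactly 1 minus those smallest eigenvalues, and ARPACK typically converges on them more readily than on `which="SM"`. The ‘amg’ mode, which relies on lobpcg and pyamg, is not shown.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh


def smallest_laplacian_eigvecs(adjacency, n_components):
    """Hypothetical helper: smallest normalized-Laplacian eigenpairs via ARPACK.

    Computes the largest eigenvalues of M = D^{-1/2} A D^{-1/2}; the normalized
    Laplacian I - M has the same eigenvectors, with eigenvalues 1 - mu.
    Assumes every node has positive degree.
    """
    A = csr_matrix(adjacency, dtype=float)
    degrees = np.asarray(A.sum(axis=1)).ravel()
    d_inv_sqrt = diags(1.0 / np.sqrt(degrees))
    M = d_inv_sqrt @ A @ d_inv_sqrt
    vals, vecs = eigsh(M, k=n_components, which="LA")
    order = np.argsort(vals)[::-1]  # largest eigenvalue of M first
    return 1.0 - vals[order], vecs[:, order]


# Two triangles joined by the single edge 2-3
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
lam, vecs = smallest_laplacian_eigvecs(W, n_components=2)
# lam is approximately [0.0, 0.2047]
```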

Notes

The graph should contain only one connected component; otherwise the results make little sense.
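One way to see why a single connected component matters: the multiplicity of the eigenvalue 0 of the normalized Laplacian equals the number of connected components. With several components, the eigenvectors spanning that eigenspace are therefore not uniquely determined. The sketch below uses hypothetical helper names. It checks the component count with `scipy.sparse.csgraph.connected_components` and cross-checks it by counting numerically zero eigenvalues. It assumes there are no isolated nodes, so all degrees are positive.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components


def n_graph_components(adjacency):
    """Number of connected components of an undirected affinity graph."""
    n, _ = connected_components(np.asarray(adjacency), directed=False)
    return n


def n_zero_eigenvalues(adjacency, tol=1e-10):
    """Count eigenvalues of the normalized Laplacian that are numerically zero."""
    W = np.asarray(adjacency, dtype=float)
    d = W.sum(axis=1)  # assumes every node has positive degree
    L = np.eye(len(W)) - W / np.sqrt(np.outer(d, d))
    return int(np.sum(np.abs(np.linalg.eigvalsh(L)) < tol))


# Two triangles with no edge between them: two components
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
print(n_graph_components(W), n_zero_eigenvalues(W))  # 2 2
W[2, 3] = W[3, 2] = 1.0  # bridge the two triangles
print(n_graph_components(W), n_zero_eigenvalues(W))  # 1 1
```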

For k=2, this algorithm solves a continuous relaxation of the normalized cut problem: it is a normalized spectral clustering.
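For k=2, the computation behind this statement can be written out directly. Minimizing the normalized cut over real-valued (relaxed) indicator vectors leads to the generalized eigenproblem (D - W) y = λ D y. The eigenvector y of the second-smallest eigenvalue is then split by sign. The helper `two_way_spectral_cut` is a hypothetical name. Thresholding at zero is one simple choice among several. The sketch assumes a connected graph with positive degrees.

```python
import numpy as np


def two_way_spectral_cut(adjacency):
    """Hypothetical helper: 2-way split from the relaxed normalized cut.

    Assumes a connected graph in which every node has positive degree.
    """
    W = np.asarray(adjacency, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # L_sym = I - D^{-1/2} W D^{-1/2} has the same eigenvalues as (D - W) y = lambda D y
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]  # generalized eigenvector: y = D^{-1/2} u
    return (y > 0).astype(int)  # threshold at zero


# Two triangles joined by the single edge 2-3
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(two_way_spectral_cut(W))  # [0 0 0 1 1 1] or [1 1 1 0 0 0]
```

The overall sign of an eigenvector is arbitrary, so only the grouping of the nodes, not the label values, is meaningful.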

