This is documentation for an old release of SciPy (version 1.2.0). The latest stable release is version 1.15.1.

scipy.cluster.hierarchy.fclusterdata

scipy.cluster.hierarchy.fclusterdata(X, t, criterion='inconsistent', metric='euclidean', depth=2, method='single', R=None)

Cluster observation data using a given metric.

Clusters the original observations in the n-by-m data matrix X (n observations in m dimensions). With the default arguments, it calculates distances between the original observations using the euclidean metric, performs hierarchical clustering using the single linkage algorithm, and forms flat clusters using the inconsistency method with t as the cut-off threshold.

A one-dimensional array T of length n is returned. T[i] is the index of the flat cluster to which the original observation i belongs.

Parameters:

X : (N, M) ndarray

N by M data matrix with N observations in M dimensions.

t : scalar

For criteria ‘inconsistent’, ‘distance’ or ‘monocrit’, this is the threshold to apply when forming flat clusters. For ‘maxclust’ or ‘maxclust_monocrit’ criteria, this is the maximum number of clusters requested.

criterion : str, optional

Specifies the criterion for forming flat clusters. Valid values are ‘inconsistent’ (default), ‘distance’, or ‘maxclust’. See fcluster for descriptions.

metric : str, optional

The distance metric for calculating pairwise distances. See distance.pdist for descriptions, and see linkage to verify compatibility with the linkage method.

depth : int, optional

The maximum depth for the inconsistency calculation. See inconsistent for more information.

method : str, optional

The linkage method to use (single, complete, average, weighted, median, centroid, ward). See linkage for more information. Default is “single”.

R : ndarray, optional

The inconsistency matrix. If it is not passed, it is computed when needed.

Returns:

fclusterdata : ndarray

A vector of length n. T[i] is the flat cluster number to which original observation i belongs.

See also

scipy.spatial.distance.pdist: pairwise distance metrics

Notes

This function is similar to the MATLAB function clusterdata.

Examples

>>> from scipy.cluster.hierarchy import fclusterdata

This is a convenience method that wraps the steps of a typical SciPy hierarchical clustering workflow:

Transform the input data into a condensed matrix with scipy.spatial.distance.pdist.
Apply a clustering method.
Obtain flat clusters at a user-defined threshold t using scipy.cluster.hierarchy.fcluster.
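
As a rough illustration of the three steps above, the following sketch chains pdist, linkage, and fcluster by hand and compares the result with a single fclusterdata call. It assumes the default settings (euclidean metric, single linkage, 'inconsistent' criterion, depth 2). The variable names are illustrative only, and the twelve sample points are the same ones used in the example that follows.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, fclusterdata, linkage
from scipy.spatial.distance import pdist

# Sample data: twelve 2-D points (same values as the example below).
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Step 1: condensed pairwise distance matrix.
Y = pdist(X, metric='euclidean')

# Step 2: hierarchical clustering (linkage matrix).
Z = linkage(Y, method='single')

# Step 3: flat clusters, using the inconsistency criterion with threshold t.
T_manual = fcluster(Z, t=1, criterion='inconsistent', depth=2)

# The convenience function, called with its defaults.
T_direct = fclusterdata(X, t=1)

# Under the assumptions above, both label vectors should agree.
labels_match = np.array_equal(T_manual, T_direct)
print(labels_match)
```

Running the steps by hand is mainly useful when the intermediate results (the condensed distances Y or the linkage matrix Z) are needed for something else, such as plotting a dendrogram.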

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]

>>> fclusterdata(X, t=1)
array([3, 3, 3, 4, 4, 4, 2, 2, 2, 1, 1, 1], dtype=int32)

The output here (for the dataset X, the threshold t=1, and the default settings) is four clusters with three data points each.
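
The call above relies on the default 'inconsistent' criterion. As a hedged sketch of how the meaning of t changes with criterion, the snippet below reruns the same data with 'maxclust' and 'distance'. The values 4 and 1.5 are arbitrary choices made for this dataset, not recommendations.

```python
from scipy.cluster.hierarchy import fclusterdata

# Same twelve points as in the example above.
X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]

# 'maxclust': t is the maximum number of flat clusters requested.
T_maxclust = fclusterdata(X, t=4, criterion='maxclust')

# 'distance': t is a threshold on the cophenetic distance, so observations
# in the same flat cluster are no farther apart than t in the dendrogram.
T_distance = fclusterdata(X, t=1.5, criterion='distance')

# Count the distinct cluster labels produced by each criterion.
n_maxclust = len(set(T_maxclust.tolist()))
n_distance = len(set(T_distance.tolist()))
print(n_maxclust, n_distance)
```

Note that 'maxclust' only guarantees an upper bound: when several merges happen at the same height, fewer clusters than requested may be returned.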
