scipy.cluster.hierarchy.fclusterdata
scipy.cluster.hierarchy.fclusterdata(X, t, criterion='inconsistent', metric='euclidean', depth=2, method='single', R=None)
Cluster observation data using a given metric.
Clusters the original observations in the n-by-m data matrix X (n observations in m dimensions). With the default arguments, the function uses the Euclidean distance metric to calculate distances between original observations, performs hierarchical clustering using the single linkage algorithm, and forms flat clusters using the inconsistency method with t as the cut-off threshold.
A one-dimensional array T of length n is returned. T[i] is the index of the flat cluster to which the original observation i belongs.
Parameters:
- X : (N, M) ndarray
N by M data matrix with N observations in M dimensions.
- t : scalar
- For the ‘inconsistent’, ‘distance’ or ‘monocrit’ criteria, this is the threshold to apply when forming flat clusters.
- For the ‘maxclust’ or ‘maxclust_monocrit’ criteria, this is the maximum number of clusters requested.
- criterion : str, optional
Specifies the criterion for forming flat clusters. Valid values are ‘inconsistent’ (default), ‘distance’, or ‘maxclust’. See fcluster for descriptions.
- metric : str, optional
The distance metric for calculating pairwise distances. See distance.pdist for descriptions and linkage to verify compatibility with the linkage method.
- depth : int, optional
The maximum depth for the inconsistency calculation. See inconsistent for more information.
- method : str, optional
The linkage method to use (single, complete, average, weighted, median, centroid, ward). See linkage for more information. Default is “single”.
- R : ndarray, optional
The inconsistency matrix. If it is not passed, it is computed when needed.
Returns:
- fclusterdata : ndarray
A vector of length n. T[i] is the flat cluster number to which original observation i belongs.
See also
scipy.spatial.distance.pdist : pairwise distance metrics
Notes
This function is similar to the MATLAB function clusterdata.
Examples
>>> from scipy.cluster.hierarchy import fclusterdata
This is a convenience method that abstracts all the steps of a typical SciPy hierarchical clustering workflow:
- Transform the input data into a condensed distance matrix with scipy.spatial.distance.pdist.
- Apply a clustering method.
- Obtain flat clusters at a user-defined threshold t using scipy.cluster.hierarchy.fcluster.
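As a rough sketch of what the steps listed above look like when written out by hand, the snippet below chains pdist, linkage, and fcluster using the default arguments from the signature, then compares the labels with those returned by fclusterdata. The sample data and the variable names (Y, Z, T_manual, T_auto) are illustrative assumptions, and this is not the library's actual source.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, fclusterdata

# Illustrative data: four well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Step 1: condensed pairwise distance matrix (default metric).
Y = pdist(X, metric='euclidean')

# Step 2: hierarchical clustering (default linkage method).
Z = linkage(Y, method='single')

# Step 3: flat clusters (default criterion and depth).
T_manual = fcluster(Z, t=1, criterion='inconsistent', depth=2)

# The one-call convenience version, for comparison.
T_auto = fclusterdata(X, t=1)

print(np.array_equal(T_manual, T_auto))
```

Writing the steps out separately can be useful when the linkage matrix Z is needed for something else as well, such as plotting a dendrogram.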
>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
>>> fclusterdata(X, t=1)
array([3, 3, 3, 4, 4, 4, 2, 2, 2, 1, 1, 1], dtype=int32)
The output here (for the dataset X, threshold t, and the default settings) is four clusters with three data points each.
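The meaning of t changes with the criterion argument, as described under Parameters. The following sketch reuses the same kind of data to show the ‘distance’ and ‘maxclust’ criteria. The threshold values 1.5 and 4 are arbitrary choices made for illustration, and the variable names are assumptions rather than part of the API.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Illustrative data: four groups of three points; neighbours within a
# group are 1 apart, and the closest points of adjacent groups are 2 apart.
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# 'distance': t is a distance threshold applied to the hierarchy.
T_distance = fclusterdata(X, t=1.5, criterion='distance')

# 'maxclust': t is an upper bound on the number of flat clusters.
T_maxclust = fclusterdata(X, t=4, criterion='maxclust')

n_distance = len(np.unique(T_distance))
n_maxclust = len(np.unique(T_maxclust))
print(n_distance, n_maxclust)
```

With ‘maxclust’, the requested number is a ceiling, not a guarantee: the function may return fewer clusters than t.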