fclusterdata
- scipy.cluster.hierarchy.fclusterdata(X, t, criterion='inconsistent', metric='euclidean', depth=2, method='single', R=None)
- Cluster observation data using a given metric.
- Clusters the original observations in the n-by-m data matrix X (n observations in m dimensions), using the euclidean distance metric to calculate distances between original observations, performs hierarchical clustering using the single linkage algorithm, and forms flat clusters using the inconsistency method with t as the cut-off threshold.
- A 1-D array T of length n is returned. T[i] is the index of the flat cluster to which the original observation i belongs.
- Parameters:
- X : (N, M) ndarray
- N by M data matrix with N observations in M dimensions. 
- t : scalar
- For criteria ‘inconsistent’, ‘distance’ or ‘monocrit’, this is the threshold to apply when forming flat clusters.
- For ‘maxclust’ or ‘maxclust_monocrit’ criteria, this is the maximum number of clusters requested.
 
- criterion : str, optional
- Specifies the criterion for forming flat clusters. Valid values are ‘inconsistent’ (default), ‘distance’, or ‘maxclust’ cluster formation algorithms. See fcluster for descriptions.
- metric : str or function, optional
- The distance metric for calculating pairwise distances. See distance.pdist for descriptions, and linkage to verify compatibility with the linkage method.
- depth : int, optional
- The maximum depth for the inconsistency calculation. See inconsistent for more information.
- method : str, optional
- The linkage method to use (single, complete, average, weighted, median, centroid, ward). See linkage for more information. Default is “single”.
- R : ndarray, optional
- The inconsistency matrix. If it is not passed, it is computed when needed.
 
- Returns:
- fclusterdata : ndarray
- A vector of length n. T[i] is the flat cluster number to which original observation i belongs. 
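- As a minimal sketch of how t is interpreted under different criteria, the snippet below calls fclusterdata on a small made-up dataset (four well-separated groups of three points; the data and the threshold values are illustrative assumptions, not part of the reference). With ‘maxclust’, t is read as an upper bound on the number of clusters; with ‘distance’, it is read as a cut-off on the cophenetic distance. In both cases the result is a 1-D label array with one entry per observation.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Illustrative data: four groups of three points, one near each corner
# of a 4-by-4 square (assumed for this sketch only).
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# criterion='maxclust': t is the maximum number of flat clusters requested.
T_max = fclusterdata(X, t=4, criterion='maxclust')

# criterion='distance': t is a threshold on the cophenetic distance.
# Within-group merges happen at distance 1 and between-group merges at
# distance 2 under single linkage, so 1.5 separates the groups.
T_dist = fclusterdata(X, t=1.5, criterion='distance')

print(T_max.shape, len(np.unique(T_max)))
print(T_dist.shape, len(np.unique(T_dist)))
```

- The cluster numbers themselves are arbitrary labels; only the grouping they induce is meaningful, so two calls can agree on the partition while numbering the clusters differently.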
 
- See also
- scipy.spatial.distance.pdist
- pairwise distance metrics 
- Notes
- This function is similar to the MATLAB function clusterdata.
- fclusterdata has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

========  ==================  ===
Library   CPU                 GPU
========  ==================  ===
NumPy     ✅                  n/a
CuPy      n/a                 ⛔
PyTorch   ✅                  ⛔
JAX       ⚠️ no JIT           ⛔
Dask      ⚠️ computes graph   n/a
========  ==================  ===

- See Support for the array API standard for more information.
- Examples
- >>> from scipy.cluster.hierarchy import fclusterdata
- This is a convenience function that wraps the steps of a typical SciPy hierarchical clustering workflow:
- Transform the input data into a condensed matrix with scipy.spatial.distance.pdist.
- Apply a clustering method. 
- Obtain flat clusters at a user-defined threshold t using scipy.cluster.hierarchy.fcluster.
- >>> X = [[0, 0], [0, 1], [1, 0],
... [0, 4], [0, 3], [1, 4],
... [4, 0], [3, 0], [4, 1],
... [4, 4], [3, 4], [4, 3]]
- >>> fclusterdata(X, t=1)
array([3, 3, 3, 4, 4, 4, 2, 2, 2, 1, 1, 1], dtype=int32)
- The output here (for the dataset X, threshold t=1, and the default settings, under which t is compared against inconsistency coefficients) is four clusters with three data points each.
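- The three steps listed above can also be written out by hand. The following is a sketch of that manual workflow using the default settings named in the signature (euclidean metric, single linkage, depth 2, ‘inconsistent’ criterion); the intermediate names Y, Z, and R are chosen here for illustration. It is meant to show what the convenience function saves you from writing, not to document its internals.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, inconsistent, fcluster, fclusterdata

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Step 1: condensed pairwise distance matrix.
Y = pdist(X, metric='euclidean')

# Step 2: hierarchical clustering (single linkage) on the distances.
Z = linkage(Y, method='single')

# Step 3: inconsistency matrix at depth 2, then flat clusters at t=1.
R = inconsistent(Z, d=2)
T_manual = fcluster(Z, t=1, criterion='inconsistent', depth=2, R=R)

# The one-call convenience form with the same defaults.
T_direct = fclusterdata(X, t=1)

print(np.array_equal(T_manual, T_direct))
```

- Writing the steps out is mainly useful when the linkage matrix Z is needed for something else, for example plotting a dendrogram or cutting the same tree at several thresholds without recomputing distances.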