# Hierarchical clustering (scipy.cluster.hierarchy)

Warning

This documentation is work-in-progress and unorganized.

## Function Reference

These functions cut hierarchical clusterings into flat clusterings, or find the roots of the forest formed by a cut, given the flat cluster ids of each observation.

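| Function | Description |
| --- | --- |
| `fcluster(Z, t[, criterion, depth, R, monocrit])` | Forms flat clusters from the hierarchical clustering defined by the linkage matrix Z. |
| `fclusterdata(X, t[, criterion, metric, ...])` | Clusters observation data using a given metric. |
| `leaders(Z, T)` | Called as `(L, M) = leaders(Z, T)`. Finds the root of each flat cluster in the assignment T. |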
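
As a minimal sketch of how these functions fit together, the example below builds a linkage for a hypothetical toy data set (the six 1-D points and the threshold `t=1.0` are assumptions chosen for illustration), cuts it into flat clusters with `fcluster`, and then asks `leaders` for the root node of each flat cluster.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, leaders

# Hypothetical toy data: two well-separated groups of 1-D observations.
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])

# Condensed distance matrix, then a single-linkage hierarchy.
y = pdist(X)
Z = linkage(y, method="single")

# Cut the hierarchy: observations end up in the same flat cluster
# when their cophenetic distance is at most t.
T = fcluster(Z, t=1.0, criterion="distance")

# L holds the linkage node id of each flat cluster's root,
# M holds the matching flat cluster id.
L, M = leaders(Z, T)

print("flat cluster ids:", T)
print("leader nodes:", L, "cluster ids:", M)
```

With this data the cut should yield two flat clusters, one per group. Node ids below 6 (the number of observations) would denote singleton clusters; here both leaders are internal nodes of the hierarchy.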

These are routines for agglomerative clustering.

| Function | Description |
| --- | --- |
| `linkage(y[, method, metric])` | Performs hierarchical/agglomerative clustering on the condensed distance matrix y. |
| `single(y)` | Performs single/min/nearest linkage on the condensed distance matrix y. |
| `complete(y)` | Performs complete/max/farthest point linkage on the condensed distance matrix y. |
| `average(y)` | Performs average/UPGMA linkage on the condensed distance matrix y. |
| `weighted(y)` | Performs weighted/WPGMA linkage on the condensed distance matrix y. |
| `centroid(y)` | Performs centroid/UPGMC linkage. See `linkage` for more information. |
| `median(y)` | Performs median/WPGMC linkage. See `linkage` for more information. |
| `ward(y)` | Performs Ward's linkage on a condensed or redundant distance matrix. |
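
The following sketch illustrates the relationship between `linkage` and the per-method convenience functions. The four 2-D points are hypothetical and were picked so that the merge distances are easy to check by hand.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, single, complete

# Hypothetical toy data: four 2-D observations.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 3.0]])

# Condensed distance matrix: n * (n - 1) / 2 = 6 pairwise distances.
y = pdist(X)

# Each linkage matrix has n - 1 rows: [node a, node b, distance, size].
Z_avg = linkage(y, method="average")
Z_single = single(y)      # same result as linkage(y, method="single")
Z_complete = complete(y)  # same result as linkage(y, method="complete")

print(Z_avg.shape)
print("final single-linkage merge distance:  ", Z_single[-1, 2])
print("final complete-linkage merge distance:", Z_complete[-1, 2])
```

For this data the last merge joins the pairs {0, 1} and {2, 3}. Single linkage reports the smallest cross-pair distance (4) and complete linkage the largest (5), which shows how the choice of method changes the merge heights even when the tree shape is the same.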

These routines compute statistics on hierarchies.

| Function | Description |
| --- | --- |
| `cophenet(Z[, Y])` | Calculates the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z. |
| `from_mlab_linkage(Z)` | Converts a linkage matrix generated by MATLAB(TM) to a new linkage matrix compatible with this module. |
| `inconsistent(Z[, d])` | Calculates inconsistency statistics on a linkage. |
| `maxinconsts(Z, R)` | Returns the maximum inconsistency coefficient for each non-singleton cluster and its descendants. |
| `maxdists(Z)` | Returns the maximum distance between any cluster for each non-singleton cluster. |
| `maxRstat(Z, R, i)` | Returns the maximum statistic for each non-singleton cluster and its descendants. |
| `to_mlab_linkage(Z)` | Converts a linkage matrix Z generated by the linkage function of this module to a MATLAB(TM)-compatible one. |
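
A short sketch of the statistics routines, using the same kind of hypothetical four-point data set (an assumption made for illustration, not data from this documentation):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet, inconsistent, maxdists

# Hypothetical toy data: four 2-D observations.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 3.0]])
y = pdist(X)
Z = linkage(y, method="average")

# With the original distances supplied, cophenet returns the cophenetic
# correlation coefficient c and the condensed cophenetic distances.
c, coph_dists = cophenet(Z, y)

# One row of inconsistency statistics per non-singleton cluster,
# looking at most d levels below each merge.
R = inconsistent(Z, d=2)

# Largest merge distance at or below each non-singleton cluster.
MD = maxdists(Z)

print("cophenetic correlation:", c)
print("inconsistency matrix shape:", R.shape)
print("max distances:", MD)
```

Because this linkage is monotonic, `maxdists` simply reproduces the merge distances in the third column of `Z`. For a non-monotonic linkage (possible with the centroid and median methods) the two can differ.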

This routine visualizes hierarchical clusterings.

| Function | Description |
| --- | --- |
| `dendrogram(Z[, p, truncate_mode, ...])` | Plots the hierarchical clustering defined by the linkage Z as a dendrogram. |
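
As a hedged illustration, `dendrogram` can also be called with `no_plot=True`, in which case it only computes the layout and returns it as a dictionary instead of drawing. The toy data below is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical toy data: four 2-D observations.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 3.0]])
Z = linkage(pdist(X), method="average")

# no_plot=True skips rendering and only returns the layout data.
info = dendrogram(Z, no_plot=True)

print("leaf order:", info["leaves"])     # observation ids, left to right
print("leaf labels:", info["ivl"])       # labels shown under the leaves
print("number of links:", len(info["icoord"]))
```

Dropping `no_plot=True` draws the tree on the current matplotlib axes, which requires matplotlib to be installed.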

These are data structures and routines for representing hierarchies as tree objects.

| Function | Description |
| --- | --- |
| `ClusterNode(id[, left, right, dist, count])` | A tree node class for representing a cluster. |
| `leaves_list(Z)` | Returns a list of leaf node ids (corresponding to observation vector index) as they appear in the tree from left to right. |
| `to_tree(Z[, rd])` | Converts a hierarchical clustering encoded in the matrix Z (by `linkage`) into an easy-to-use tree object. |
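
The sketch below converts a linkage matrix into a tree of `ClusterNode` objects and inspects the root. The four-point data set is a hypothetical example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, to_tree, leaves_list

# Hypothetical toy data: four 2-D observations.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 3.0]])
Z = linkage(pdist(X), method="average")

# The root ClusterNode represents the cluster of all observations.
root = to_tree(Z)

print("root id:", root.get_id())          # 2n - 2 for n observations
print("observations below root:", root.get_count())
print("root is a leaf:", root.is_leaf())
print("merge distance at root:", root.dist)

# Leaf ids collected by walking the tree, and the same ids from leaves_list.
print("pre-order leaves:", root.pre_order())
print("leaves_list:", leaves_list(Z))
```

The tree form is convenient for recursive traversals, while `leaves_list` gives the leaf order directly from the matrix without building any node objects.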

These are predicates for checking the validity of linkage and inconsistency matrices as well as for checking isomorphism of two flat cluster assignments.

| Function | Description |
| --- | --- |
| `is_valid_im(R[, warning, throw, name])` | Returns True if the inconsistency matrix passed is valid. |
| `is_valid_linkage(Z[, warning, throw, name])` | Checks the validity of a linkage matrix. |
| `is_isomorphic(T1, T2)` | Determines if two different cluster assignments T1 and T2 are equivalent. |
| `is_monotonic(Z)` | Returns True if the linkage passed is monotonic. |
| `correspond(Z, Y)` | Checks if a linkage matrix Z and condensed distance matrix Y could possibly correspond to one another. |
| `num_obs_linkage(Z)` | Returns the number of original observations of the linkage matrix passed. |

• MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
• Mathematica is a registered trademark of Wolfram Research, Inc.
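
A minimal sketch of the validity predicates listed above, run against a hypothetical four-point data set. The two flat assignments `T1` and `T2` are made-up examples that describe the same partition with swapped labels.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import (
    linkage, inconsistent, is_valid_linkage, is_valid_im,
    is_monotonic, is_isomorphic, correspond, num_obs_linkage,
)

# Hypothetical toy data: four 2-D observations.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 3.0]])
y = pdist(X)
Z = linkage(y, method="average")
R = inconsistent(Z)

print(is_valid_linkage(Z))   # Z has the structure of a linkage matrix
print(is_valid_im(R))        # R has the structure of an inconsistency matrix
print(is_monotonic(Z))       # merge distances never decrease
print(correspond(Z, y))      # Z and y describe the same number of observations
print(num_obs_linkage(Z))    # number of original observations

# Same partition, different label names.
T1 = np.array([1, 1, 2, 2])
T2 = np.array([2, 2, 1, 1])
print(is_isomorphic(T1, T2))
```

Checks like these are cheap and can be useful before passing a matrix that came from another tool into the routines above.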
