Hierarchical clustering (scipy.cluster.hierarchy)

Warning

This documentation is work-in-progress and unorganized.

Function Reference

These functions cut a hierarchical clustering into flat clusters by assigning a flat cluster id to each observation, or find the roots of the forest formed by such a cut.

Function      Description
fcluster      forms flat clusters from hierarchical clusters.
fclusterdata  forms flat clusters directly from data.
leaders       finds the root node of each flat cluster.
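
As a minimal sketch of how these three functions fit together, the example below cuts a small hierarchy into two flat clusters. The data points, the choice of single linkage, and the "maxclust" criterion are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, fclusterdata, leaders

# Illustrative data (an assumption): two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Build the hierarchy first, then cut it.
Z = linkage(X, method="single")

# Cut the hierarchy so that at most two flat clusters remain.
T = fcluster(Z, t=2, criterion="maxclust")

# The same idea in one step, starting from the raw observations.
T_direct = fclusterdata(X, t=2, criterion="maxclust", method="single")

# L[i] is the id of the hierarchy node that roots flat cluster M[i].
L, M = leaders(Z, T)

print(T, T_direct, L, M)
```

Here `T` holds one flat cluster id per observation, and `leaders` maps each of those ids back to a node of the linkage `Z`.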

These are routines for agglomerative clustering.

Function  Description
linkage   agglomeratively clusters original observations.
single    the single/min/nearest algorithm. (alias)
complete  the complete/max/farthest algorithm. (alias)
average   the average/UPGMA algorithm. (alias)
weighted  the weighted/WPGMA algorithm. (alias)
centroid  the centroid/UPGMC algorithm. (alias)
median    the median/WPGMC algorithm. (alias)
ward      the Ward/incremental algorithm. (alias)
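
The sketch below illustrates what "alias" means here: calling `average` on a condensed distance matrix should give the same linkage as calling `linkage` with `method="average"`. The sample points are an assumption made up for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, average

# Illustrative observations (an assumption): five points in the plane.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0], [9.0, 4.0]])

# Condensed pairwise distance matrix, Euclidean by default.
y = pdist(X)

# General entry point: choose the algorithm with the `method` argument.
Z_avg = linkage(y, method="average")

# Alias for the same algorithm.
Z_alias = average(y)

# linkage also accepts the raw observations directly.
Z_ward = linkage(X, method="ward")

print(Z_avg.shape)                   # one row per merge, four columns
print(np.allclose(Z_avg, Z_alias))
```

Each row of a linkage matrix records one merge: the two cluster ids joined, the distance between them, and the number of original observations in the new cluster.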

These routines compute statistics on hierarchies.

Function           Description
cophenet           computes the cophenetic distance between leaves.
from_mlab_linkage  converts a linkage produced by MATLAB(TM).
inconsistent       computes the inconsistency coefficients for each cluster.
maxinconsts        the maximum inconsistency coefficient for each cluster.
maxdists           the maximum distance for each cluster.
maxRstat           the maximum of a chosen inconsistency statistic for each cluster.
to_mlab_linkage    converts a linkage to one MATLAB(TM) can understand.
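
A hedged sketch of how these statistics might be computed on one small hierarchy follows. The data, the average-linkage method, and the depth `d=2` are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import (
    linkage, cophenet, inconsistent, maxdists, maxinconsts, maxRstat,
    to_mlab_linkage, from_mlab_linkage,
)

# Illustrative observations (an assumption).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0], [9.0, 4.0]])
Y = pdist(X)
Z = linkage(Y, method="average")

# Cophenetic correlation coefficient and condensed cophenetic distances.
c, coph_dists = cophenet(Z, Y)

# Inconsistency statistics, one row per merge:
# mean, standard deviation, link count, inconsistency coefficient.
R = inconsistent(Z, d=2)

# Per-cluster maxima taken over each cluster and its descendants.
MD = maxdists(Z)        # maximum merge distance
MI = maxinconsts(Z, R)  # maximum inconsistency coefficient
MR = maxRstat(Z, R, 3)  # maximum of column 3 of R (the same coefficient)

# Round trip through the MATLAB-style linkage format.
Z_mlab = to_mlab_linkage(Z)
Z_back = from_mlab_linkage(Z_mlab)

print(c, R.shape, MD.shape, Z_mlab.shape)
```

Passing `3` to `maxRstat` selects the inconsistency-coefficient column of `R`, so `MR` is expected to match `MI`; other column indices select the other statistics.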

This routine visualizes hierarchies as dendrograms.

Function    Description
dendrogram  visualizes linkages (requires matplotlib).
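
The following sketch computes a dendrogram layout without drawing it, by passing `no_plot=True`, so it can be inspected even where matplotlib is unavailable. The data and the linkage method are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Illustrative observations (an assumption).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0], [9.0, 4.0]])
Z = linkage(X, method="average")

# With no_plot=True, only the layout is computed and returned as a dict.
layout = dendrogram(Z, no_plot=True)

print(layout["leaves"])       # leaf ids in left-to-right plotting order
print(len(layout["icoord"]))  # one U-shaped link per merge
```

Dropping `no_plot=True` draws the tree on the current matplotlib axes instead.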

These are data structures and routines for representing hierarchies as tree objects.

Function     Description
ClusterNode  represents cluster nodes in a cluster hierarchy.
leaves_list  returns the leaf ids in left-to-right order.
to_tree      represents a linkage matrix as a tree object.
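
As a small sketch under assumed data, the example below converts a linkage matrix into a tree of `ClusterNode` objects and walks it with the node's accessor methods.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree, leaves_list

# Illustrative observations (an assumption).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0], [9.0, 4.0]])
Z = linkage(X, method="average")

# The root ClusterNode represents the final merge of all observations.
root = to_tree(Z)

print(root.get_id())     # id of the root node
print(root.get_count())  # number of original observations below the root
print(root.is_leaf())    # the root of a non-trivial tree is not a leaf

# Each non-leaf node has a left and a right child.
left, right = root.get_left(), root.get_right()
print(left.get_count() + right.get_count())

# Leaf ids collected from the tree object and from the linkage matrix.
tree_leaves = root.pre_order()
matrix_leaves = leaves_list(Z)
print(tree_leaves, matrix_leaves)
```

For `n` observations the leaves carry ids `0` to `n - 1`, and each merge in row `i` of the linkage matrix creates the node with id `n + i`.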

These are predicates for checking the validity of linkage and inconsistency matrices as well as for checking isomorphism of two flat cluster assignments.

Function          Description
is_valid_im       checks for a valid inconsistency matrix.
is_valid_linkage  checks for a valid hierarchical clustering.
is_isomorphic     checks if two flat clusterings are isomorphic.
is_monotonic      checks if a linkage is monotonic.
correspond        checks whether a condensed distance matrix corresponds with a linkage.
num_obs_linkage   the number of observations corresponding to a linkage matrix.
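
A brief sketch of these predicates on a small example follows. The data and the two flat cluster assignments are made-up assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import (
    linkage, inconsistent, is_valid_linkage, is_valid_im, is_monotonic,
    is_isomorphic, correspond, num_obs_linkage,
)

# Illustrative observations (an assumption).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0], [9.0, 4.0]])
Y = pdist(X)
Z = linkage(Y, method="single")
R = inconsistent(Z)

print(is_valid_linkage(Z))  # Z has the structure of a linkage matrix
print(is_valid_im(R))       # R has the structure of an inconsistency matrix
print(is_monotonic(Z))      # merge distances never decrease
print(correspond(Z, Y))     # Z and Y describe the same number of observations
print(num_obs_linkage(Z))   # number of original observations

# Two flat clusterings that differ only in how the clusters are labelled.
T1 = np.array([1, 1, 2, 2])
T2 = np.array([2, 2, 1, 1])
T3 = np.array([1, 2, 1, 2])
print(is_isomorphic(T1, T2))  # same partition, labels swapped
print(is_isomorphic(T1, T3))  # a different partition
```

These checks can help validate a matrix before it is passed on to functions such as `fcluster` or `dendrogram`.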

  • MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
  • Mathematica is a registered trademark of Wolfram Research, Inc.
