SciPy

Hierarchical clustering (scipy.cluster.hierarchy)

These functions cut hierarchical clusterings into flat clusterings, or find the roots of the forest formed by a cut, by providing the flat cluster ids of each observation.

fcluster(Z, t[, criterion, depth, R, monocrit]) Form flat clusters from the hierarchical clustering defined by the given linkage matrix.
fclusterdata(X, t[, criterion, metric, ...]) Cluster observation data using a given metric.
leaders(Z, T) Return the root nodes in a hierarchical clustering.
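As an illustration of the three functions above, the following minimal sketch cuts a small hierarchy into flat clusters and then looks up the root node of each flat cluster. The six observations and the threshold of 2.0 are made up for the example and are not taken from the SciPy documentation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, fclusterdata, leaders, linkage

# Six 2-D observations forming two well-separated groups (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Build the hierarchy first; fcluster works on the linkage matrix.
Z = linkage(X, method="single")

# Cut so that observations in one flat cluster have cophenetic distance <= 2.0.
T = fcluster(Z, t=2.0, criterion="distance")

# Same cut, but starting directly from the raw observations.
T_direct = fclusterdata(X, t=2.0, criterion="distance", method="single")

# L holds the linkage node id that roots each flat cluster,
# M holds the matching flat cluster id.
L, M = leaders(Z, T)

print(T, T_direct, L, M)
```

With this data the cut should separate the first three observations from the last three; the exact label numbers and node ids are implementation details and are best not relied on.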

These are routines for agglomerative clustering.

linkage(y[, method, metric, optimal_ordering]) Perform hierarchical/agglomerative clustering.
single(y) Perform single/min/nearest linkage on the condensed distance matrix y.
complete(y) Perform complete/max/farthest point linkage on a condensed distance matrix.
average(y) Perform average/UPGMA linkage on a condensed distance matrix.
weighted(y) Perform weighted/WPGMA linkage on the condensed distance matrix.
centroid(y) Perform centroid/UPGMC linkage.
median(y) Perform median/WPGMC linkage.
ward(y) Perform Ward’s linkage on a condensed distance matrix.
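The sketch below shows one way these routines might be called: a condensed distance matrix is computed with scipy.spatial.distance.pdist and passed to linkage, and the single convenience function is compared with the equivalent linkage call. The five 1-D observations are invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, single
from scipy.spatial.distance import pdist

# Five 1-D observations (illustrative data).
X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])

# Condensed distance matrix: n * (n - 1) / 2 = 10 pairwise distances.
y = pdist(X)

# Each row of Z describes one merge:
# [cluster id a, cluster id b, merge distance, observations in the new cluster]
Z = linkage(y, method="average")

# The named helpers are shortcuts for linkage with a fixed method.
Z_single = single(y)

print(Z)
```

A linkage matrix for n observations has n - 1 rows, and the last row describes the merge that joins everything into one cluster.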

These routines compute statistics on hierarchies.

cophenet(Z[, Y]) Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.
from_mlab_linkage(Z) Convert a linkage matrix generated by MATLAB(TM) to a new linkage matrix compatible with this module.
inconsistent(Z[, d]) Calculate inconsistency statistics on a linkage matrix.
maxinconsts(Z, R) Return the maximum inconsistency coefficient for each non-singleton cluster and its descendants.
maxdists(Z) Return, for each non-singleton cluster, the maximum merge distance among that cluster and its descendants.
maxRstat(Z, R, i) Return the maximum statistic for each non-singleton cluster and its descendants.
to_mlab_linkage(Z) Convert a linkage matrix to a MATLAB(TM) compatible one.
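A short sketch of how some of these statistics might be computed on one linkage matrix follows. The data and the choice of average linkage are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import (cophenet, inconsistent, linkage,
                                     maxdists, maxinconsts)
from scipy.spatial.distance import pdist

# Five 1-D observations (illustrative data).
X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])
y = pdist(X)
Z = linkage(y, method="average")

# When the original distances are passed, cophenet returns the cophenetic
# correlation coefficient together with the condensed cophenetic distances.
c, coph_dists = cophenet(Z, y)

# One row of inconsistency statistics per merge, looking d levels down.
R = inconsistent(Z, d=2)

# Per-merge maxima taken over each cluster and everything below it.
MD = maxdists(Z)
MI = maxinconsts(Z, R)

print(c, R, MD, MI)
```

For a monotonic linkage such as this one, maxdists simply reproduces the merge distances in the third column of Z.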

Routines for visualizing hierarchies.

dendrogram(Z[, p, truncate_mode, ...]) Plot the hierarchical clustering as a dendrogram.
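The following sketch computes the dendrogram layout without drawing it, by passing no_plot=True, so that the returned dictionary can be inspected. The data are invented for the example; dropping no_plot=True should draw the figure when matplotlib is available.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Five 1-D observations (illustrative data).
X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])
Z = linkage(X, method="average")

# With no_plot=True only the layout is computed and returned.
info = dendrogram(Z, no_plot=True)

# Leaf order along the axis, and one set of link coordinates per merge.
print(info["leaves"])
print(len(info["icoord"]), len(info["dcoord"]))
```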

These are data structures and routines for representing hierarchies as tree objects.

ClusterNode(id[, left, right, dist, count]) A tree node class for representing a cluster.
leaves_list(Z) Return a list of leaf node ids.
to_tree(Z[, rd]) Convert a linkage matrix into an easy-to-use tree object.
cut_tree(Z[, n_clusters, height]) Given a linkage matrix Z, return the cut tree.
optimal_leaf_ordering(Z, y[, metric]) Given a linkage matrix Z and the distances y, reorder the tree so that the distance between successive leaves is minimal.
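As a rough sketch of the tree-oriented routines above, the example below converts a linkage matrix to a ClusterNode tree, lists its leaves, cuts it into two clusters, and reorders the leaves. The data are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import (cut_tree, leaves_list, linkage,
                                     optimal_leaf_ordering, to_tree)
from scipy.spatial.distance import pdist

# Five 1-D observations (illustrative data).
X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])
y = pdist(X)
Z = linkage(y, method="average")

# Root ClusterNode of the hierarchy; it counts every original observation.
root = to_tree(Z)
n_leaves = root.get_count()

# Leaf ids in left-to-right order, from the tree and from the matrix.
order_from_tree = root.pre_order()
order_from_matrix = leaves_list(Z)

# One column of flat cluster labels per requested cut.
labels = cut_tree(Z, n_clusters=2)

# Linkage matrix with children swapped so that neighbouring leaves are close.
Z_ordered = optimal_leaf_ordering(Z, y)

print(n_leaves, order_from_tree, order_from_matrix, labels.ravel())
```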

These are predicates for checking the validity of linkage and inconsistency matrices as well as for checking isomorphism of two flat cluster assignments.

is_valid_im(R[, warning, throw, name]) Return True if the inconsistency matrix passed is valid.
is_valid_linkage(Z[, warning, throw, name]) Check the validity of a linkage matrix.
is_isomorphic(T1, T2) Determine if two different cluster assignments are equivalent.
is_monotonic(Z) Return True if the linkage passed is monotonic.
correspond(Z, Y) Check for correspondence between linkage and condensed distance matrices.
num_obs_linkage(Z) Return the number of original observations of the linkage matrix passed.
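The predicates above can be exercised with a sketch like the following. The data and the two label vectors are invented for the example; the label vectors describe the same grouping under different cluster ids.

```python
import numpy as np
from scipy.cluster.hierarchy import (correspond, inconsistent, is_isomorphic,
                                     is_monotonic, is_valid_im,
                                     is_valid_linkage, linkage,
                                     num_obs_linkage)
from scipy.spatial.distance import pdist

# Five 1-D observations (illustrative data).
X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])
y = pdist(X)
Z = linkage(y, method="average")
R = inconsistent(Z)

valid_Z = is_valid_linkage(Z)   # structural check of the linkage matrix
valid_R = is_valid_im(R)        # structural check of the inconsistency matrix
monotonic = is_monotonic(Z)     # merge distances never decrease
n_obs = num_obs_linkage(Z)      # number of original observations
matches = correspond(Z, y)      # Z and y describe the same number of observations

# Same partition, different label numbers.
T1 = np.array([1, 1, 2, 2, 3])
T2 = np.array([2, 2, 3, 3, 1])
same_partition = is_isomorphic(T1, T2)

print(valid_Z, valid_R, monotonic, n_obs, matches, same_partition)
```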

Utility routines for plotting.

set_link_color_palette(palette) Set list of matplotlib color codes for use by dendrogram.
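A minimal sketch of how the palette might be changed and then restored is shown below. The colors "m" and "c", the data, and the color_threshold of 6.0 are arbitrary choices for the example; no_plot=True is used so that only the computed link colors are inspected.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, set_link_color_palette

# Five 1-D observations (illustrative data).
X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])
Z = linkage(X, method="average")

# Use a custom palette for links below the color threshold.
set_link_color_palette(["m", "c"])
try:
    info = dendrogram(Z, no_plot=True, color_threshold=6.0)
finally:
    # Passing None restores the default palette for later dendrograms.
    set_link_color_palette(None)

link_colors = info["color_list"]
print(link_colors)
```

The palette is module-level state, which is why the sketch resets it in a finally block.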

  • MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
  • Mathematica is a registered trademark of The Wolfram Research, Inc.