SciPy

Hierarchical clustering (scipy.cluster.hierarchy)

These functions cut hierarchical clusterings into flat clusterings or find the roots of the forest formed by a cut by providing the flat cluster ids of each observation.

fcluster(Z, t[, criterion, depth, R, monocrit]) Forms flat clusters from the hierarchical clustering defined by the linkage matrix Z.
fclusterdata(X, t[, criterion, metric, ...]) Cluster observation data using a given metric.
leaders(Z, T) Returns the root nodes in a hierarchical clustering.

These are routines for agglomerative clustering.

linkage(y[, method, metric]) Performs hierarchical/agglomerative clustering on the condensed distance matrix y.
single(y) Performs single/min/nearest linkage on the condensed distance matrix y
complete(y) Performs complete/max/farthest point linkage on a condensed distance matrix
average(y) Performs average/UPGMA linkage on a condensed distance matrix
weighted(y) Performs weighted/WPGMA linkage on the condensed distance matrix.
centroid(y) Performs centroid/UPGMC linkage.
median(y) Performs median/WPGMC linkage.
ward(y) Performs Ward’s linkage on a condensed or redundant distance matrix.

These routines compute statistics on hierarchies.

cophenet(Z[, Y]) Calculates the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.
from_mlab_linkage(Z) Converts a linkage matrix generated by MATLAB(TM) to a new linkage matrix compatible with this module.
inconsistent(Z[, d]) Calculates inconsistency statistics on a linkage.
maxinconsts(Z, R) Returns the maximum inconsistency coefficient for each non-singleton cluster and its descendents.
maxdists(Z) Returns the maximum distance between any non-singleton cluster.
maxRstat(Z, R, i) Returns the maximum statistic for each non-singleton cluster and its descendents.
to_mlab_linkage(Z) Converts a linkage matrix to a MATLAB(TM) compatible one.

Routines for visualizing flat clusters.

dendrogram(Z[, p, truncate_mode, ...]) Plots the hierarchical clustering as a dendrogram.

These are data structures and routines for representing hierarchies as tree objects.

ClusterNode(id[, left, right, dist, count]) A tree node class for representing a cluster.
leaves_list(Z) Returns a list of leaf node ids
to_tree(Z[, rd]) Converts a hierarchical clustering encoded in the matrix Z (by linkage) into an easy-to-use tree object.
cut_tree(Z[, n_clusters, height]) Given a linkage matrix Z, return the cut tree.

These are predicates for checking the validity of linkage and inconsistency matrices as well as for checking isomorphism of two flat cluster assignments.

is_valid_im(R[, warning, throw, name]) Returns True if the inconsistency matrix passed is valid.
is_valid_linkage(Z[, warning, throw, name]) Checks the validity of a linkage matrix.
is_isomorphic(T1, T2) Determines if two different cluster assignments are equivalent.
is_monotonic(Z) Returns True if the linkage passed is monotonic.
correspond(Z, Y) Checks for correspondence between linkage and condensed distance matrices
num_obs_linkage(Z) Returns the number of original observations of the linkage matrix passed.

Utility routines for plotting:

set_link_color_palette(palette) Set list of matplotlib color codes for use by dendrogram.

References

[R1]“Statistics toolbox.” API Reference Documentation. The MathWorks. http://www.mathworks.com/access/helpdesk/help/toolbox/stats/. Accessed October 1, 2007.
[R2]“Hierarchical clustering.” API Reference Documentation. The Wolfram Research, Inc. http://reference.wolfram.com/mathematica/HierarchicalClustering/tutorial/ HierarchicalClustering.html. Accessed October 1, 2007.
[R3]Gower, JC and Ross, GJS. “Minimum Spanning Trees and Single Linkage Cluster Analysis.” Applied Statistics. 18(1): pp. 54–64. 1969.
[R4]Ward Jr, JH. “Hierarchical grouping to optimize an objective function.” Journal of the American Statistical Association. 58(301): pp. 236–44. 1963.
[R5]Johnson, SC. “Hierarchical clustering schemes.” Psychometrika. 32(2): pp. 241–54. 1966.
[R6]Sneath, PH and Sokal, RR. “Numerical taxonomy.” Nature. 193: pp. 855–60. 1962.
[R7]Batagelj, V. “Comparing resemblance measures.” Journal of Classification. 12: pp. 73–90. 1995.
[R8]Sokal, RR and Michener, CD. “A statistical method for evaluating systematic relationships.” Scientific Bulletins. 38(22): pp. 1409–38. 1958.
[R9]Edelbrock, C. “Mixture model tests of hierarchical clustering algorithms: the problem of classifying everybody.” Multivariate Behavioral Research. 14: pp. 367–84. 1979.
[10]Jain, A., and Dubes, R., “Algorithms for Clustering Data.” Prentice-Hall. Englewood Cliffs, NJ. 1988.
[11]Fisher, RA “The use of multiple measurements in taxonomic problems.” Annals of Eugenics, 7(2): 179-188. 1936
  • MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
  • Mathematica is a registered trademark of The Wolfram Research, Inc.