Hierarchical clustering (scipy.cluster.hierarchy)

These functions cut a hierarchical clustering into flat clusters, or, given the flat cluster id of each observation, find the roots of the forest formed by a cut.

fcluster(Z, t[, criterion, depth, R, monocrit])

Form flat clusters from the hierarchical clustering defined by the given linkage matrix.

fclusterdata(X, t[, criterion, metric, ...])

Cluster observation data using a given metric.

leaders(Z, T)

Return the root nodes in a hierarchical clustering.
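As a minimal sketch of how these three functions fit together, the example below cuts a small hierarchy into two flat clusters. The toy array `X` and the choice of `criterion="maxclust"` are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, fclusterdata, leaders

# Toy data (assumed for illustration): two well-separated groups in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Build the hierarchy first; Z is an (n - 1) x 4 linkage matrix.
Z = linkage(X, method="single")

# Cut the hierarchy so that at most two flat clusters remain.
T = fcluster(Z, t=2, criterion="maxclust")

# The same idea in one call, starting from the raw observations.
T_direct = fclusterdata(X, t=2, criterion="maxclust", method="single")

# For each flat cluster: the linkage node that is its root (L)
# and the flat cluster id that the root belongs to (M).
L, M = leaders(Z, T)
```

Flat cluster ids start at 1, and the numbering itself carries no meaning beyond grouping the observations.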

These are routines for agglomerative clustering.

linkage(y[, method, metric, optimal_ordering])

Perform hierarchical/agglomerative clustering.


single(y)

Perform single/min/nearest linkage on the condensed distance matrix y.

complete(y)

Perform complete/max/farthest point linkage on a condensed distance matrix.

average(y)

Perform average/UPGMA linkage on a condensed distance matrix.

weighted(y)

Perform weighted/WPGMA linkage on the condensed distance matrix.

centroid(y)

Perform centroid/UPGMC linkage.

median(y)

Perform median/WPGMC linkage.

ward(y)

Perform Ward's linkage on a condensed distance matrix.
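The sketch below shows one way to call these routines on a condensed distance matrix produced by `scipy.spatial.distance.pdist`. The four points on a line are an assumption chosen so that the merge heights are easy to check by hand.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, single, ward

# Four points on a line (assumed toy data) at positions 0, 1, 4 and 9.
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [9.0, 0.0]])

# Condensed distance matrix: n * (n - 1) / 2 = 6 pairwise distances.
y = pdist(X)

# General entry point, with the method chosen by name.
Z_single = linkage(y, method="single")

# Method-specific shortcut; gives the same linkage matrix.
Z_single_short = single(y)

# Ward's method on the same condensed distances.
Z_ward = ward(y)

# Column 2 of a linkage matrix holds the merge heights.
heights = Z_single[:, 2]  # 1.0, 3.0, 5.0 for single linkage on this data
```

Each row of a linkage matrix describes one merge: the two cluster ids joined, the distance between them, and the number of original observations in the new cluster.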

These routines compute statistics on hierarchies.

cophenet(Z[, Y])

Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.


from_mlab_linkage(Z)

Convert a linkage matrix generated by MATLAB(TM) to a new linkage matrix compatible with this module.

inconsistent(Z[, d])

Calculate inconsistency statistics on a linkage matrix.

maxinconsts(Z, R)

Return the maximum inconsistency coefficient for each non-singleton cluster and its children.


maxdists(Z)

Return, for each non-singleton cluster, the maximum linkage distance between any two clusters merged at or below it.

maxRstat(Z, R, i)

Return the maximum statistic for each non-singleton cluster and its children.


to_mlab_linkage(Z)

Convert a linkage matrix to a MATLAB(TM) compatible one.
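A short sketch of the statistics routines, assuming random data and average linkage purely for illustration. Passing column index 3 to `maxRstat` selects the inconsistency coefficient, so its result should match `maxinconsts`.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import (linkage, cophenet, inconsistent,
                                     maxdists, maxinconsts, maxRstat)

# Random observations (assumed for illustration): 10 points in 3-D.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = pdist(X)
Z = linkage(y, method="average")

# Cophenetic correlation coefficient and condensed cophenetic distances.
c, coph_dists = cophenet(Z, y)

# Inconsistency matrix: one row per merge, looking d levels down.
# Columns: mean height, std of heights, link count, inconsistency coefficient.
R = inconsistent(Z, d=2)

# Per-cluster maxima over each non-singleton cluster and its descendants.
MD = maxdists(Z)          # maximum merge distance
MI = maxinconsts(Z, R)    # maximum inconsistency coefficient
MR = maxRstat(Z, R, 3)    # maximum of column 3 of R (same quantity as MI)
```

A cophenetic correlation close to 1 suggests the hierarchy preserves the original pairwise distances well.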

Routines for visualizing flat clusters.

dendrogram(Z[, p, truncate_mode, ...])

Plot the hierarchical clustering as a dendrogram.
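The sketch below computes a dendrogram layout without drawing it, which is handy for inspecting the leaf order. The toy data is an assumption; dropping `no_plot=True` would draw the figure and requires matplotlib.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Four 1-D observations (assumed toy data).
X = np.array([[0.0], [1.0], [4.0], [9.0]])
Z = linkage(X, method="single")

# no_plot=True skips rendering and only returns the layout dictionary.
info = dendrogram(Z, no_plot=True)

leaf_order = info["leaves"]   # observation indices, left to right
leaf_labels = info["ivl"]     # tick labels for those leaves
link_xs = info["icoord"]      # x coordinates of each U-shaped link
link_ys = info["dcoord"]      # y coordinates (merge heights) of each link
```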

These are data structures and routines for representing hierarchies as tree objects.

ClusterNode(id[, left, right, dist, count])

A tree node class for representing a cluster.


leaves_list(Z)

Return a list of leaf node ids.

to_tree(Z[, rd])

Convert a linkage matrix into an easy-to-use tree object.

cut_tree(Z[, n_clusters, height])

Given a linkage matrix Z, return the cut tree.

optimal_leaf_ordering(Z, y[, metric])

Given a linkage matrix Z and distance, reorder the cut tree.
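As a rough illustration of the tree-oriented routines, the sketch below converts a linkage matrix to a `ClusterNode` tree, lists its leaves, cuts it into two clusters, and reorders the leaves. The data and the choice of two clusters are assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import (linkage, to_tree, leaves_list,
                                     cut_tree, optimal_leaf_ordering)

# Four 1-D observations (assumed toy data).
X = np.array([[0.0], [1.0], [4.0], [9.0]])
y = pdist(X)
Z = linkage(y, method="single")

# Root ClusterNode of the hierarchy; its id is 2n - 2 for n observations.
root = to_tree(Z)
root_id = root.get_id()
n_leaves = root.get_count()
leaf_ids = root.pre_order()       # leaf ids from a pre-order traversal

# Leaf ids as an array, computed directly from the linkage matrix.
leaves = leaves_list(Z)

# Flat cluster membership for a two-cluster cut; shape (n, 1).
labels = cut_tree(Z, n_clusters=2)

# Reordered linkage matrix that keeps adjacent leaves close together.
Z_ordered = optimal_leaf_ordering(Z, y)
```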

These are predicates for checking the validity of linkage and inconsistency matrices as well as for checking isomorphism of two flat cluster assignments.

is_valid_im(R[, warning, throw, name])

Return True if the inconsistency matrix passed is valid.

is_valid_linkage(Z[, warning, throw, name])

Check the validity of a linkage matrix.

is_isomorphic(T1, T2)

Determine if two different cluster assignments are equivalent.


is_monotonic(Z)

Return True if the linkage passed is monotonic.

correspond(Z, Y)

Check for correspondence between linkage and condensed distance matrices.


num_obs_linkage(Z)

Return the number of original observations of the linkage matrix passed.
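The predicates can be exercised on a small example as sketched below. The data, the complete-linkage method, and the two label arrays are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import (linkage, inconsistent,
                                     is_valid_linkage, is_valid_im,
                                     is_isomorphic, is_monotonic,
                                     correspond, num_obs_linkage)

# Four points on a line (assumed toy data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [9.0, 0.0]])
y = pdist(X)
Z = linkage(y, method="complete")
R = inconsistent(Z)

ok_linkage = is_valid_linkage(Z)   # structural checks on Z
ok_im = is_valid_im(R)             # structural checks on R
monotonic = is_monotonic(Z)        # merge heights never decrease
matches = correspond(Z, y)         # Z and y describe the same number of observations
n_obs = num_obs_linkage(Z)         # number of original observations

# Two labelings that group observations identically, with ids swapped.
T1 = np.array([1, 1, 2, 2])
T2 = np.array([2, 2, 1, 1])
same_grouping = is_isomorphic(T1, T2)
```

Passing `throw=True` to `is_valid_linkage` or `is_valid_im` raises an exception instead of returning False when a check fails.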

Utility routines for plotting:


set_link_color_palette(palette)

Set list of matplotlib color codes for use by dendrogram.

Utility classes:


DisjointSet([elements])

Disjoint set data structure for incremental connectivity queries.
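A minimal sketch of incremental connectivity queries with `DisjointSet`, assuming four string elements chosen for illustration.

```python
from scipy.cluster.hierarchy import DisjointSet

# Start with four singleton subsets (element names are assumed).
ds = DisjointSet(["a", "b", "c", "d"])

merged_ab = ds.merge("a", "b")      # True: two subsets were joined
merged_cd = ds.merge("c", "d")      # True
merged_again = ds.merge("a", "b")   # False: already in the same subset

ab_connected = ds.connected("a", "b")
ac_connected = ds.connected("a", "c")

n_subsets = ds.n_subsets            # number of disjoint subsets remaining
groups = ds.subsets()               # list of sets, one per subset
```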