SciPy

Hierarchical clustering (scipy.cluster.hierarchy)

These functions cut a hierarchical clustering into flat clusters, returning the flat cluster id of each observation, or find the roots of the forest formed by such a cut.

fcluster(Z, t[, criterion, depth, R, monocrit])

Form flat clusters from the hierarchical clustering defined by the given linkage matrix.

fclusterdata(X, t[, criterion, metric, …])

Cluster observation data using a given metric.

leaders(Z, T)

Return the root nodes in a hierarchical clustering.
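As a minimal sketch of how these three functions fit together, the example below cuts a small hierarchy into two flat clusters and then looks up the tree node that roots each one. The toy data, the choice of single linkage, and the variable names are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, fclusterdata, leaders, linkage

# Toy data (assumed for illustration): two well-separated groups in 2-D.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Build the hierarchy; Z is an (n - 1) x 4 linkage matrix.
Z = linkage(X, method="single")

# Cut the hierarchy so that no more than two flat clusters are formed.
T = fcluster(Z, t=2, criterion="maxclust")

# fclusterdata goes from raw observations to flat cluster ids in one call.
T_direct = fclusterdata(X, t=2, criterion="maxclust", method="single")

# For each flat cluster, leaders returns the id of the tree node that
# roots it (L) and the matching flat cluster id (M).
L, M = leaders(Z, T)
print(T, L, M)
```

Node ids below n (here 6) refer to original observations, and ids of n or more refer to merged clusters, so both leaders in this sketch are merged nodes.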

These are routines for agglomerative clustering.

linkage(y[, method, metric, optimal_ordering])

Perform hierarchical/agglomerative clustering.

single(y)

Perform single/min/nearest linkage on the condensed distance matrix y.

complete(y)

Perform complete/max/farthest point linkage on a condensed distance matrix.

average(y)

Perform average/UPGMA linkage on a condensed distance matrix.

weighted(y)

Perform weighted/WPGMA linkage on the condensed distance matrix.

centroid(y)

Perform centroid/UPGMC linkage.

median(y)

Perform median/WPGMC linkage.

ward(y)

Perform Ward’s linkage on a condensed distance matrix.
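The per-method functions are shortcuts for linkage with the corresponding method argument. The sketch below, which uses randomly generated data as an assumption, builds a condensed distance matrix with scipy.spatial.distance.pdist and checks that single(y) and linkage(y, method="single") agree.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, single, ward
from scipy.spatial.distance import pdist

# Random observations, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))

# Condensed distance matrix: one entry per pair, n * (n - 1) / 2 = 28 values.
y = pdist(X)

# The shortcut and the general entry point should give the same linkage.
Z_single = single(y)
Z_linkage = linkage(y, method="single")
same = np.allclose(Z_single, Z_linkage)

# Each row of a linkage matrix is [left id, right id, distance, size];
# the last row describes the merge that contains every observation.
Z_ward = ward(y)
print(same, Z_ward[-1])
```

The centroid, median, and Ward methods are only well defined for Euclidean distances, so the condensed matrix passed to them should come from a Euclidean metric.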

These routines compute statistics on hierarchies.

cophenet(Z[, Y])

Calculate the cophenetic distance between each pair of observations in the hierarchical clustering defined by the linkage Z.

from_mlab_linkage(Z)

Convert a linkage matrix generated by MATLAB(TM) to a new linkage matrix compatible with this module.

inconsistent(Z[, d])

Calculate inconsistency statistics on a linkage matrix.

maxinconsts(Z, R)

Return the maximum inconsistency coefficient for each non-singleton cluster and its children.

maxdists(Z)

Return, for each non-singleton cluster, the maximum merge distance among that cluster and its descendants.

maxRstat(Z, R, i)

Return the maximum statistic for each non-singleton cluster and its children.

to_mlab_linkage(Z)

Convert a linkage matrix to a MATLAB(TM) compatible one.
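The following sketch runs the statistics routines on one linkage matrix to show the shapes they return. The data and the choice of average linkage are assumptions; the column index 3 passed to maxRstat selects the inconsistency coefficient column of R.

```python
import numpy as np
from scipy.cluster.hierarchy import (cophenet, from_mlab_linkage, inconsistent,
                                     linkage, maxdists, maxinconsts, maxRstat,
                                     to_mlab_linkage)
from scipy.spatial.distance import pdist

# Random observations, assumed purely for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
y = pdist(X)
Z = linkage(y, method="average")

# With Y given, cophenet returns the cophenetic correlation coefficient
# and the cophenetic distances in condensed form.
c, coph_dists = cophenet(Z, y)

# R has one row per merge: mean, standard deviation, link count,
# and inconsistency coefficient.
R = inconsistent(Z, d=2)

MD = maxdists(Z)         # max merge distance at or below each merge
MI = maxinconsts(Z, R)   # max inconsistency coefficient at or below each merge
MR = maxRstat(Z, R, 3)   # same statistic, taken from column 3 of R

# Round trip through the MATLAB-style layout (3 columns, 1-based ids).
Z_mlab = to_mlab_linkage(Z)
Z_back = from_mlab_linkage(Z_mlab)
print(c, MD[-1], np.allclose(Z_back, Z))
```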

Routines for visualizing hierarchical clusterings.

dendrogram(Z[, p, truncate_mode, …])

Plot the hierarchical clustering as a dendrogram.
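A short sketch of dendrogram follows. It passes no_plot=True so that only the layout data is computed and no figure is drawn; the data and the complete-linkage choice are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Random observations, assumed purely for illustration.
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 2))
Z = linkage(X, method="complete")

# no_plot=True skips drawing and returns the computed layout as a dict.
info = dendrogram(Z, no_plot=True)

# 'leaves' lists observation ids in left-to-right plotting order;
# 'icoord' and 'dcoord' hold the x and y coordinates of each link.
print(info["leaves"])
print(len(info["icoord"]), len(info["dcoord"]))
```

Dropping no_plot=True draws the dendrogram on the current matplotlib axes, which requires matplotlib to be installed.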

These are data structures and routines for representing hierarchies as tree objects.

ClusterNode(id[, left, right, dist, count])

A tree node class for representing a cluster.

leaves_list(Z)

Return a list of leaf node ids.

to_tree(Z[, rd])

Convert a linkage matrix into an easy-to-use tree object.

cut_tree(Z[, n_clusters, height])

Given a linkage matrix Z, return the cut tree.

optimal_leaf_ordering(Z, y[, metric])

Given a linkage matrix Z and distances y, reorder Z so that the distance between successive leaves is minimal.
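The tree routines can be sketched on a single linkage matrix as below. The data, the average-linkage choice, and the request for three clusters are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import (cut_tree, leaves_list, linkage,
                                     optimal_leaf_ordering, to_tree)
from scipy.spatial.distance import pdist

# Random observations, assumed purely for illustration.
rng = np.random.default_rng(3)
X = rng.normal(size=(7, 2))
y = pdist(X)
Z = linkage(y, method="average")

# to_tree returns the root ClusterNode; children are reachable from it.
root = to_tree(Z)
n_leaves = root.get_count()    # number of observations under the root
leaf_ids = root.pre_order()    # ids of the leaves under the root

# leaves_list reads the leaf order directly from the linkage matrix.
order = leaves_list(Z)

# cut_tree returns one column of cluster labels per requested cut.
labels = cut_tree(Z, n_clusters=3)

# optimal_leaf_ordering swaps children within merges; heights are unchanged.
Z_opt = optimal_leaf_ordering(Z, y)
print(n_leaves, order, labels.ravel(), leaves_list(Z_opt))
```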

These are predicates for checking the validity of linkage and inconsistency matrices as well as for checking isomorphism of two flat cluster assignments.

is_valid_im(R[, warning, throw, name])

Return True if the inconsistency matrix passed is valid.

is_valid_linkage(Z[, warning, throw, name])

Check the validity of a linkage matrix.

is_isomorphic(T1, T2)

Determine if two different cluster assignments are equivalent.

is_monotonic(Z)

Return True if the linkage passed is monotonic.

correspond(Z, Y)

Check for correspondence between linkage and condensed distance matrices.

num_obs_linkage(Z)

Return the number of original observations of the linkage matrix passed.
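As a rough illustration of the predicates, the sketch below validates a linkage matrix and its inconsistency matrix, then checks that relabeling a flat clustering leaves it isomorphic to the original. The data and the relabeling scheme are assumptions chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import (correspond, fcluster, inconsistent,
                                     is_isomorphic, is_monotonic, is_valid_im,
                                     is_valid_linkage, linkage, num_obs_linkage)
from scipy.spatial.distance import pdist

# Random observations, assumed purely for illustration.
rng = np.random.default_rng(4)
X = rng.normal(size=(9, 2))
y = pdist(X)
Z = linkage(y, method="complete")
R = inconsistent(Z)

valid_Z = is_valid_linkage(Z)   # structural checks on the linkage matrix
valid_R = is_valid_im(R)        # structural checks on the inconsistency matrix
monotonic = is_monotonic(Z)     # merge distances never decrease
matches = correspond(Z, y)      # Z and y describe the same number of observations
n_obs = num_obs_linkage(Z)

# Reverse the label numbering; the grouping itself is unchanged.
T1 = fcluster(Z, t=3, criterion="maxclust")
T2 = T1.max() + 1 - T1
equivalent = is_isomorphic(T1, T2)
print(valid_Z, valid_R, monotonic, matches, n_obs, equivalent)
```

Passing throw=True to is_valid_linkage or is_valid_im raises an exception describing the problem instead of returning False.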

Utility routines for plotting:

set_link_color_palette(palette)

Set list of matplotlib color codes for use by dendrogram.
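A small sketch of changing the palette follows. It assumes an arbitrary three-color palette and uses no_plot=True so the effect can be inspected through the returned color_list without drawing; the palette is reset to its default afterwards by passing None.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, set_link_color_palette

# Random observations, assumed purely for illustration.
rng = np.random.default_rng(5)
X = rng.normal(size=(8, 2))
Z = linkage(X, method="average")

# The palette is module-level state, so restore the default when done.
palette = ["c", "m", "y"]
set_link_color_palette(palette)
try:
    # Links above the color threshold use above_threshold_color instead.
    info = dendrogram(Z, no_plot=True, above_threshold_color="k")
finally:
    set_link_color_palette(None)

used = set(info["color_list"])
print(used)
```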

Utility classes:

DisjointSet([elements])

Disjoint set data structure for incremental connectivity queries.
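A minimal sketch of DisjointSet in use, with string elements chosen arbitrarily for illustration:

```python
from scipy.cluster.hierarchy import DisjointSet

# Start with four elements, each in its own subset.
ds = DisjointSet(["a", "b", "c", "d"])

# merge returns True when two distinct subsets were joined.
first = ds.merge("a", "b")
ds.merge("c", "d")
repeat = ds.merge("a", "b")   # already connected, so nothing changes

ab = ds.connected("a", "b")
ac = ds.connected("a", "c")
count = ds.n_subsets
members = ds.subset("a")
print(first, repeat, ab, ac, count, members)
```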
