These functions cut hierarchical clusterings into flat clusterings
or find the roots of the forest formed by a cut by providing the flat
cluster ids of each observation.
fcluster(Z, t[, criterion, depth, R, monocrit]) |
Forms flat clusters from the hierarchical clustering defined by the linkage matrix Z. |
fclusterdata(X, t[, criterion, metric, ...]) |
Cluster observation data using a given metric. |
leaders(Z, T) |
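(L, M) = leaders(Z, T): Returns the root nodes in a hierarchical clustering corresponding to a cut defined by a flat cluster assignment vector T. |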
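As a rough illustration of how these three routines fit together, the sketch below builds a linkage from a small made-up data set, cuts it into two flat clusters, and then looks up the root node of each flat cluster. The sample points, the choice of single linkage, and the `maxclust` criterion are assumptions made for the example, not recommendations.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, leaders

# Illustrative data: two well-separated groups of three 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

y = pdist(X)                      # condensed distance matrix
Z = linkage(y, method="single")   # hierarchical clustering

# Cut the hierarchy so that at most two flat clusters remain.
T = fcluster(Z, t=2, criterion="maxclust")

# L holds the linkage node id at the root of each flat cluster,
# M holds the matching flat cluster id.
L, M = leaders(Z, T)

print("flat cluster ids:", T)
print("leader nodes:", L, "for clusters:", M)
```

Node ids below the number of observations refer to single observations; larger ids refer to clusters formed by a row of Z.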
These are routines for agglomerative clustering.
linkage(y[, method, metric]) |
Performs hierarchical/agglomerative clustering on the condensed distance matrix y. |
single(y) |
Performs single/min/nearest linkage on the condensed distance matrix y |
complete(y) |
Performs complete/max/farthest point linkage on a condensed distance matrix |
average(y) |
Performs average/UPGMA linkage on a condensed distance matrix |
weighted(y) |
Performs weighted/WPGMA linkage on the condensed distance matrix |
centroid(y) |
Performs centroid/UPGMC linkage. See linkage for more information on the return structure and algorithm. |
median(y) |
Performs median/WPGMC linkage. See linkage for more information on the return structure and algorithm. |
ward(y) |
Performs Ward’s linkage on a condensed or redundant distance matrix. |
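The named routines above behave like shorthand for `linkage` with the corresponding `method` argument. The sketch below checks that on a small made-up data set; the points are an assumption for the example only.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, single, complete, average

# Illustrative data: five 2-D observations.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [9.0, 6.0]])
y = pdist(X)  # condensed distance matrix, length n*(n-1)/2

Z_single = linkage(y, method="single")
Z_complete = linkage(y, method="complete")
Z_average = linkage(y, method="average")

# Each linkage matrix has n-1 rows and 4 columns:
# the two merged cluster ids, the merge distance, and the new cluster size.
print(Z_single.shape)

# The convenience functions give the same result as linkage(y, method=...).
same_single = np.allclose(single(y), Z_single)
same_complete = np.allclose(complete(y), Z_complete)
same_average = np.allclose(average(y), Z_average)
print(same_single, same_complete, same_average)
```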
These routines compute statistics on hierarchies.
cophenet(Z[, Y]) |
Calculates the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z. |
from_mlab_linkage(Z) |
Converts a linkage matrix generated by MATLAB(TM) to a new linkage matrix compatible with this module. |
inconsistent(Z[, d]) |
Calculates inconsistency statistics on a linkage. |
maxinconsts(Z, R) |
Returns the maximum inconsistency coefficient for each non-singleton cluster and its descendants. |
maxdists(Z) |
Returns the maximum distance between any non-singleton cluster. |
maxRstat(Z, R, i) |
Returns the maximum statistic for each non-singleton cluster and its descendants. |
to_mlab_linkage(Z) |
Converts a linkage matrix Z generated by the linkage function of this module to a MATLAB(TM) compatible one. |
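A short sketch of a few of the statistics routines follows. It assumes average linkage on a small made-up data set and an inconsistency depth of `d=2`; both are choices made for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet, inconsistent, maxdists

# Illustrative data: two well-separated groups of three 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
n = X.shape[0]

y = pdist(X)
Z = linkage(y, method="average")

# With the original distances supplied, cophenet returns the cophenetic
# correlation coefficient and the condensed cophenetic distance matrix.
c, coph_dists = cophenet(Z, y)

# One row of inconsistency statistics per non-singleton cluster:
# mean, standard deviation, link count, inconsistency coefficient.
R = inconsistent(Z, d=2)

# Maximum merge distance within each non-singleton cluster and its descendants.
MD = maxdists(Z)

print("cophenetic correlation:", c)
print("inconsistency matrix shape:", R.shape)
print("max distances:", MD)
```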
Routines for visualizing hierarchical clusterings.
dendrogram(Z[, p, truncate_mode, ...]) |
Plots the hierarchical clustering as a dendrogram. |
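The sketch below computes a dendrogram layout without drawing it, by passing `no_plot=True`, so that the returned dictionary can be inspected. The data and the choice of average linkage are assumptions for the example; dropping `no_plot=True` would draw the plot with matplotlib instead.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# Illustrative data: two well-separated groups of three 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
n = X.shape[0]

Z = linkage(pdist(X), method="average")

# no_plot=True returns the layout data only and skips rendering.
info = dendrogram(Z, no_plot=True)

print("leaf order:", info["leaves"])   # observation indices, left to right
print("leaf labels:", info["ivl"])     # labels shown under each leaf
print("number of links:", len(info["icoord"]))
```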
These are data structures and routines for representing hierarchies as
tree objects.
ClusterNode(id[, left, right, dist, count]) |
A tree node class for representing a cluster. |
leaves_list(Z) |
Returns a list of leaf node ids (corresponding to observation vector index) as they appear in the tree from left to right. |
to_tree(Z[, rd]) |
Converts a hierarchical clustering encoded in the matrix Z (by linkage) to an easy-to-use tree object. |
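As a minimal sketch of the tree interface, the example below converts a linkage matrix to `ClusterNode` objects and walks the result. The data set and the use of `rd=True` to also obtain the full node list are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, to_tree, leaves_list

# Illustrative data: five 2-D observations.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [9.0, 6.0]])
n = X.shape[0]

Z = linkage(pdist(X), method="complete")

# rd=True returns the root node and a list of all 2n-1 nodes.
root, nodes = to_tree(Z, rd=True)

print("root id:", root.get_id())
print("observations under root:", root.get_count())
print("root is a leaf:", root.is_leaf())

# Leaf ids in left-to-right order, obtained two ways.
order_from_tree = root.pre_order()
order_from_linkage = leaves_list(Z)
print(order_from_tree, order_from_linkage)
```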
These are predicates for checking the validity of linkage and
inconsistency matrices as well as for checking isomorphism of two
flat cluster assignments.
is_valid_im(R[, warning, throw, name]) |
Returns True if the inconsistency matrix passed is valid. |
is_valid_linkage(Z[, warning, throw, name]) |
Checks the validity of a linkage matrix. |
is_isomorphic(T1, T2) |
Determines if two different cluster assignments are equivalent. |
is_monotonic(Z) |
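Returns True if the linkage passed is monotonic. The linkage is monotonic if for every cluster s and t joined, the distance between them is no less than the distance between any previously joined clusters. |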
correspond(Z, Y) |
Checks for correspondence between linkage and condensed distance matrices |
num_obs_linkage(Z) |
Returns the number of original observations of the linkage matrix passed. |
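The predicates can be combined into a quick sanity check on a clustering result, as sketched below. The data, the single-linkage method, and the hand-written label vectors are assumptions for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import (
    linkage, inconsistent, is_valid_linkage, is_valid_im,
    is_monotonic, is_isomorphic, correspond, num_obs_linkage,
)

# Illustrative data: five 2-D observations.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [9.0, 6.0]])
y = pdist(X)
Z = linkage(y, method="single")
R = inconsistent(Z)

valid_Z = is_valid_linkage(Z)   # structural checks on the linkage matrix
valid_R = is_valid_im(R)        # structural checks on the inconsistency matrix
monotonic = is_monotonic(Z)     # merge distances never decrease
n_obs = num_obs_linkage(Z)      # number of original observations
matches = correspond(Z, y)      # Z and y describe the same number of observations

# Two labelings are isomorphic when they differ only by renaming cluster ids.
same = is_isomorphic(np.array([1, 1, 2, 2]), np.array([2, 2, 1, 1]))
different = is_isomorphic(np.array([1, 1, 2]), np.array([1, 2, 2]))

print(valid_Z, valid_R, monotonic, n_obs, matches, same, different)
```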
Utility routines for plotting:
set_link_color_palette(palette) |
Changes the list of matplotlib color codes to use when coloring links with the dendrogram color_threshold feature. |
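A small sketch of how the palette might be used: it sets two colors, computes a dendrogram layout with a `color_threshold` that separates two groups, and inspects the colors assigned to the links. The data, the colors `'m'` and `'c'`, and the threshold of 5.0 are assumptions for the example, and resetting with `None` assumes a SciPy version that supports it.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram, set_link_color_palette

# Illustrative data: two well-separated groups of three 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
Z = linkage(pdist(X), method="single")

# Use magenta and cyan for links below the color threshold.
set_link_color_palette(["m", "c"])

# Links merging at distance >= 5.0 keep the above-threshold color;
# no_plot=True computes the colors without drawing anything.
info = dendrogram(Z, color_threshold=5.0, no_plot=True)
colors = info["color_list"]
print(colors)

# Restore the default palette (assumes None is accepted as a reset value).
set_link_color_palette(None)
```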
References
[Gow69] | Gower, JC and Ross, GJS. “Minimum Spanning Trees and Single Linkage
Cluster Analysis.” Applied Statistics. 18(1): pp. 54–64. 1969. |
[War63] | Ward Jr, JH. “Hierarchical grouping to optimize an objective
function.” Journal of the American Statistical Association. 58(301):
pp. 236–44. 1963. |
[Joh66] | Johnson, SC. “Hierarchical clustering schemes.” Psychometrika.
32(2): pp. 241–54. 1966. |
[Sne62] | Sneath, PH and Sokal, RR. “Numerical taxonomy.” Nature. 193: pp.
855–60. 1962. |
[Bat95] | Batagelj, V. “Comparing resemblance measures.” Journal of
Classification. 12: pp. 73–90. 1995. |
[Sok58] | Sokal, RR and Michener, CD. “A statistical method for evaluating
systematic relationships.” Scientific Bulletins. 38(22):
pp. 1409–38. 1958. |
[Ede79] | Edelbrock, C. “Mixture model tests of hierarchical clustering
algorithms: the problem of classifying everybody.” Multivariate
Behavioral Research. 14: pp. 367–84. 1979. |
[Jai88] | Jain, A and Dubes, R. “Algorithms for Clustering Data.”
Prentice-Hall. Englewood Cliffs, NJ. 1988. |
[Fis36] | Fisher, RA. “The use of multiple measurements in taxonomic
problems.” Annals of Eugenics. 7(2): pp. 179–88. 1936. |
- MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
- Mathematica is a registered trademark of Wolfram Research, Inc.