This is documentation for an old release of SciPy (version 0.10.0).
These functions cut hierarchical clusterings into flat clusterings
or find the roots of the forest formed by a cut by providing the flat
cluster ids of each observation.
fcluster(Z, t[, criterion, depth, R, monocrit]) |
Forms flat clusters from the hierarchical clustering defined by |
fclusterdata(X, t[, criterion, metric, ...]) |
Cluster observation data using a given metric. |
leaders(Z, T) |
(L, M) = leaders(Z, T): returns the root nodes of the forest formed by the cut that the flat cluster assignment T defines. |
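As a rough illustration of how these three routines fit together, the sketch below builds a small hierarchy and cuts it into flat clusters. The toy data, the variable names, and the choice of the "maxclust" criterion are assumptions made for this example only; keyword names follow recent SciPy releases and may differ slightly in older ones.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, fclusterdata, leaders, linkage

# Toy data, invented for illustration: two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.0, 1.1], [1.3, 0.0],
              [10.0, 10.0], [10.0, 11.2], [11.5, 10.0]])

# Build the hierarchy (single linkage, Euclidean distance between observations).
Z = linkage(X, method="single", metric="euclidean")

# Cut the hierarchy so that no more than two flat clusters are formed.
T = fcluster(Z, t=2, criterion="maxclust")

# fclusterdata goes from raw observations to flat cluster ids in one call.
T_direct = fclusterdata(X, t=2, criterion="maxclust", metric="euclidean")

# L[i] is the id of the tree node that roots the flat cluster with id M[i].
L, M = leaders(Z, T)

print("flat cluster ids:", T)
print("leader nodes:", L, "for flat clusters:", M)
```

Flat cluster ids are arbitrary labels, so two assignments should be compared by the partition they induce rather than by the label values themselves.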
These are routines for agglomerative clustering.
linkage(y[, method, metric]) |
Performs hierarchical/agglomerative clustering on the condensed distance matrix y. |
single(y) |
Performs single/min/nearest linkage on the condensed distance matrix y. |
complete(y) |
Performs complete/max/farthest point linkage on the condensed distance matrix y. |
average(y) |
Performs average/UPGMA linkage on the condensed distance matrix |
weighted(y) |
Performs weighted/WPGMA linkage on the condensed distance matrix |
centroid(y) |
Performs centroid/UPGMC linkage. See linkage for more |
median(y) |
Performs median/WPGMC linkage. See linkage for more |
ward(y) |
Performs Ward’s linkage on a condensed or redundant distance |
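The following sketch shows one way to call the agglomerative routines on a condensed distance matrix. It assumes the matrix comes from `scipy.spatial.distance.pdist`; the toy observations and variable names are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import average, complete, linkage, single, ward
from scipy.spatial.distance import pdist

# Toy data, invented for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.1], [1.3, 0.0],
              [10.0, 10.0], [10.0, 11.2], [11.5, 10.0]])

# Condensed distance matrix: the n*(n-1)/2 pairwise Euclidean distances.
y = pdist(X, metric="euclidean")

# The convenience functions are shorthand for linkage(y, method=...).
Z_single = single(y)
Z_complete = complete(y)
Z_average = average(y)
Z_ward = ward(y)

# Each of the n-1 rows of a linkage matrix records one merge:
# [id of cluster a, id of cluster b, distance between them, size of the new cluster]
print(Z_single)

# Same result through the general entry point.
Z_general = linkage(y, method="single")
```

Calling `linkage` directly is handy when the method name is chosen at run time; the named wrappers read more clearly when the method is fixed.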
These routines compute statistics on hierarchies.
cophenet(Z[, Y]) |
Calculates the cophenetic distances between each observation in |
from_mlab_linkage(Z) |
Converts a linkage matrix generated by MATLAB(TM) to a new |
inconsistent(Z[, d]) |
Calculates inconsistency statistics on a linkage. |
maxinconsts(Z, R) |
Returns the maximum inconsistency coefficient for each non-singleton cluster and its descendants. |
maxdists(Z) |
Returns the maximum distance between any two clusters at or below each non-singleton cluster. |
maxRstat(Z, R, i) |
Returns the maximum statistic for each non-singleton cluster and its descendants. |
to_mlab_linkage(Z) |
Converts a linkage matrix Z generated by the linkage function |
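A minimal sketch of the statistics routines, assuming average linkage on made-up data. The column index 3 passed to `maxRstat` selects the inconsistency coefficient column of R, which is why it is expected to agree with `maxinconsts` here.

```python
import numpy as np
from scipy.cluster.hierarchy import (cophenet, inconsistent, linkage,
                                     maxdists, maxinconsts, maxRstat)
from scipy.spatial.distance import pdist

# Toy data, invented for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.1], [1.3, 0.0],
              [10.0, 10.0], [10.0, 11.2], [11.5, 10.0]])
y = pdist(X)
Z = linkage(y, method="average")

# With the original distances supplied, cophenet returns the cophenetic
# correlation coefficient and the condensed cophenetic distance matrix.
c, coph_dists = cophenet(Z, y)

# One row per merge: mean, standard deviation, link count,
# and inconsistency coefficient, computed over links up to depth d.
R = inconsistent(Z, d=2)

MD = maxdists(Z)        # maximum merge distance at or below each node
MI = maxinconsts(Z, R)  # maximum inconsistency coefficient at or below each node
MR = maxRstat(Z, R, 3)  # maximum of column 3 of R at or below each node

print("cophenetic correlation:", c)
```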
Routines for visualizing flat clusters.
dendrogram(Z[, p, truncate_mode, ...]) |
Plots the hierarchical clustering as a dendrogram. |
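The sketch below computes a dendrogram layout without drawing it, which avoids needing matplotlib. The data and the `p=2` truncation are assumptions chosen for illustration; drop `no_plot=True` to render the figure when matplotlib is available.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Toy data, invented for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.1], [1.3, 0.0],
              [10.0, 10.0], [10.0, 11.2], [11.5, 10.0]])
Z = linkage(X, method="single")

# no_plot=True returns the layout as a dictionary instead of plotting.
d = dendrogram(Z, no_plot=True)
print("leaf order, left to right:", d["leaves"])
print("leaf labels:", d["ivl"])

# Show only the last p merged clusters; earlier merges are collapsed into leaves.
d_truncated = dendrogram(Z, p=2, truncate_mode="lastp", no_plot=True)
print("truncated leaf labels:", d_truncated["ivl"])
```

The `icoord` and `dcoord` entries of the returned dictionary hold the x and y coordinates of each U-shaped link, so the layout can be drawn with any plotting tool.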
These are data structures and routines for representing hierarchies as
tree objects.
ClusterNode(id[, left, right, dist, count]) |
A tree node class for representing a cluster. |
leaves_list(Z) |
Returns a list of leaf node ids (corresponding to observation vector index) as they appear in the tree from left to right. |
to_tree(Z[, rd]) |
Converts a hierarchical clustering encoded in the matrix Z (by |
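As an illustrative sketch, a linkage matrix can be turned into a tree of ClusterNode objects and walked as follows. The data and variable names are invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage, to_tree

# Toy data, invented for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.1], [1.3, 0.0],
              [10.0, 10.0], [10.0, 11.2], [11.5, 10.0]])
Z = linkage(X, method="single")

# With rd=True, to_tree also returns a list of all 2n-1 nodes indexed by node id.
root, nodes = to_tree(Z, rd=True)

print("root id:", root.get_id())
print("observations under root:", root.get_count())
print("merge distance at root:", root.dist)

# Walk one level down the tree.
left, right = root.get_left(), root.get_right()
print("left child leaf ids:", left.pre_order())
print("right child leaf ids:", right.pre_order())

# Leaf ids in left-to-right order, taken directly from the linkage matrix.
order = leaves_list(Z)
```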
These are predicates for checking the validity of linkage and
inconsistency matrices as well as for checking isomorphism of two
flat cluster assignments.
is_valid_im(R[, warning, throw, name]) |
Returns True if the inconsistency matrix passed is valid. |
is_valid_linkage(Z[, warning, throw, name]) |
Checks the validity of a linkage matrix. |
is_isomorphic(T1, T2) |
Determines if two different cluster assignments T1 and |
is_monotonic(Z) |
Returns True if the linkage passed is monotonic. The linkage |
correspond(Z, Y) |
Checks if a linkage matrix Z and condensed distance matrix |
num_obs_linkage(Z) |
Returns the number of original observations of the linkage matrix passed. |
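The predicates can be exercised on a small example like the one below. The data is invented, and the relabeled assignment `T_swapped` is a hypothetical input constructed only to show that `is_isomorphic` ignores the label values.

```python
import numpy as np
from scipy.cluster.hierarchy import (correspond, fcluster, inconsistent,
                                     is_isomorphic, is_monotonic, is_valid_im,
                                     is_valid_linkage, linkage, num_obs_linkage)
from scipy.spatial.distance import pdist

# Toy data, invented for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.1], [1.3, 0.0],
              [10.0, 10.0], [10.0, 11.2], [11.5, 10.0]])
y = pdist(X)
Z = linkage(y, method="average")
R = inconsistent(Z)

valid_Z = is_valid_linkage(Z)   # structural checks on the linkage matrix
valid_R = is_valid_im(R)        # structural checks on the inconsistency matrix
monotonic = is_monotonic(Z)     # merge distances never decrease
n_obs = num_obs_linkage(Z)      # number of original observations
matches = correspond(Z, y)      # Z and y describe the same number of observations

# Two assignments that differ only in how the clusters are labeled.
T = fcluster(Z, t=2, criterion="maxclust")
T_swapped = np.where(T == 1, 2, 1)
same_partition = is_isomorphic(T, T_swapped)

print(valid_Z, valid_R, monotonic, n_obs, matches, same_partition)
```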
Utility routines for plotting:
set_link_color_palette(palette) |
Changes the list of matplotlib color codes to use when coloring links with the dendrogram color_threshold feature. |
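A short sketch of changing the link colors used below the color threshold. The palette, the threshold value of 5.0, and the data are assumptions for illustration. Passing None to restore the default palette is accepted by recent SciPy releases; it may not be supported in older ones such as 0.10.0.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, set_link_color_palette

# Toy data, invented for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.1], [1.3, 0.0],
              [10.0, 10.0], [10.0, 11.2], [11.5, 10.0]])
Z = linkage(X, method="single")

# Use cyan and magenta for links that fall below the color threshold.
set_link_color_palette(["c", "m"])

# no_plot=True computes the colors without needing matplotlib.
d = dendrogram(Z, color_threshold=5.0, no_plot=True)
print("link colors:", d["color_list"])

# Assumption: recent SciPy releases accept None to restore the default palette.
set_link_color_palette(None)
```

The palette is module-level state, so resetting it after use keeps later dendrograms unaffected.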
- MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
- Mathematica is a registered trademark of Wolfram Research, Inc.
Copyright Notice
Copyright (C) Damian Eads, 2007-2008. New BSD License.