Distance computations (scipy.spatial.distance)

Function Reference

Distance matrix computation from a collection of raw observation vectors stored in a rectangular array.

Function      Description
pdist         pairwise distances between observation vectors.
cdist         distances between two collections of observation vectors.
squareform    converts a square distance matrix to a condensed one and vice versa.
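
As a rough illustration of how these three functions relate, the sketch below builds a condensed distance vector with pdist, expands it with squareform, and compares two separate collections with cdist. The arrays X and XB are made-up sample data, not part of the reference.

```python
import numpy as np
from scipy.spatial.distance import pdist, cdist, squareform

# Four observations in two dimensions (illustrative data only).
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0],
              [0.0, 1.0]])

# Condensed form: one entry per unordered pair, n*(n-1)/2 entries in total.
y = pdist(X, metric="euclidean")

# Redundant (square) form: a symmetric n-by-n matrix with a zero diagonal.
D = squareform(y)

# squareform also converts a square matrix back to the condensed form.
y_again = squareform(D)

# Distances between every row of X and every row of a second collection XB.
XB = np.array([[0.0, 0.0],
               [3.0, 0.0]])
C = cdist(X, XB, metric="cityblock")

print(y.shape, D.shape, C.shape)
```

Here y holds 6 values for the 4 observations, D is 4-by-4, and C has one row per observation in X and one column per observation in XB.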

Predicates for checking the validity of distance matrices, both condensed and redundant. This module also contains functions for computing the number of observations in a distance matrix.

Function      Description
is_valid_dm   checks for a valid distance matrix.
is_valid_y    checks for a valid condensed distance matrix.
num_obs_dm    number of observations in a distance matrix.
num_obs_y     number of observations in a condensed distance matrix.
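
A minimal sketch of the predicates and counting functions, assuming a small made-up data set X. The asymmetric matrix named bad is constructed only to show a failing check.

```python
import numpy as np
from scipy.spatial.distance import (
    pdist, squareform, is_valid_dm, is_valid_y, num_obs_dm, num_obs_y,
)

# Three observations in two dimensions (illustrative data only).
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])

y = pdist(X)       # condensed distance matrix
D = squareform(y)  # redundant (square) distance matrix

ok_y = is_valid_y(y)
ok_dm = is_valid_dm(D)

# Break the symmetry of a copy; the validity check should now fail.
bad = D.copy()
bad[0, 1] = 99.0
ok_bad = is_valid_dm(bad)

# Both forms describe the same number of observations.
n_from_dm = num_obs_dm(D)
n_from_y = num_obs_y(y)

print(ok_y, ok_dm, ok_bad, n_from_dm, n_from_y)
```

By default the predicates return False for an invalid input; passing throw=True makes them raise an exception that describes the problem instead.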

Distance functions between two vectors u and v. Calling these functions repeatedly over a large collection of vectors is inefficient; use pdist for that purpose.

Function        Description
braycurtis      the Bray-Curtis distance.
canberra        the Canberra distance.
chebyshev       the Chebyshev distance.
cityblock       the Manhattan distance.
correlation     the Correlation distance.
cosine          the Cosine distance.
dice            the Dice dissimilarity (boolean).
euclidean       the Euclidean distance.
hamming         the Hamming distance (boolean).
jaccard         the Jaccard distance (boolean).
kulsinski       the Kulsinski distance (boolean).
mahalanobis     the Mahalanobis distance.
matching        the matching dissimilarity (boolean).
minkowski       the Minkowski distance.
rogerstanimoto  the Rogers-Tanimoto dissimilarity (boolean).
russellrao      the Russell-Rao dissimilarity (boolean).
seuclidean      the standardized Euclidean distance.
sokalmichener   the Sokal-Michener dissimilarity (boolean).
sokalsneath     the Sokal-Sneath dissimilarity (boolean).
sqeuclidean     the squared Euclidean distance.
yule            the Yule dissimilarity (boolean).
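
The following sketch calls a few of the two-vector functions directly, then shows that looping over them reproduces what pdist returns in a single call. The vectors u and v and the random array X are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, chebyshev, pdist

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 6.0, 3.0])

d_euc = euclidean(u, v)   # sqrt(3**2 + 4**2 + 0**2) = 5.0
d_city = cityblock(u, v)  # 3 + 4 + 0 = 7.0
d_cheb = chebyshev(u, v)  # max(3, 4, 0) = 4.0

# For a collection of vectors, a Python-level loop over pairs works
# but is slow; pdist computes the same values in one call.
X = np.random.default_rng(0).random((50, 3))
n = X.shape[0]
looped = np.array([euclidean(X[i], X[j])
                   for i in range(n)
                   for j in range(i + 1, n)])
batched = pdist(X, metric="euclidean")

print(np.allclose(looped, batched))
```

The loop visits pairs in the order (0, 1), (0, 2), ..., (1, 2), ..., which matches the ordering of the condensed vector returned by pdist.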
