This is documentation for an old release of SciPy (version 0.10.1). Read this page in the documentation of the latest stable release (version 1.15.1).

Distance computations (scipy.spatial.distance)

Function Reference

Distance matrix computation from a collection of raw observation vectors stored in a rectangular array.

pdist(X[, metric, p, w, V, VI]) Computes the pairwise distances between m original observations in n-dimensional space.
cdist(XA, XB[, metric, p, V, VI, w]) Computes the distance between each pair of observation vectors drawn from the two collections XA and XB.
squareform(X[, force, checks]) Converts a vector-form distance vector to a square-form distance matrix, and vice-versa.
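The relationship between the three functions above can be sketched as follows. This is a minimal illustration, not taken from the original page: the arrays X and XB are made-up data, and the Euclidean metric is chosen only because its values are easy to check by hand.

```python
import numpy as np
from scipy.spatial.distance import pdist, cdist, squareform

# Four observations in 2-dimensional space (illustrative data only).
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0],
              [0.0, 1.0]])

# Condensed form: one entry per unordered pair, m * (m - 1) / 2 values.
y = pdist(X, metric='euclidean')

# Square (redundant) form: symmetric m-by-m matrix with a zero diagonal.
D = squareform(y)

# squareform also converts a square matrix back to condensed form.
y_again = squareform(D)

# cdist compares every row of X with every row of a second collection.
XB = np.array([[0.0, 0.0],
               [1.0, 1.0]])
C = cdist(X, XB, metric='euclidean')

print(y.shape, D.shape, C.shape)
```

Here y holds 6 distances for the 4 observations, D is 4-by-4, and C is 4-by-2 because it pairs each of the 4 rows of X with each of the 2 rows of XB.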

Predicates for checking the validity of distance matrices, both condensed and redundant, together with functions for computing the number of observations that a distance matrix represents.

is_valid_dm(D[, tol, throw, name, warning]) Returns True if the variable D passed is a valid distance matrix.
is_valid_y(y[, warning, throw, name]) Returns True if the variable y passed is a valid condensed distance matrix.
num_obs_dm(d) Returns the number of original observations that correspond to a square, redundant distance matrix.
num_obs_y(Y) Returns the number of original observations that correspond to a condensed distance matrix.
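As a rough sketch of how these helpers fit together, the example below builds a small distance matrix and inspects it. The data and the deliberately corrupted matrix `bad` are assumptions made for illustration; the default behavior relied on here is that the predicates return False rather than raising when `throw` is left unset.

```python
import numpy as np
from scipy.spatial.distance import (pdist, squareform, is_valid_dm,
                                    is_valid_y, num_obs_dm, num_obs_y)

# Three observations in 2-dimensional space (illustrative data only).
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])

y = pdist(X)          # condensed distance matrix, 3 entries
D = squareform(y)     # redundant 3-by-3 distance matrix

ok_y = is_valid_y(y)  # condensed form check
ok_D = is_valid_dm(D) # square form check: symmetric, zero diagonal

# Break the symmetry on purpose; the predicate should now reject it.
bad = D.copy()
bad[0, 1] += 1.0
ok_bad = is_valid_dm(bad)

# Both functions recover the number of original observations.
n_from_y = num_obs_y(y)
n_from_D = num_obs_dm(D)

print(ok_y, ok_D, ok_bad, n_from_y, n_from_D)
```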

Distance functions between two vectors u and v. Each call handles a single pair, so looping over these functions to compute distances across a large collection of vectors is inefficient. Use pdist for that purpose.

braycurtis(u, v) Computes the Bray-Curtis distance between two n-vectors u and v.
canberra(u, v) Computes the Canberra distance between two n-vectors u and v.
chebyshev(u, v) Computes the Chebyshev distance between two n-vectors u and v.
cityblock(u, v) Computes the Manhattan distance between two n-vectors u and v.
correlation(u, v) Computes the correlation distance between two n-vectors u and v.
cosine(u, v) Computes the Cosine distance between two n-vectors u and v.
dice(u, v) Computes the Dice dissimilarity between two boolean n-vectors u and v.
euclidean(u, v) Computes the Euclidean distance between two n-vectors u and v.
hamming(u, v) Computes the Hamming distance between two n-vectors u and v.
jaccard(u, v) Computes the Jaccard-Needham dissimilarity between two boolean n-vectors u and v.
kulsinski(u, v) Computes the Kulsinski dissimilarity between two boolean n-vectors u and v.
mahalanobis(u, v, VI) Computes the Mahalanobis distance between two n-vectors u and v.
matching(u, v) Computes the Matching dissimilarity between two boolean n-vectors u and v.
minkowski(u, v, p) Computes the Minkowski distance between two vectors u and v.
rogerstanimoto(u, v) Computes the Rogers-Tanimoto dissimilarity between two boolean n-vectors u and v.
russellrao(u, v) Computes the Russell-Rao dissimilarity between two boolean n-vectors u and v.
seuclidean(u, v, V) Returns the standardized Euclidean distance between two n-vectors u and v.
sokalmichener(u, v) Computes the Sokal-Michener dissimilarity between two boolean vectors u and v.
sokalsneath(u, v) Computes the Sokal-Sneath dissimilarity between two boolean vectors u and v.
sqeuclidean(u, v) Computes the squared Euclidean distance between two n-vectors u and v.
yule(u, v) Computes the Yule dissimilarity between two boolean n-vectors u and v.
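A short sketch of a few of the two-vector functions listed above, followed by the same pairwise distances computed once in a Python loop and once with pdist. The vectors are made-up values chosen so the results can be verified by hand; only a handful of the listed functions are shown, and the loop is included purely to show that pdist produces the same numbers in a single call.

```python
import numpy as np
from scipy.spatial.distance import (euclidean, cityblock, chebyshev,
                                    sqeuclidean, hamming, jaccard, pdist)

# Two real-valued vectors (illustrative data only).
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 6.0, 3.0])

d_euclid = euclidean(u, v)   # sqrt(3**2 + 4**2 + 0**2)
d_city = cityblock(u, v)     # |3| + |4| + |0|
d_cheb = chebyshev(u, v)     # max(|3|, |4|, |0|)
d_sq = sqeuclidean(u, v)     # 3**2 + 4**2 + 0**2

# Two boolean vectors that disagree in exactly two of four positions.
a = np.array([True, False, True, True])
b = np.array([True, True, False, True])

d_hamming = hamming(a, b)    # fraction of positions that differ
d_jaccard = jaccard(a, b)    # differing positions / positions where either is True

# The same pairwise distances, first pair by pair, then with one pdist call.
X = np.vstack([u, v, u + v])
looped = [euclidean(X[i], X[j])
          for i in range(len(X))
          for j in range(i + 1, len(X))]
vectorized = pdist(X, 'euclidean')

print(d_euclid, d_city, d_cheb, d_sq, d_hamming, d_jaccard)
print(np.allclose(looped, vectorized))
```

The loop visits pairs in the order (0, 1), (0, 2), (1, 2), which matches the ordering of the condensed vector returned by pdist.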
