scipy.spatial.distance.jaccard
scipy.spatial.distance.jaccard(u, v, w=None)
Compute the Jaccard-Needham dissimilarity between two boolean 1-D arrays.
The Jaccard-Needham dissimilarity between 1-D boolean arrays u and v is defined as
\[\frac{c_{TF} + c_{FT}} {c_{TT} + c_{FT} + c_{TF}}\]
where \(c_{ij}\) is the number of occurrences of \(\mathtt{u[k]} = i\) and \(\mathtt{v[k]} = j\) for \(k < n\), with \(n\) the common length of u and v.
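The definition can be traced by hand with a small pure-Python sketch that tallies the three counts and forms the ratio. The function name `jaccard_sketch` is hypothetical and is not part of SciPy; returning 0.0 when the denominator is zero is an assumption taken from the Notes on this page.

```python
def jaccard_sketch(u, v):
    """Hypothetical helper: Jaccard-Needham dissimilarity from c_TF, c_FT and c_TT."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    c_tf = c_ft = c_tt = 0
    for a, b in zip(u, v):
        a, b = bool(a), bool(b)
        if a and b:
            c_tt += 1      # True in both arrays
        elif a:
            c_tf += 1      # True in u only
        elif b:
            c_ft += 1      # True in v only
    denom = c_tt + c_ft + c_tf
    # Assumption: a 0/0 case maps to 0.0, as the Notes describe for version 1.2.0 and later.
    if denom == 0:
        return 0.0
    return (c_tf + c_ft) / denom


print(jaccard_sketch([1, 0, 0], [0, 1, 0]))  # 1.0
print(jaccard_sketch([1, 0, 0], [1, 1, 0]))  # 0.5
```

Positions where both entries are False do not enter either the numerator or the denominator, which is why padding both arrays with extra False values leaves the result unchanged.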
Parameters:
- u : (N,) array_like, bool
  Input array.
- v : (N,) array_like, bool
  Input array.
- w : (N,) array_like, optional
  The weights for each value in u and v. Default is None, which gives each value a weight of 1.0.
Returns:
- jaccard : double
  The Jaccard distance between vectors u and v.
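One way to read the w parameter is that position k contributes w[k], rather than 1, to each count it belongs to. The sketch below illustrates that reading in pure Python; `weighted_jaccard_sketch` is a hypothetical name, and the code is an illustration under that assumption rather than SciPy's implementation.

```python
def weighted_jaccard_sketch(u, v, w=None):
    """Hypothetical helper: weighted Jaccard-Needham dissimilarity."""
    if w is None:
        w = [1.0] * len(u)  # default: every position has weight 1.0
    if not (len(u) == len(v) == len(w)):
        raise ValueError("u, v and w must have the same length")
    mismatch = 0.0  # weighted c_TF + c_FT
    either = 0.0    # weighted c_TT + c_FT + c_TF
    for a, b, wk in zip(u, v, w):
        a, b = bool(a), bool(b)
        if a or b:
            either += wk
            if a != b:
                mismatch += wk
    # Assumption: a zero denominator yields 0.0, matching the unweighted convention.
    return mismatch / either if either != 0 else 0.0


print(weighted_jaccard_sketch([1, 0, 0], [1, 1, 0]))             # 0.5
print(weighted_jaccard_sketch([1, 0, 0], [1, 1, 0], [1, 3, 1]))  # 0.75
```

In the second call the only disagreeing position carries weight 3 out of a total weight of 4 over positions where either array is True, so the dissimilarity rises from 0.5 to 0.75.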
Notes
When u and v lead to a 0/0 division, i.e. when neither vector contains a True entry so that the denominator of the formula is zero, the returned distance is 0. See the Wikipedia page on the Jaccard index [1] and this paper [2].
Changed in version 1.2.0: Previously, when u and v lead to a 0/0 division, the function would return NaN. This was changed to return 0 instead.
References
[1] https://en.wikipedia.org/wiki/Jaccard_index
[2] S. Kosub, “A note on the triangle inequality for the Jaccard distance”, 2016. Available online: https://arxiv.org/pdf/1612.02696.pdf
Examples
>>> from scipy.spatial import distance
>>> distance.jaccard([1, 0, 0], [0, 1, 0])
1.0
>>> distance.jaccard([1, 0, 0], [1, 1, 0])
0.5
>>> distance.jaccard([1, 0, 0], [1, 2, 0])
0.5
>>> distance.jaccard([1, 0, 0], [1, 1, 1])
0.66666666666666663