
scipy.spatial.distance.jaccard

scipy.spatial.distance.jaccard(u, v, w=None)

Compute the Jaccard-Needham dissimilarity between two boolean 1-D arrays.

The Jaccard-Needham dissimilarity between 1-D boolean arrays u and v is defined as

\[\frac{c_{TF} + c_{FT}} {c_{TT} + c_{FT} + c_{TF}}\]

where \(c_{ij}\) is the number of occurrences of \(\mathtt{u[k]} = i\) and \(\mathtt{v[k]} = j\) for \(k < n\), with \(n\) the common length of u and v.
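The definition above can be sketched in plain Python by counting the three kinds of positions directly. This is only an illustration of the formula: the helper name `jaccard_from_counts` is hypothetical and is not part of SciPy.

```python
def jaccard_from_counts(u, v):
    """Hypothetical helper: Jaccard dissimilarity from the c_TT, c_TF, c_FT counts."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    # Interpret every entry as a boolean, as the definition assumes.
    pairs = [(bool(a), bool(b)) for a, b in zip(u, v)]
    c_tt = sum(1 for a, b in pairs if a and b)      # True in both
    c_tf = sum(1 for a, b in pairs if a and not b)  # True in u only
    c_ft = sum(1 for a, b in pairs if not a and b)  # True in v only
    denom = c_tt + c_tf + c_ft
    # A zero denominator (no True entry anywhere) is mapped to 0.0 here.
    return (c_tf + c_ft) / denom if denom else 0.0


u = [True, False, False, True]
v = [True, True, False, False]
# c_TT = 1 (k = 0), c_FT = 1 (k = 1), c_TF = 1 (k = 3), so the result is 2/3.
result = jaccard_from_counts(u, v)
```

Positions where both entries are False (here k = 2) do not enter the formula at all, which is what separates this measure from a plain mismatch rate over all n positions.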

Parameters:
u : (N,) array_like, bool

Input array.

v : (N,) array_like, bool

Input array.

w : (N,) array_like, optional

The weights for each value in u and v. Default is None, which gives each value a weight of 1.0.

Returns:
jaccard : double

The Jaccard distance between vectors u and v.
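One way to picture the effect of w is to replace each count in the formula by the sum of the weights at the matching positions. The sketch below rests on that assumption and is not a statement of SciPy's implementation. The helper name `weighted_jaccard` is hypothetical.

```python
def weighted_jaccard(u, v, w=None):
    """Hypothetical helper: Jaccard dissimilarity with per-position weights."""
    if w is None:
        w = [1.0] * len(u)  # default: every position has weight 1.0
    if not (len(u) == len(v) == len(w)):
        raise ValueError("u, v and w must have the same length")
    disagree = 0.0  # total weight where exactly one of u[k], v[k] is True
    either = 0.0    # total weight where at least one of u[k], v[k] is True
    for a, b, wk in zip(u, v, w):
        a, b = bool(a), bool(b)
        if a or b:
            either += wk
            if a != b:
                disagree += wk
    return disagree / either if either else 0.0


unweighted = weighted_jaccard([1, 0, 0], [1, 1, 0])             # 1 / 2
weighted = weighted_jaccard([1, 0, 0], [1, 1, 0], w=[1, 3, 1])  # 3 / (1 + 3)
```

Under this assumption, raising the weight of the single disagreeing position from 1 to 3 moves the result from 0.5 to 0.75, while the weight on the all-False position has no effect.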

Notes

When u and v lead to a 0/0 division, i.e. no position is True in either vector so that the denominator of the formula is zero, the returned distance is 0. See the Wikipedia page on the Jaccard index [1], and this paper [2].

Changed in version 1.2.0: Previously, when u and v lead to a 0/0 division, the function would return NaN. This was changed to return 0 instead.
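The 0/0 case is easy to confuse with the case where the True positions of u and v are merely disjoint. A minimal sketch using the set form of the Jaccard distance, 1 - |A ∩ B| / |A ∪ B| over the sets of True positions, makes the difference visible. The names `jaccard_distance_sets`, `true_positions` and `empty_value` are hypothetical and exist only for this illustration.

```python
def true_positions(vec):
    """Set of indices k at which vec[k] is truthy."""
    return {k for k, x in enumerate(vec) if x}


def jaccard_distance_sets(a, b, empty_value=0.0):
    """Hypothetical helper: 1 - |A & B| / |A | B| for two sets."""
    union = a | b
    if not union:
        # 0/0 case: both sets are empty, so fall back to a convention value.
        return empty_value
    return 1.0 - len(a & b) / len(union)


# Disjoint True positions: the denominator is non-zero and the distance is 1.
disjoint = jaccard_distance_sets(true_positions([1, 0, 0]),
                                 true_positions([0, 1, 0]))
# No True position at all: the ratio is 0/0 and the convention value is used.
both_empty = jaccard_distance_sets(true_positions([0, 0, 0]),
                                   true_positions([0, 0, 0]))
```

Choosing 0 for the empty case treats two all-False vectors as identical, which matches the behaviour described in the note above. Passing `empty_value=float("nan")` would mimic the older behaviour in this sketch.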

References

[1] https://en.wikipedia.org/wiki/Jaccard_index
[2] S. Kosub, “A note on the triangle inequality for the Jaccard distance”, 2016. Available online: https://arxiv.org/pdf/1612.02696.pdf

Examples

>>> from scipy.spatial import distance
>>> distance.jaccard([1, 0, 0], [0, 1, 0])
1.0
>>> distance.jaccard([1, 0, 0], [1, 1, 0])
0.5
>>> distance.jaccard([1, 0, 0], [1, 2, 0])
0.5
>>> distance.jaccard([1, 0, 0], [1, 1, 1])
0.6666666666666666
