scipy.spatial.distance.jaccard
- scipy.spatial.distance.jaccard(u, v, w=None)
- Compute the Jaccard-Needham dissimilarity between two boolean 1-D arrays.

  The Jaccard-Needham dissimilarity between 1-D boolean arrays u and v is defined as

  \[\frac{c_{TF} + c_{FT}}{c_{TT} + c_{FT} + c_{TF}}\]

  where \(c_{ij}\) is the number of occurrences of \(\mathtt{u[k]} = i\) and \(\mathtt{v[k]} = j\) for \(k < n\), with \(n\) the common length of u and v.

- Parameters
- u : (N,) array_like, bool
- Input array.
- v : (N,) array_like, bool
- Input array.
- w : (N,) array_like, optional
- The weights for each value in u and v. Default is None, which gives each value a weight of 1.0.
 
- Returns
- jaccard : double
- The Jaccard distance between vectors u and v. 
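
The definition can be illustrated with a minimal pure-Python sketch. It is not SciPy's implementation: the name `jaccard_sketch` is hypothetical, entries are coerced to bool, and the weights are assumed to enter by letting each position k add w[k] (rather than 1) to its count. The 0/0 case follows the convention described in the Notes.

```python
def jaccard_sketch(u, v, w=None):
    """Hypothetical reference for the Jaccard-Needham dissimilarity (not SciPy's code)."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    if w is None:
        w = [1.0] * len(u)  # default: every position has weight 1.0
    elif len(w) != len(u):
        raise ValueError("w must have the same length as u and v")

    mismatch = 0.0  # weighted c_TF + c_FT
    either = 0.0    # weighted c_TT + c_TF + c_FT
    for a, b, weight in zip(u, v, w):
        a, b = bool(a), bool(b)  # assumption: nonzero entries count as True
        if a or b:
            either += weight
            if a != b:
                mismatch += weight

    # 0/0 convention: return 0 when neither array has a True entry.
    return mismatch / either if either != 0 else 0.0


print(jaccard_sketch([1, 0, 0], [0, 1, 0]))  # 1.0
print(jaccard_sketch([1, 0, 0], [1, 1, 0]))  # 0.5
print(jaccard_sketch([0, 0, 0], [0, 0, 0]))  # 0.0
```

Positions where both entries are False contribute to neither the numerator nor the denominator, so padding both arrays with extra False entries leaves the result unchanged.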
 
- Notes
  When u and v lead to a 0/0 division, i.e. neither vector contains any True (nonzero) entries, the returned distance is 0. See the Wikipedia page on the Jaccard index [1] and the paper [2].

  Changed in version 1.2.0: Previously, when u and v lead to a 0/0 division, the function would return NaN. This was changed to return 0 instead.

- References
  [1] Wikipedia page on the Jaccard index.
  [2] S. Kosub, "A note on the triangle inequality for the Jaccard distance", 2016, arXiv:1612.02696
- Examples
  >>> from scipy.spatial import distance
  >>> distance.jaccard([1, 0, 0], [0, 1, 0])
  1.0
  >>> distance.jaccard([1, 0, 0], [1, 1, 0])
  0.5
  >>> distance.jaccard([1, 0, 0], [1, 2, 0])
  0.5
  >>> distance.jaccard([1, 0, 0], [1, 1, 1])
  0.66666666666666663
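
  As a check against the definition, the last example can be worked by hand. Reading nonzero entries as True (an assumption consistent with the examples above), u = [1, 0, 0] and v = [1, 1, 1] agree on True at k = 0, and v alone is True at k = 1 and k = 2:

  \[c_{TT} = 1,\quad c_{TF} = 0,\quad c_{FT} = 2 \quad\Longrightarrow\quad \frac{c_{TF} + c_{FT}}{c_{TT} + c_{FT} + c_{TF}} = \frac{0 + 2}{1 + 2 + 0} = \frac{2}{3} \approx 0.667\]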