scipy.stats.energy_distance

scipy.stats.energy_distance(u_values, v_values, u_weights=None, v_weights=None)
Compute the energy distance between two 1D distributions.
New in version 1.0.0.
Parameters:  u_values, v_values : array_like
Values observed in the (empirical) distribution.
 u_weights, v_weights : array_like, optional
Weight for each value. If unspecified, each value is assigned the same weight. u_weights (resp. v_weights) must have the same length as u_values (resp. v_values). If the weight sum differs from 1, it must still be positive and finite so that the weights can be normalized to sum to 1.
Returns:  distance : float
The computed distance between the distributions.
Notes
The energy distance between two distributions \(u\) and \(v\), whose respective CDFs are \(U\) and \(V\), equals:
\[D(u, v) = \left( 2\mathbb E|X - Y| - \mathbb E|X - X'| - \mathbb E|Y - Y'| \right)^{1/2}\]
where \(X\) and \(X'\) (resp. \(Y\) and \(Y'\)) are independent random variables whose probability distribution is \(u\) (resp. \(v\)).
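As a minimal sketch of this definition for (optionally weighted) empirical samples, each expectation can be evaluated as a weighted average of absolute differences over all pairs of values. Each value is also paired with itself, since the two variables in each expectation are independent draws. The helper name `energy_distance_pairwise` is hypothetical and not part of SciPy; it only illustrates the formula and needs memory proportional to the product of the sample sizes.

```python
import numpy as np

def energy_distance_pairwise(u_values, v_values, u_weights=None, v_weights=None):
    """Illustrative only: evaluate the expectation form of the energy distance."""
    x = np.asarray(u_values, dtype=float)
    y = np.asarray(v_values, dtype=float)
    # Default to equal weights, then normalize so each weight vector sums to 1.
    p = np.ones_like(x) if u_weights is None else np.asarray(u_weights, dtype=float)
    q = np.ones_like(y) if v_weights is None else np.asarray(v_weights, dtype=float)
    p = p / p.sum()
    q = q / q.sum()

    def mean_abs_diff(a, wa, b, wb):
        # E|A - B| for independent A and B: weighted mean over all pairs.
        return np.sum(np.outer(wa, wb) * np.abs(a[:, None] - b[None, :]))

    squared = (2.0 * mean_abs_diff(x, p, y, q)
               - mean_abs_diff(x, p, x, p)
               - mean_abs_diff(y, q, y, q))
    # Guard against tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(squared, 0.0)))
```

For point masses at 0 and 2, the quantity under the square root is 2*2 - 0 - 0 = 4, so the sketch returns 2.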
As shown in [2], for one-dimensional real-valued variables, the energy distance is linked to the non-distribution-free version of the Cramer-von Mises distance:
\[D(u, v) = \sqrt{2} l_2(u, v) = \left( 2 \int_{-\infty}^{+\infty} (U-V)^2 \right)^{1/2}\]
Note that the common Cramer-von Mises criterion uses the distribution-free version of the distance. See [2] (section 2), for more details about both versions of the distance.
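The integral form can be sketched for unweighted samples by noting that empirical CDFs are step functions, so the integral of the squared CDF difference reduces to a finite sum over the intervals between consecutive sorted values. The function name `energy_distance_cdf` is hypothetical, and the code assumes equal weights; it is an illustration of the formula rather than a statement of how SciPy computes the result.

```python
import numpy as np

def energy_distance_cdf(u_values, v_values):
    """Illustrative only: sqrt(2 * integral of (U - V)^2) for unweighted samples."""
    u = np.sort(np.asarray(u_values, dtype=float))
    v = np.sort(np.asarray(v_values, dtype=float))
    # Both CDFs are constant between consecutive points of the pooled sample.
    grid = np.sort(np.concatenate([u, v]))
    widths = np.diff(grid)
    # Value of each empirical CDF on the interval [grid[k], grid[k + 1]).
    U = np.searchsorted(u, grid[:-1], side="right") / u.size
    V = np.searchsorted(v, grid[:-1], side="right") / v.size
    return float(np.sqrt(2.0 * np.sum((U - V) ** 2 * widths)))
```

For the samples [0] and [2], the CDFs differ by 1 on the interval from 0 to 2, so the integral is 2 and the distance is the square root of 4.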
The input distributions can be empirical, therefore coming from samples whose values are effectively inputs of the function, or they can be seen as generalized functions, in which case they are weighted sums of Dirac delta functions located at the specified values.
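To illustrate the weighted-sum view, integer weights should behave like repeated observations: giving the value 0 a weight of 3 and the value 8 a weight of 1 describes the same distribution as the sample [0, 0, 0, 8]. The short check below is a sketch of that equivalence. It also checks that rescaling the weights leaves the result unchanged, since weights are normalized to sum to 1.

```python
import math
from scipy.stats import energy_distance

weighted = energy_distance([0, 8], [0, 8], [3, 1], [2, 2])
repeated = energy_distance([0, 0, 0, 8], [0, 8])
rescaled = energy_distance([0, 8], [0, 8], [0.75, 0.25], [0.5, 0.5])

# All three calls describe the same pair of distributions, so the results
# should agree up to floating-point rounding.
print(math.isclose(weighted, repeated), math.isclose(weighted, rescaled))
```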
References
[1] “Energy distance”, https://en.wikipedia.org/wiki/Energy_distance
[2] Szekely “E-statistics: The energy of statistical samples.” Bowling Green State University, Department of Mathematics and Statistics, Technical Report 02-16 (2002).
[3] Rizzo, Szekely “Energy distance.” Wiley Interdisciplinary Reviews: Computational Statistics, 8(1):27-38 (2015).
[4] Bellemare, Danihelka, Dabney, Mohamed, Lakshminarayanan, Hoyer, Munos “The Cramer Distance as a Solution to Biased Wasserstein Gradients” (2017). arXiv:1705.10743.
Examples
>>> from scipy.stats import energy_distance
>>> energy_distance([0], [2])
2.0000000000000004
>>> energy_distance([0, 8], [0, 8], [3, 1], [2, 2])
1.0000000000000002
>>> energy_distance([0.7, 7.4, 2.4, 6.8], [1.4, 8. ],
...                 [2.1, 4.2, 7.4, 8. ], [7.6, 8.8])
0.88003340976158217