scipy.stats.energy_distance

scipy.stats.energy_distance(u_values, v_values, u_weights=None, v_weights=None)

Compute the energy distance between two 1D distributions.

Added in version 1.0.0.

Parameters:

u_values, v_values : array_like
    Values observed in the (empirical) distribution.
u_weights, v_weights : array_like, optional
    Weight for each value. If unspecified, each value is assigned the same weight. u_weights (resp. v_weights) must have the same length as u_values (resp. v_values). If the weight sum differs from 1, it must still be positive and finite so that the weights can be normalized to sum to 1.
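The normalization requirement on the weights can be illustrated with a small helper. This is only a sketch of the stated rule (positive, finite sum); the name normalize_weights is hypothetical and this is not SciPy's internal validation code.

```python
import math

def normalize_weights(weights):
    """Rescale weights so they sum to 1 (hypothetical helper, not SciPy code)."""
    total = float(sum(weights))
    # The sum must be positive and finite, otherwise dividing by it is meaningless.
    if not (math.isfinite(total) and total > 0):
        raise ValueError("weight sum must be positive and finite")
    return [w / total for w in weights]

# Weights that differ only by a common factor describe the same distribution.
same = normalize_weights([3, 1]) == normalize_weights([6, 2])
```

Under this reading, passing [3, 1] or [6, 2] as weights should describe the same distribution, since only the relative sizes matter.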

Returns:

distance : float
    The computed distance between the distributions.

Notes

The energy distance between two distributions \(u\) and \(v\), whose respective CDFs are \(U\) and \(V\), is:

\[D(u, v) = \left( 2\mathbb E|X - Y| - \mathbb E|X - X'| - \mathbb E|Y - Y'| \right)^{1/2}\]

where \(X\) and \(X'\) (resp. \(Y\) and \(Y'\)) are independent random variables whose probability distribution is \(u\) (resp. \(v\)).
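For discrete (weighted) inputs, the expectations in this definition reduce to finite double sums. The following is a minimal pure-Python sketch of that direct evaluation; the function name energy_distance_pairwise is hypothetical, and the quadratic-time loops are chosen for readability rather than to mirror how SciPy computes the result.

```python
import math

def energy_distance_pairwise(u_values, v_values, u_weights=None, v_weights=None):
    """Evaluate (2 E|X-Y| - E|X-X'| - E|Y-Y'|)^(1/2) by brute force (sketch)."""

    def normalize(values, weights):
        # Equal weights when none are given; otherwise rescale to sum to 1.
        if weights is None:
            weights = [1.0] * len(values)
        total = float(sum(weights))
        return [w / total for w in weights]

    def mean_abs_diff(xs, ps, ys, qs):
        # E|A - B| for independent A ~ (xs, ps) and B ~ (ys, qs).
        return sum(p * q * abs(x - y)
                   for x, p in zip(xs, ps)
                   for y, q in zip(ys, qs))

    p = normalize(u_values, u_weights)
    q = normalize(v_values, v_weights)
    squared = (2.0 * mean_abs_diff(u_values, p, v_values, q)
               - mean_abs_diff(u_values, p, u_values, p)
               - mean_abs_diff(v_values, q, v_values, q))
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(squared, 0.0))
```

With the inputs [0] and [2] this sketch gives 2.0, and with values [0, 8] weighted [3, 1] against [0, 8] weighted [2, 2] it gives 1.0.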

Sometimes the square of this quantity is referred to as the “energy distance” (e.g. in [2], [4]), but as noted in [1] and [3], only the definition above satisfies the axioms of a distance function (metric).

As shown in [2], for one-dimensional real-valued variables, the energy distance is linked to the non-distribution-free version of the Cramér-von Mises distance:

\[D(u, v) = \sqrt{2} l_2(u, v) = \left( 2 \int_{-\infty}^{+\infty} (U-V)^2 \right)^{1/2}\]

Note that the common Cramér-von Mises criterion uses the distribution-free version of the distance. See [2] (section 2) for more details about both versions of the distance.
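For discrete inputs, \(U\) and \(V\) are step functions, so the integral of \((U-V)^2\) becomes a finite sum over the intervals between consecutive support points. The sketch below evaluates the CDF form that way; energy_distance_cdf is a hypothetical name, and the code illustrates the identity rather than reproducing SciPy's implementation.

```python
import math

def energy_distance_cdf(u_values, v_values, u_weights=None, v_weights=None):
    """Evaluate (2 * integral of (U - V)^2)^(1/2) for step-function CDFs (sketch)."""

    def normalized(values, weights):
        # Pair each value with its normalized weight (equal weights by default).
        if weights is None:
            weights = [1.0] * len(values)
        total = float(sum(weights))
        return [(x, w / total) for x, w in zip(values, weights)]

    u = normalized(u_values, u_weights)
    v = normalized(v_values, v_weights)

    # Both CDFs are constant between consecutive points of the pooled support.
    points = sorted(set(u_values) | set(v_values))
    integral = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u = sum(w for x, w in u if x <= left)  # U on [left, right)
        cdf_v = sum(w for x, w in v if x <= left)  # V on [left, right)
        integral += (cdf_u - cdf_v) ** 2 * (right - left)
    return math.sqrt(2.0 * integral)
```

For the inputs [0] and [2], the only interval is [0, 2) with \(U - V = 1\), so the integral is 2 and the result is 2.0, in agreement with the expectation-based definition.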

The input distributions can be empirical, in which case the sample values are passed directly to the function, or they can be seen as generalized functions, in which case they are weighted sums of Dirac delta functions located at the specified values.
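The two views describe the same object: a raw sample with repeated values corresponds to a weighted sum of Dirac deltas at the distinct values, with weights given by the counts. A small sketch of that conversion follows, using only the standard library; sample_to_weighted is a hypothetical helper name.

```python
from collections import Counter

def sample_to_weighted(sample):
    """Collapse a raw sample into distinct values and count-based weights (sketch)."""
    counts = Counter(sample)
    values = sorted(counts)
    weights = [counts[x] for x in values]
    return values, weights

values, weights = sample_to_weighted([0, 0, 0, 8])
# values is [0, 8] and weights is [3, 1]
```

Since both forms represent the same distribution, passing the raw sample as u_values, or passing values and weights as u_values and u_weights, should yield the same distance up to floating-point rounding.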

Array API Standard Support

energy_distance has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

Library	CPU	GPU
NumPy	✅	n/a
CuPy	n/a	⛔
PyTorch	⛔	⛔
JAX	⛔	⛔
Dask	⛔	n/a

See Support for the array API standard for more information.

References

[1]

Rizzo, Szekely “Energy distance.” Wiley Interdisciplinary Reviews: Computational Statistics, 8(1):27-38 (2015).

[2]

Szekely “E-statistics: The energy of statistical samples.” Bowling Green State University, Department of Mathematics and Statistics, Technical Report 02-16 (2002).

[3]

“Energy distance”, https://en.wikipedia.org/wiki/Energy_distance

[4]

Bellemare, Danihelka, Dabney, Mohamed, Lakshminarayanan, Hoyer, Munos “The Cramer Distance as a Solution to Biased Wasserstein Gradients” (2017). arXiv:1705.10743.

Examples

>>> from scipy.stats import energy_distance
>>> energy_distance([0], [2])
2.0000000000000004
>>> energy_distance([0, 8], [0, 8], [3, 1], [2, 2])
1.0000000000000002
>>> energy_distance([0.7, 7.4, 2.4, 6.8], [1.4, 8. ],
...                 [2.1, 4.2, 7.4, 8. ], [7.6, 8.8])
0.88003340976158217