scipy.stats.entropy#

scipy.stats.entropy(pk, qk=None, base=None, axis=0, *, nan_policy='propagate', keepdims=False)[source]#

Calculate the Shannon entropy/relative entropy of given distribution(s).

If only probabilities pk are given, the Shannon entropy is calculated as H = -sum(pk * log(pk)).

If qk is not None, then compute the relative entropy D = sum(pk * log(pk / qk)). This quantity is also known as the Kullback-Leibler divergence.

This routine will normalize pk and qk if they don’t sum to 1.

Parameters:
pkarray_like

Defines the (discrete) distribution. Along each axis-slice of pk, element i is the (possibly unnormalized) probability of event i.

qkarray_like, optional

Sequence against which the relative entropy is computed. Should be in the same format as pk.

basefloat, optional

The logarithmic base to use, defaults to e (natural logarithm).

axisint or None, default: 0

If an int, the axis of the input along which to compute the statistic. The statistic of each axis-slice (e.g. row) of the input will appear in a corresponding element of the output. If None, the input will be raveled before computing the statistic.

nan_policy{‘propagate’, ‘omit’, ‘raise’}

Defines how to handle input NaNs.

  • propagate: if a NaN is present in the axis slice (e.g. row) along which the statistic is computed, the corresponding entry of the output will be NaN.

  • omit: NaNs will be omitted when performing the calculation. If insufficient data remains in the axis slice along which the statistic is computed, the corresponding entry of the output will be NaN.

  • raise: if a NaN is present, a ValueError will be raised.

keepdimsbool, default: False

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

Returns:
S{float, array_like}

The calculated entropy.

Notes

Informally, the Shannon entropy quantifies the expected uncertainty inherent in the possible outcomes of a discrete random variable. For example, if messages consisting of sequences of symbols from a set are to be encoded and transmitted over a noiseless channel, then the Shannon entropy H(pk) gives a tight lower bound for the average number of units of information needed per symbol if the symbols occur with frequencies governed by the discrete distribution pk [1]. The choice of base determines the choice of units; e.g., e for nats, 2 for bits, etc.

The relative entropy, D(pk|qk), quantifies the increase in the average number of units of information needed per symbol if the encoding is optimized for the probability distribution qk instead of the true distribution pk. Informally, the relative entropy quantifies the expected excess in surprise experienced if one believes the true distribution is qk when it is actually pk.

A related quantity, the cross entropy CE(pk, qk), satisfies the equation CE(pk, qk) = H(pk) + D(pk|qk) and can also be calculated with the formula CE = -sum(pk * log(qk)). It gives the average number of units of information needed per symbol if an encoding is optimized for the probability distribution qk when the true distribution is pk. It is not computed directly by entropy, but it can be computed using two calls to the function (see Examples).

See [2] for more information.

Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix. Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False.

References

[1]

Shannon, C.E. (1948), A Mathematical Theory of Communication. Bell System Technical Journal, 27: 379-423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x

[2]

Thomas M. Cover and Joy A. Thomas. 2006. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, USA.

Examples

The outcome of a fair coin is the most uncertain:

>>> import numpy as np
>>> from scipy.stats import entropy
>>> base = 2  # work in units of bits
>>> pk = np.array([1/2, 1/2])  # fair coin
>>> H = entropy(pk, base=base)
>>> H
1.0
>>> H == -np.sum(pk * np.log(pk)) / np.log(base)
True

The outcome of a biased coin is less uncertain:

>>> qk = np.array([9/10, 1/10])  # biased coin
>>> entropy(qk, base=base)
0.46899559358928117

The relative entropy between the fair coin and biased coin is calculated as:

>>> D = entropy(pk, qk, base=base)
>>> D
0.7369655941662062
>>> D == np.sum(pk * np.log(pk/qk)) / np.log(base)
True

The cross entropy can be calculated as the sum of the entropy and relative entropy`:

>>> CE = entropy(pk, base=base) + entropy(pk, qk, base=base)
>>> CE
1.736965594166206
>>> CE == -np.sum(pk * np.log(qk)) / np.log(base)
True