scipy.stats.rv_histogram

class scipy.stats.rv_histogram(histogram, *args, density=None, **kwargs)

Generates a distribution given by a histogram. This is useful for generating a template distribution from a binned data sample.

As a subclass of rv_continuous, rv_histogram inherits a collection of generic methods (see rv_continuous for the full list) and implements them based on the properties of the provided binned data sample.

Parameters:
histogram : tuple of array_like

Tuple containing two array_like objects. The first contains the contents of n bins, and the second contains the (n+1) bin boundaries. In particular, the return value of numpy.histogram is accepted.

density : bool, optional

If False, assumes the histogram is proportional to counts per bin; otherwise, assumes it is proportional to a density. For constant bin widths, these are equivalent, but the distinction is important when bin widths vary (see Notes). If None (default), sets density=True for backwards compatibility, but warns if the bin widths are variable. Set density explicitly to silence the warning.

New in version 1.10.0.
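
The behavior of the default density=None described above can be checked with a short sketch. The histogram values below are hypothetical and chosen only so that the bin widths differ; the sketch simply counts how many warnings are raised with and without an explicit density argument.

```python
import warnings

import numpy as np
from scipy import stats

# Hypothetical histogram with unequal bin widths (1, 1, and 2).
counts = np.array([10, 20, 30])
edges = np.array([0.0, 1.0, 2.0, 4.0])

# density left unset: a warning is expected because the bin widths vary.
with warnings.catch_warnings(record=True) as caught_default:
    warnings.simplefilter("always")
    stats.rv_histogram((counts, edges))

# density passed explicitly: no warning about bin widths is expected.
with warnings.catch_warnings(record=True) as caught_explicit:
    warnings.simplefilter("always")
    stats.rv_histogram((counts, edges), density=False)

print(len(caught_default), len(caught_explicit))
```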

Notes

When a histogram has unequal bin widths, there is a distinction between histograms that are proportional to counts per bin and histograms that are proportional to probability density over a bin. If numpy.histogram is called with its default density=False, the resulting histogram is the number of counts per bin, so density=False should be passed to rv_histogram. If numpy.histogram is called with density=True, the resulting histogram is in terms of probability density, so density=True should be passed to rv_histogram. To avoid warnings, always pass density explicitly when the input histogram has unequal bin widths.
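
As a minimal sketch of this distinction, the snippet below builds the same distribution twice from one dataset with unequal bins: once from counts with density=False and once from densities with density=True. The data, seed, and bin edges are illustrative assumptions. It also shows a mismatched call (counts passed with density=True) to illustrate why the flag matters.

```python
import numpy as np
from scipy import stats

# Illustrative data: uniform samples on [0, 4) with an arbitrary seed.
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 4.0, size=1000)

# Unequal bin widths: 1, 1, and 2.
edges = np.array([0.0, 1.0, 2.0, 4.0])

counts, _ = np.histogram(data, bins=edges)                 # counts per bin
density, _ = np.histogram(data, bins=edges, density=True)  # probability density

# Matching the density flag to how the histogram was built.
dist_from_counts = stats.rv_histogram((counts, edges), density=False)
dist_from_density = stats.rv_histogram((density, edges), density=True)

# Mismatched on purpose: counts interpreted as if they were densities.
dist_mismatched = stats.rv_histogram((counts, edges), density=True)

x = np.array([0.5, 1.5, 3.0])  # one point inside each bin
pdf_counts = dist_from_counts.pdf(x)
pdf_density = dist_from_density.pdf(x)
pdf_mismatched = dist_mismatched.pdf(x)

print(pdf_counts)      # agrees with pdf_density
print(pdf_density)
print(pdf_mismatched)  # overweights the wide bin
```

The first two results agree because dividing counts by bin width and normalizing is what numpy.histogram does for density=True. The mismatched call treats the larger count in the wide bin as a larger density, which distorts the shape.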

There are no additional shape parameters beyond loc and scale. The pdf is defined as a stepwise function from the provided histogram. The cdf is the integral of the pdf and is therefore piecewise linear: it interpolates linearly between the cumulative probabilities at the bin edges.
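
The shapes of the pdf and cdf can be seen on a small hand-made histogram. The counts and edges below are hypothetical values picked so that the numbers are easy to verify by hand.

```python
import numpy as np
from scipy import stats

# Hypothetical three-bin histogram with equal bin widths of 1.
counts = np.array([1, 2, 1])
edges = np.array([0.0, 1.0, 2.0, 3.0])
dist = stats.rv_histogram((counts, edges), density=False)

# The pdf is constant within each bin. After normalization the bin
# heights are 0.25, 0.5, and 0.25 (counts divided by their total of 4).
pdf_values = dist.pdf([0.25, 0.75, 1.5, 2.5])

# The cdf is piecewise linear with knots at the bin edges, where it
# takes the cumulative probabilities 0, 0.25, 0.75, and 1.
cdf_at_edges = dist.cdf(edges)

# Halfway through the middle bin the cdf lies halfway between 0.25 and 0.75.
cdf_mid = dist.cdf(1.5)

print(pdf_values)
print(cdf_at_edges)
print(cdf_mid)
```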

New in version 0.19.0.

Examples

Create a scipy.stats distribution from a numpy histogram

>>> import scipy.stats
>>> import numpy as np
>>> data = scipy.stats.norm.rvs(size=100000, loc=0, scale=1.5, random_state=123)
>>> hist = np.histogram(data, bins=100)
>>> hist_dist = scipy.stats.rv_histogram(hist, density=False)

Behaves like an ordinary scipy rv_continuous distribution

>>> hist_dist.pdf(1.0)
0.20538577847618705
>>> hist_dist.cdf(2.0)
0.90818568543056499

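The PDF is zero above the highest bin and below the lowest bin of the histogram, whose outer edges are the max and min of the original dataset. At the edges themselves, the PDF is zero at the max but equals the height of the first bin at the min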

>>> hist_dist.pdf(np.max(data))
0.0
>>> hist_dist.cdf(np.max(data))
1.0
>>> hist_dist.pdf(np.min(data))
7.7591907244498314e-05
>>> hist_dist.cdf(np.min(data))
0.0

PDF and CDF follow the histogram

>>> import matplotlib.pyplot as plt
>>> X = np.linspace(-5.0, 5.0, 100)
>>> fig, ax = plt.subplots()
>>> ax.set_title("PDF from Template")
>>> ax.hist(data, density=True, bins=100)
>>> ax.plot(X, hist_dist.pdf(X), label='PDF')
>>> ax.plot(X, hist_dist.cdf(X), label='CDF')
>>> ax.legend()
>>> fig.show()
Attributes:
random_state

Get or set the generator object for generating random variates.

Methods

__call__(*args, **kwds)

Freeze the distribution for the given arguments.

cdf(x, *args, **kwds)

Cumulative distribution function of the given RV.

entropy(*args, **kwds)

Differential entropy of the RV.

expect([func, args, loc, scale, lb, ub, ...])

Calculate expected value of a function with respect to the distribution by numerical integration.

fit(data, *args, **kwds)

Return estimates of shape (if applicable), location, and scale parameters from data.

fit_loc_scale(data, *args)

Estimate loc and scale parameters from data using 1st and 2nd moments.

freeze(*args, **kwds)

Freeze the distribution for the given arguments.

interval([confidence])

Confidence interval with equal areas around the median.

isf(q, *args, **kwds)

Inverse survival function (inverse of sf) at q of the given RV.

logcdf(x, *args, **kwds)

Log of the cumulative distribution function at x of the given RV.

logpdf(x, *args, **kwds)

Log of the probability density function at x of the given RV.

logsf(x, *args, **kwds)

Log of the survival function of the given RV.

mean(*args, **kwds)

Mean of the distribution.

median(*args, **kwds)

Median of the distribution.

moment([order])

Non-central moment of the distribution of specified order.

nnlf(theta, x)

Negative loglikelihood function.

pdf(x, *args, **kwds)

Probability density function at x of the given RV.

ppf(q, *args, **kwds)

Percent point function (inverse of cdf) at q of the given RV.

rvs(*args, **kwds)

Random variates of given type.

sf(x, *args, **kwds)

Survival function (1 - cdf) at x of the given RV.

stats(*args, **kwds)

Some statistics of the given RV.

std(*args, **kwds)

Standard deviation of the distribution.

support(*args, **kwargs)

Support of the distribution.

var(*args, **kwds)

Variance of the distribution.
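
A brief usage sketch of a few of the inherited methods listed above, applied to a histogram distribution. The sample size, bin count, and seeds are arbitrary assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative data: normal samples with scale 1.5, binned into 50 bins.
data = stats.norm.rvs(size=10000, loc=0, scale=1.5, random_state=123)
hist_dist = stats.rv_histogram(np.histogram(data, bins=50), density=False)

# Draw random variates from the histogram distribution.
samples = hist_dist.rvs(size=5, random_state=0)

# Percent point function: the value below which half the probability lies.
median = hist_dist.ppf(0.5)

# The support is bounded by the outermost bin edges.
lower, upper = hist_dist.support()

# Summary statistics computed from the binned distribution.
mean = hist_dist.mean()
std = hist_dist.std()

print(samples)
print(median, lower, upper)
print(mean, std)
```

Because the distribution is built from binned data, the mean and standard deviation approximate those of the underlying sample rather than reproducing them exactly.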