scipy.stats.NumericalInverseHermite¶
- class scipy.stats.NumericalInverseHermite(dist, *, tol=1e-12, max_intervals=100000)[source]¶
A Hermite spline fast numerical inverse of a probability distribution.
The initializer of
NumericalInverseHermite
accepts dist, an object representing a continuous distribution, and provides an object with methods that approximate dist.ppf and dist.rvs. For most distributions, these methods are faster than those of dist itself.- Parameters
- distobject
Object representing the distribution for which a fast numerical inverse is desired; for instance, a frozen instance of a
scipy.stats
continuous distribution. See Notes and Examples for details.- tolfloat, optional
u-error tolerance (see Notes). The default is 1e-12.
- max_intervalsint, optional
Maximum number of intervals in the cubic Hermite spline used to approximate the percent point function. The default is 100000.
Notes
NumericalInverseHermite
approximates the inverse of a continuous statistical distribution’s CDF with a cubic Hermite spline.As described in [1], it begins by evaluating the distribution’s PDF and CDF at a mesh of quantiles
x
within the distribution’s support. It uses the results to fit a cubic Hermite splineH
such thatH(p) == x
, wherep
is the array of percentiles corresponding with the quantilesx
. Therefore, the spline approximates the inverse of the distribution’s CDF to machine precision at the percentilesp
, but typically, the spline will not be as accurate at the midpoints between the percentile points:p_mid = (p[:-1] + p[1:])/2
so the mesh of quantiles is refined as needed to reduce the maximum “u-error”:
u_error = np.max(np.abs(dist.cdf(H(p_mid)) - p_mid))
below the specified tolerance tol. Refinement stops when the required tolerance is achieved or when the number of mesh intervals after the next refinement could exceed the maximum allowed number max_intervals.
The object dist must have methods
pdf
,cdf
, andppf
that behave like those of a frozen instance ofscipy.stats.rv_continuous
. Specifically, it must have methodspdf
andcdf
that accept exactly one ndarray argumentx
and return the probability density function and cumulative density function (respectively) atx
. The object must also have a methodppf
that accepts a floatp
and returns the percentile point function atp
. The object may also have a methodisf
that accepts a floatp
and returns the inverse survival function atp
; if it does not, it will be assigned an attributeisf
that calculates the inverse survival function usingppf
. Theppf
andisf` methods will each be evaluated at a small positive float ``p
(e.g.p = utol/10
), and the domain over which the approximate numerical inverse is defined will beppf(p)
toisf(p)
. The approximation will not be accurate in the extreme tails beyond this domain.References
- 1
Hörmann, Wolfgang, and Josef Leydold. “Continuous random variate generation by fast numerical inversion.” ACM Transactions on Modeling and Computer Simulation (TOMACS) 13.4 (2003): 347-362.
Examples
For some distributions,
dist.ppf
anddist.rvs
are quite slow. For instance, considerscipy.stats.genexpon
. We freeze the distribution by passing all shape parameters into its initializer and time the resulting object’sppf
andrvs
functions.>>> import numpy as np >>> from scipy import stats >>> from timeit import timeit >>> time_once = lambda f: f"{timeit(f, number=1)*1000:.6} ms" >>> dist = stats.genexpon(9, 16, 3) # freeze the distribution >>> p = np.linspace(0.01, 0.99, 99) # percentiles from 1% to 99% >>> time_once(lambda: dist.ppf(p)) '154.565 ms' # may vary
>>> time_once(lambda: dist.rvs(size=100)) '148.979 ms' # may vary
The
NumericalInverseHermite
has a method that approximatesdist.ppf
.>>> from scipy.stats import NumericalInverseHermite >>> fni = NumericalInverseHermite(dist) >>> np.allclose(fni.ppf(p), dist.ppf(p)) True
In some cases, it is faster to both generate the fast numerical inverse and use it than to call
dist.ppf
.>>> def time_me(): ... fni = NumericalInverseHermite(dist) ... fni.ppf(p) >>> time_once(time_me) '11.9222 ms' # may vary
After generating the fast numerical inverse, subsequent calls to its methods are much faster. >>> time_once(lambda: fni.ppf(p)) ‘0.0819 ms’ # may vary
The fast numerical inverse can also be used to generate random variates using inverse transform sampling.
>>> time_once(lambda: fni.rvs(size=100)) '0.0911 ms' # may vary
Depending on the implementation of the distribution’s random sampling method, the random variates generated may be nearly identical, given the same random state.
>>> # `seed` ensures identical random streams are used by each `rvs` method >>> seed = 500072020 >>> rvs1 = dist.rvs(size=100, random_state=np.random.default_rng(seed)) >>> rvs2 = fni.rvs(size=100, random_state=np.random.default_rng(seed)) >>> np.allclose(rvs1, rvs2) True
To use
NumericalInverseHermite
with a custom distribution, users may subclassscipy.stats.rv_continuous
and initialize a frozen instance or create an object with equivalentpdf
,cdf
, andppf
methods. For instance, the following object represents the standard normal distribution. For simplicity, we usescipy.special.ndtr
andscipy.special.ndtri
to compute thecdf
andppf
, respectively.>>> from scipy.special import ndtr, ndtri >>> >>> class MyNormal: ... ... def pdf(self, x): ... return 1/np.sqrt(2*np.pi) * np.exp(-x**2 / 2) ... ... def cdf(self, x): ... return ndtr(x) ... ... def ppf(self, x): ... return ndtri(x) ... >>> dist1 = MyNormal() >>> fni1 = NumericalInverseHermite(dist1) >>> >>> dist2 = stats.norm() >>> fni2 = NumericalInverseHermite(dist2) >>> >>> print(fni1.rvs(random_state=seed), fni2.rvs(random_state=seed)) -1.9603810921759424 -1.9603810921747074
- Attributes
- intervalsint
The number of intervals of the interpolant.
- midpoint_errorfloat
The maximum u-error at an interpolant interval midpoint.
Methods
ppf
(q)Approximate percent point function (inverse cdf) of the given RV.
qrvs
([size, d, qmc_engine])Quasi-random variates of the given RV.
rvs
([size, random_state])Random variates of the given RV.