scipy.stats.NumericalInverseHermite¶
-
class
scipy.stats.NumericalInverseHermite(dist, *, tol=1e-12, max_intervals=100000)[source]¶ A Hermite spline fast numerical inverse of a probability distribution.
The initializer of
NumericalInverseHermiteaccepts dist, an object representing a continuous distribution, and provides an object with methods that approximate dist.ppf and dist.rvs. For most distributions, these methods are faster than those of dist itself.- Parameters
- distobject
Object representing the distribution for which a fast numerical inverse is desired; for instance, a frozen instance of a
scipy.statscontinuous distribution. See Notes and Examples for details.- tolfloat, optional
u-error tolerance (see Notes). The default is 1e-12.
- max_intervalsint, optional
Maximum number of intervals in the cubic Hermite spline used to approximate the percent point function. The default is 100000.
Notes
NumericalInverseHermiteapproximates the inverse of a continuous statistical distribution’s CDF with a cubic Hermite spline.As described in [1], it begins by evaluating the distribution’s PDF and CDF at a mesh of quantiles
xwithin the distribution’s support. It uses the results to fit a cubic Hermite splineHsuch thatH(p) == x, wherepis the array of percentiles corresponding with the quantilesx. Therefore, the spline approximates the inverse of the distribution’s CDF to machine precision at the percentilesp, but typically, the spline will not be as accurate at the midpoints between the percentile points:p_mid = (p[:-1] + p[1:])/2
so the mesh of quantiles is refined as needed to reduce the maximum “u-error”:
u_error = np.max(np.abs(dist.cdf(H(p_mid)) - p_mid))
below the specified tolerance tol. Refinement stops when the required tolerance is achieved or when the number of mesh intervals after the next refinement could exceed the maximum allowed number max_intervals.
The object dist must have methods
pdf,cdf, andppfthat behave like those of a frozen instance ofscipy.stats.rv_continuous. Specifically, it must have methodspdfandcdfthat accept exactly one ndarray argumentxand return the probability density function and cumulative density function (respectively) atx. The object must also have a methodppfthat accepts a floatpand returns the percentile point function atp. The object may also have a methodisfthat accepts a floatpand returns the inverse survival function atp; if it does not, it will be assigned an attributeisfthat calculates the inverse survival function usingppf. Theppfandisf` methods will each be evaluated at a small positive float ``p(e.g.p = utol/10), and the domain over which the approximate numerical inverse is defined will beppf(p)toisf(p). The approximation will not be accurate in the extreme tails beyond this domain.References
- 1
Hörmann, Wolfgang, and Josef Leydold. “Continuous random variate generation by fast numerical inversion.” ACM Transactions on Modeling and Computer Simulation (TOMACS) 13.4 (2003): 347-362.
Examples
For some distributions,
dist.ppfanddist.rvsare quite slow. For instance, considerscipy.stats.genexpon. We freeze the distribution by passing all shape parameters into its initializer and time the resulting object’sppfandrvsfunctions.>>> import numpy as np >>> from scipy import stats >>> from timeit import timeit >>> time_once = lambda f: f"{timeit(f, number=1)*1000:.6} ms" >>> dist = stats.genexpon(9, 16, 3) # freeze the distribution >>> p = np.linspace(0.01, 0.99, 99) # percentiles from 1% to 99% >>> time_once(lambda: dist.ppf(p)) '154.565 ms' # may vary
>>> time_once(lambda: dist.rvs(size=100)) '148.979 ms' # may vary
The
NumericalInverseHermitehas a method that approximatesdist.ppf.>>> from scipy.stats import NumericalInverseHermite >>> fni = NumericalInverseHermite(dist) >>> np.allclose(fni.ppf(p), dist.ppf(p)) True
In some cases, it is faster to both generate the fast numerical inverse and use it than to call
dist.ppf.>>> def time_me(): ... fni = NumericalInverseHermite(dist) ... fni.ppf(p) >>> time_once(time_me) '11.9222 ms' # may vary
After generating the fast numerical inverse, subsequent calls to its methods are much faster. >>> time_once(lambda: fni.ppf(p)) ‘0.0819 ms’ # may vary
The fast numerical inverse can also be used to generate random variates using inverse transform sampling.
>>> time_once(lambda: fni.rvs(size=100)) '0.0911 ms' # may vary
Depending on the implementation of the distribution’s random sampling method, the random variates generated may be nearly identical, given the same random state.
>>> # `seed` ensures identical random streams are used by each `rvs` method >>> seed = 500072020 >>> rvs1 = dist.rvs(size=100, random_state=np.random.default_rng(seed)) >>> rvs2 = fni.rvs(size=100, random_state=np.random.default_rng(seed)) >>> np.allclose(rvs1, rvs2) True
To use
NumericalInverseHermitewith a custom distribution, users may subclassscipy.stats.rv_continuousand initialize a frozen instance or create an object with equivalentpdf,cdf, andppfmethods. For instance, the following object represents the standard normal distribution. For simplicity, we usescipy.special.ndtrandscipy.special.ndtrito compute thecdfandppf, respectively.>>> from scipy.special import ndtr, ndtri >>> >>> class MyNormal: ... ... def pdf(self, x): ... return 1/np.sqrt(2*np.pi) * np.exp(-x**2 / 2) ... ... def cdf(self, x): ... return ndtr(x) ... ... def ppf(self, x): ... return ndtri(x) ... >>> dist1 = MyNormal() >>> fni1 = NumericalInverseHermite(dist1) >>> >>> dist2 = stats.norm() >>> fni2 = NumericalInverseHermite(dist2) >>> >>> print(fni1.rvs(random_state=seed), fni2.rvs(random_state=seed)) -1.9603810921759424 -1.9603810921747074
- Attributes
- intervalsint
The number of intervals of the interpolant.
- midpoint_errorfloat
The maximum u-error at an interpolant interval midpoint.
Methods
ppf(q)Approximate percent point function (inverse cdf) of the given RV.
qrvs([size, d, qmc_engine])Quasi-random variates of the given RV.
rvs([size, random_state])Random variates of the given RV.