scipy.stats.sampling.

FastGeneratorInversion#

class scipy.stats.sampling.FastGeneratorInversion(dist, *, domain=None, ignore_shape_range=False, random_state=None)[source]#

Fast sampling by numerical inversion of the CDF for a large class of continuous distributions in scipy.stats.

Parameters:
distrv_frozen object

Frozen distribution object from scipy.stats. The list of supported distributions can be found in the Notes section. The shape parameters, loc and scale used to create the distributions must be scalars. For example, for the Gamma distribution with shape parameter p, p has to be a float, and for the beta distribution with shape parameters (a, b), both a and b have to be floats.

domaintuple of floats, optional

If one wishes to sample from a truncated/conditional distribution, the domain has to be specified. The default is None. In that case, the random variates are not truncated, and the domain is inferred from the support of the distribution.

ignore_shape_rangeboolean, optional.

If False, shape parameters that are outside of the valid range of values to ensure that the numerical accuracy (see Notes) is high, raise a ValueError. If True, any shape parameters that are valid for the distribution are accepted. This can be useful for testing. The default is False.

random_state{None, int, numpy.random.Generator,

A NumPy random number generator or seed for the underlying NumPy random number generator used to generate the stream of uniform random numbers. If random_state is None, it uses self.random_state. If random_state is an int, np.random.default_rng(random_state) is used. If random_state is already a Generator or RandomState instance then that instance is used.

Attributes:
locfloat

The location parameter.

random_state{numpy.random.Generator, numpy.random.RandomState}

The random state used in relevant methods like rvs (unless another random_state is passed as an argument to these methods).

scalefloat

The scale parameter.

Methods

evaluate_error([size, random_state, x_error])

Evaluate the numerical accuracy of the inversion (u- and x-error).

ppf(q)

Very fast PPF (inverse CDF) of the distribution which is a very close approximation of the exact PPF values.

qrvs([size, d, qmc_engine])

Quasi-random variates of the given distribution.

rvs([size])

Sample from the distribution by inversion.

support()

Support of the distribution.

cdf

Notes

The class creates an object for continuous distributions specified by dist. The method rvs uses a generator from scipy.stats.sampling that is created when the object is instantiated. In addition, the methods qrvs and ppf are added. qrvs generate samples based on quasi-random numbers from scipy.stats.qmc. ppf is the PPF based on the numerical inversion method in [1] (NumericalInversePolynomial) that is used to generate random variates.

Supported distributions (distname) are: alpha, anglit, argus, beta, betaprime, bradford, burr, burr12, cauchy, chi, chi2, cosine, crystalball, expon, gamma, gennorm, geninvgauss, gumbel_l, gumbel_r, hypsecant, invgamma, invgauss, invweibull, laplace, logistic, maxwell, moyal, norm, pareto, powerlaw, t, rayleigh, semicircular, wald, weibull_max, weibull_min.

rvs relies on the accuracy of the numerical inversion. If very extreme shape parameters are used, the numerical inversion might not work. However, for all implemented distributions, the admissible shape parameters have been tested, and an error will be raised if the user supplies values outside of the allowed range. The u-error should not exceed 1e-10 for all valid parameters. Note that warnings might be raised even if parameters are within the valid range when the object is instantiated. To check numerical accuracy, the method evaluate_error can be used.

Note that all implemented distributions are also part of scipy.stats, and the object created by FastGeneratorInversion relies on methods like ppf, cdf and pdf from rv_frozen. The main benefit of using this class can be summarized as follows: Once the generator to sample random variates is created in the setup step, sampling and evaluation of the PPF using ppf are very fast, and performance is essentially independent of the distribution. Therefore, a substantial speed-up can be achieved for many distributions if large numbers of random variates are required. It is important to know that this fast sampling is achieved by inversion of the CDF. Thus, one uniform random variate is transformed into a non-uniform variate, which is an advantage for several simulation methods, e.g., when the variance reduction methods of common random variates or antithetic variates are be used ([2]).

In addition, inversion makes it possible to - to use a QMC generator from scipy.stats.qmc (method qrvs), - to generate random variates truncated to an interval. For example, if one aims to sample standard normal random variates from the interval (2, 4), this can be easily achieved by using the parameter domain.

The location and scale that are initially defined by dist can be reset without having to rerun the setup step to create the generator that is used for sampling. The relation of the distribution Y with loc and scale to the standard distribution X (i.e., loc=0 and scale=1) is given by Y = loc + scale * X.

References

[1]

Derflinger, Gerhard, Wolfgang Hörmann, and Josef Leydold. “Random variate generation by numerical inversion when only the density is known.” ACM Transactions on Modeling and Computer Simulation (TOMACS) 20.4 (2010): 1-25.

[2]

Hörmann, Wolfgang, Josef Leydold and Gerhard Derflinger. “Automatic nonuniform random number generation.” Springer, 2004.

Examples

>>> import numpy as np
>>> from scipy import stats
>>> from scipy.stats.sampling import FastGeneratorInversion

Let’s start with a simple example to illustrate the main features:

>>> gamma_frozen = stats.gamma(1.5)
>>> gamma_dist = FastGeneratorInversion(gamma_frozen)
>>> r = gamma_dist.rvs(size=1000)

The mean should be approximately equal to the shape parameter 1.5:

>>> r.mean()
1.52423591130436  # may vary

Similarly, we can draw a sample based on quasi-random numbers:

>>> r = gamma_dist.qrvs(size=1000)
>>> r.mean()
1.4996639255942914  # may vary

Compare the PPF against approximation ppf.

>>> q = [0.001, 0.2, 0.5, 0.8, 0.999]
>>> np.max(np.abs(gamma_frozen.ppf(q) - gamma_dist.ppf(q)))
4.313394796895409e-08

To confirm that the numerical inversion is accurate, we evaluate the approximation error (u-error), which should be below 1e-10 (for more details, refer to the documentation of evaluate_error):

>>> gamma_dist.evaluate_error()
(7.446320551265581e-11, nan)  # may vary

Note that the location and scale can be changed without instantiating a new generator:

>>> gamma_dist.loc = 2
>>> gamma_dist.scale = 3
>>> r = gamma_dist.rvs(size=1000)

The mean should be approximately 2 + 3*1.5 = 6.5.

>>> r.mean()
6.399549295242894  # may vary

Let us also illustrate how truncation can be applied:

>>> trunc_norm = FastGeneratorInversion(stats.norm(), domain=(3, 4))
>>> r = trunc_norm.rvs(size=1000)
>>> 3 < r.min() < r.max() < 4
True

Check the mean:

>>> r.mean()
3.250433367078603  # may vary
>>> stats.norm.expect(lb=3, ub=4, conditional=True)
3.260454285589997

In this particular, case, scipy.stats.truncnorm could also be used to generate truncated normal random variates.