scipy.stats.qmc.LatinHypercube#
- class scipy.stats.qmc.LatinHypercube(d, *, centered=False, scramble=True, strength=1, optimization=None, seed=None)[source]#
Latin hypercube sampling (LHS).
A Latin hypercube sample [1] generates \(n\) points in \([0,1)^{d}\). Each univariate marginal distribution is stratified, placing exactly one point in \([j/n, (j+1)/n)\) for \(j=0,1,...,n-1\). They are still applicable when \(n << d\).
- Parameters:
- dint
Dimension of the parameter space.
- centeredbool, optional
Center samples within cells of a multi-dimensional grid. Default is False.
Deprecated since version 1.10.0: centered is deprecated as of SciPy 1.10.0 and will be removed in 1.12.0. Use scramble instead.
centered=True
corresponds toscramble=False
.- scramblebool, optional
When False, center samples within cells of a multi-dimensional grid. Otherwise, samples are randomly placed within cells of the grid.
Note
Setting
scramble=False
does not ensure deterministic output. For that, use the seed parameter.Default is True.
New in version 1.10.0.
- optimization{None, “random-cd”, “lloyd”}, optional
Whether to use an optimization scheme to improve the quality after sampling. Note that this is a post-processing step that does not guarantee that all properties of the sample will be conserved. Default is None.
random-cd
: random permutations of coordinates to lower the centered discrepancy. The best sample based on the centered discrepancy is constantly updated. Centered discrepancy-based sampling shows better space-filling robustness toward 2D and 3D subprojections compared to using other discrepancy measures.lloyd
: Perturb samples using a modified Lloyd-Max algorithm. The process converges to equally spaced samples.
New in version 1.8.0.
Changed in version 1.10.0: Add
lloyd
.- strength{1, 2}, optional
Strength of the LHS.
strength=1
produces a plain LHS whilestrength=2
produces an orthogonal array based LHS of strength 2 [7], [8]. In that case, onlyn=p**2
points can be sampled, withp
a prime number. It also constrainsd <= p + 1
. Default is 1.New in version 1.8.0.
- seed{None, int,
numpy.random.Generator
}, optional If seed is an int or None, a new
numpy.random.Generator
is created usingnp.random.default_rng(seed)
. If seed is already aGenerator
instance, then the provided instance is used.
Notes
When LHS is used for integrating a function \(f\) over \(n\), LHS is extremely effective on integrands that are nearly additive [2]. With a LHS of \(n\) points, the variance of the integral is always lower than plain MC on \(n-1\) points [3]. There is a central limit theorem for LHS on the mean and variance of the integral [4], but not necessarily for optimized LHS due to the randomization.
\(A\) is called an orthogonal array of strength \(t\) if in each n-row-by-t-column submatrix of \(A\): all \(p^t\) possible distinct rows occur the same number of times. The elements of \(A\) are in the set \(\{0, 1, ..., p-1\}\), also called symbols. The constraint that \(p\) must be a prime number is to allow modular arithmetic. Increasing strength adds some symmetry to the sub-projections of a sample. With strength 2, samples are symmetric along the diagonals of 2D sub-projections. This may be undesirable, but on the other hand, the sample dispersion is improved.
Strength 1 (plain LHS) brings an advantage over strength 0 (MC) and strength 2 is a useful increment over strength 1. Going to strength 3 is a smaller increment and scrambled QMC like Sobol’, Halton are more performant [7].
To create a LHS of strength 2, the orthogonal array \(A\) is randomized by applying a random, bijective map of the set of symbols onto itself. For example, in column 0, all 0s might become 2; in column 1, all 0s might become 1, etc. Then, for each column \(i\) and symbol \(j\), we add a plain, one-dimensional LHS of size \(p\) to the subarray where \(A^i = j\). The resulting matrix is finally divided by \(p\).
References
[1]Mckay et al., “A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code.” Technometrics, 1979.
[2]M. Stein, “Large sample properties of simulations using Latin hypercube sampling.” Technometrics 29, no. 2: 143-151, 1987.
[3]A. B. Owen, “Monte Carlo variance of scrambled net quadrature.” SIAM Journal on Numerical Analysis 34, no. 5: 1884-1910, 1997
[4]Loh, W.-L. “On Latin hypercube sampling.” The annals of statistics 24, no. 5: 2058-2080, 1996.
[5]Fang et al. “Design and modeling for computer experiments”. Computer Science and Data Analysis Series, 2006.
[6]Damblin et al., “Numerical studies of space filling designs: optimization of Latin Hypercube Samples and subprojection properties.” Journal of Simulation, 2013.
[7] (1,2)A. B. Owen , “Orthogonal arrays for computer experiments, integration and visualization.” Statistica Sinica, 1992.
[8]B. Tang, “Orthogonal Array-Based Latin Hypercubes.” Journal of the American Statistical Association, 1993.
Examples
Generate samples from a Latin hypercube generator.
>>> from scipy.stats import qmc >>> sampler = qmc.LatinHypercube(d=2) >>> sample = sampler.random(n=5) >>> sample array([[0.1545328 , 0.53664833], # random [0.84052691, 0.06474907], [0.52177809, 0.93343721], [0.68033825, 0.36265316], [0.26544879, 0.61163943]])
Compute the quality of the sample using the discrepancy criterion.
>>> qmc.discrepancy(sample) 0.0196... # random
Samples can be scaled to bounds.
>>> l_bounds = [0, 2] >>> u_bounds = [10, 5] >>> qmc.scale(sample, l_bounds, u_bounds) array([[1.54532796, 3.609945 ], # random [8.40526909, 2.1942472 ], [5.2177809 , 4.80031164], [6.80338249, 3.08795949], [2.65448791, 3.83491828]])
Use the optimization keyword argument to produce a LHS with lower discrepancy at higher computational cost.
>>> sampler = qmc.LatinHypercube(d=2, optimization="random-cd") >>> sample = sampler.random(n=5) >>> qmc.discrepancy(sample) 0.0176... # random
Use the strength keyword argument to produce an orthogonal array based LHS of strength 2. In this case, the number of sample points must be the square of a prime number.
>>> sampler = qmc.LatinHypercube(d=2, strength=2) >>> sample = sampler.random(n=9) >>> qmc.discrepancy(sample) 0.00526... # random
Options could be combined to produce an optimized centered orthogonal array based LHS. After optimization, the result would not be guaranteed to be of strength 2.
Methods
fast_forward
(n)Fast-forward the sequence by n positions.
integers
(l_bounds, *[, u_bounds, n, ...])Draw n integers from l_bounds (inclusive) to u_bounds (exclusive), or if endpoint=True, l_bounds (inclusive) to u_bounds (inclusive).
random
([n, workers])Draw n in the half-open interval
[0, 1)
.reset
()Reset the engine to base state.