scipy.stats.qmc.LatinHypercube#

class scipy.stats.qmc.LatinHypercube(d, *, centered=False, scramble=True, strength=1, optimization=None, seed=None)[source]#

Latin hypercube sampling (LHS).

A Latin hypercube sample [1] generates \(n\) points in \([0,1)^{d}\). Each univariate marginal distribution is stratified, placing exactly one point in \([j/n, (j+1)/n)\) for \(j=0,1,...,n-1\). They are still applicable when \(n << d\).

Parameters:
dint

Dimension of the parameter space.

centeredbool, optional

Center samples within cells of a multi-dimensional grid. Default is False.

Deprecated since version 1.10.0: centered is deprecated as of SciPy 1.10.0 and will be removed in 1.12.0. Use scramble instead. centered=True corresponds to scramble=False.

scramblebool, optional

When False, center samples within cells of a multi-dimensional grid. Otherwise, samples are randomly placed within cells of the grid.

Note

Setting scramble=False does not ensure deterministic output. For that, use the seed parameter.

Default is True.

New in version 1.10.0.

optimization{None, “random-cd”, “lloyd”}, optional

Whether to use an optimization scheme to improve the quality after sampling. Note that this is a post-processing step that does not guarantee that all properties of the sample will be conserved. Default is None.

  • random-cd: random permutations of coordinates to lower the centered discrepancy. The best sample based on the centered discrepancy is constantly updated. Centered discrepancy-based sampling shows better space-filling robustness toward 2D and 3D subprojections compared to using other discrepancy measures.

  • lloyd: Perturb samples using a modified Lloyd-Max algorithm. The process converges to equally spaced samples.

New in version 1.8.0.

Changed in version 1.10.0: Add lloyd.

strength{1, 2}, optional

Strength of the LHS. strength=1 produces a plain LHS while strength=2 produces an orthogonal array based LHS of strength 2 [7], [8]. In that case, only n=p**2 points can be sampled, with p a prime number. It also constrains d <= p + 1. Default is 1.

New in version 1.8.0.

seed{None, int, numpy.random.Generator}, optional

If seed is an int or None, a new numpy.random.Generator is created using np.random.default_rng(seed). If seed is already a Generator instance, then the provided instance is used.

Notes

When LHS is used for integrating a function \(f\) over \(n\), LHS is extremely effective on integrands that are nearly additive [2]. With a LHS of \(n\) points, the variance of the integral is always lower than plain MC on \(n-1\) points [3]. There is a central limit theorem for LHS on the mean and variance of the integral [4], but not necessarily for optimized LHS due to the randomization.

\(A\) is called an orthogonal array of strength \(t\) if in each n-row-by-t-column submatrix of \(A\): all \(p^t\) possible distinct rows occur the same number of times. The elements of \(A\) are in the set \(\{0, 1, ..., p-1\}\), also called symbols. The constraint that \(p\) must be a prime number is to allow modular arithmetic. Increasing strength adds some symmetry to the sub-projections of a sample. With strength 2, samples are symmetric along the diagonals of 2D sub-projections. This may be undesirable, but on the other hand, the sample dispersion is improved.

Strength 1 (plain LHS) brings an advantage over strength 0 (MC) and strength 2 is a useful increment over strength 1. Going to strength 3 is a smaller increment and scrambled QMC like Sobol’, Halton are more performant [7].

To create a LHS of strength 2, the orthogonal array \(A\) is randomized by applying a random, bijective map of the set of symbols onto itself. For example, in column 0, all 0s might become 2; in column 1, all 0s might become 1, etc. Then, for each column \(i\) and symbol \(j\), we add a plain, one-dimensional LHS of size \(p\) to the subarray where \(A^i = j\). The resulting matrix is finally divided by \(p\).

References

[1]

Mckay et al., “A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code.” Technometrics, 1979.

[2]

M. Stein, “Large sample properties of simulations using Latin hypercube sampling.” Technometrics 29, no. 2: 143-151, 1987.

[3]

A. B. Owen, “Monte Carlo variance of scrambled net quadrature.” SIAM Journal on Numerical Analysis 34, no. 5: 1884-1910, 1997

[4]

Loh, W.-L. “On Latin hypercube sampling.” The annals of statistics 24, no. 5: 2058-2080, 1996.

[5]

Fang et al. “Design and modeling for computer experiments”. Computer Science and Data Analysis Series, 2006.

[6]

Damblin et al., “Numerical studies of space filling designs: optimization of Latin Hypercube Samples and subprojection properties.” Journal of Simulation, 2013.

[7] (1,2)

A. B. Owen , “Orthogonal arrays for computer experiments, integration and visualization.” Statistica Sinica, 1992.

[8]

B. Tang, “Orthogonal Array-Based Latin Hypercubes.” Journal of the American Statistical Association, 1993.

Examples

Generate samples from a Latin hypercube generator.

>>> from scipy.stats import qmc
>>> sampler = qmc.LatinHypercube(d=2)
>>> sample = sampler.random(n=5)
>>> sample
array([[0.1545328 , 0.53664833],  # random
       [0.84052691, 0.06474907],
       [0.52177809, 0.93343721],
       [0.68033825, 0.36265316],
       [0.26544879, 0.61163943]])

Compute the quality of the sample using the discrepancy criterion.

>>> qmc.discrepancy(sample)
0.0196...  # random

Samples can be scaled to bounds.

>>> l_bounds = [0, 2]
>>> u_bounds = [10, 5]
>>> qmc.scale(sample, l_bounds, u_bounds)
array([[1.54532796, 3.609945  ],  # random
       [8.40526909, 2.1942472 ],
       [5.2177809 , 4.80031164],
       [6.80338249, 3.08795949],
       [2.65448791, 3.83491828]])

Use the optimization keyword argument to produce a LHS with lower discrepancy at higher computational cost.

>>> sampler = qmc.LatinHypercube(d=2, optimization="random-cd")
>>> sample = sampler.random(n=5)
>>> qmc.discrepancy(sample)
0.0176...  # random

Use the strength keyword argument to produce an orthogonal array based LHS of strength 2. In this case, the number of sample points must be the square of a prime number.

>>> sampler = qmc.LatinHypercube(d=2, strength=2)
>>> sample = sampler.random(n=9)
>>> qmc.discrepancy(sample)
0.00526...  # random

Options could be combined to produce an optimized centered orthogonal array based LHS. After optimization, the result would not be guaranteed to be of strength 2.

Methods

fast_forward(n)

Fast-forward the sequence by n positions.

integers(l_bounds, *[, u_bounds, n, ...])

Draw n integers from l_bounds (inclusive) to u_bounds (exclusive), or if endpoint=True, l_bounds (inclusive) to u_bounds (inclusive).

random([n, workers])

Draw n in the half-open interval [0, 1).

reset()

Reset the engine to base state.