scipy.stats.qmc.LatinHypercube#

class scipy.stats.qmc.LatinHypercube(d, *, centered=False, strength=1, optimization=None, seed=None)[source]#

Latin hypercube sampling (LHS).

A Latin hypercube sample [1] generates \(n\) points in \([0,1)^{d}\). Each univariate marginal distribution is stratified, placing exactly one point in \([j/n, (j+1)/n)\) for \(j=0,1,...,n-1\). They are still applicable when \(n << d\).

Parameters
dint

Dimension of the parameter space.

centeredbool, optional

Center the point within the multi-dimensional grid. Default is False.

optimization{None, “random-cd”}, optional

Whether to use an optimization scheme to construct a LHS. Default is None.

  • random-cd: random permutations of coordinates to lower the centered discrepancy [5]. The best design based on the centered discrepancy is constantly updated. Centered discrepancy-based design shows better space filling robustness toward 2D and 3D subprojections compared to using other discrepancy measures [6].

New in version 1.8.0.

strength{1, 2}, optional

Strength of the LHS. strength=1 produces a plain LHS while strength=2 produces an orthogonal array based LHS of strength 2 [7], [8]. In that case, only n=p**2 points can be sampled, with p a prime number. It also constrains d <= p + 1. Default is 1.

New in version 1.8.0.

seed{None, int, numpy.random.Generator}, optional

If seed is None the numpy.random.Generator singleton is used. If seed is an int, a new Generator instance is used, seeded with seed. If seed is already a Generator instance then that instance is used.

Notes

When LHS is used for integrating a function \(f\) over \(n\), LHS is extremely effective on integrands that are nearly additive [2]. With a LHS of \(n\) points, the variance of the integral is always lower than plain MC on \(n-1\) points [3]. There is a central limit theorem for LHS on the mean and variance of the integral [4], but not necessarily for optimized LHS due to the randomization.

\(A\) is called an orthogonal array of strength \(t\) if in each n-row-by-t-column submatrix of \(A\): all \(p^t\) possible distinct rows occur the same number of times. The elements of \(A\) are in the set \(\{0, 1, ..., p-1\}\), also called symbols. The constraint that \(p\) must be a prime number is to allow modular arithmetic.

Strength 1 (plain LHS) brings an advantage over strength 0 (MC) and strength 2 is a useful increment over strength 1. Going to strength 3 is a smaller increment and scrambled QMC like Sobol’, Halton are more performant [7].

To create a LHS of strength 2, the orthogonal array \(A\) is randomized by applying a random, bijective map of the set of symbols onto itself. For example, in column 0, all 0s might become 2; in column 1, all 0s might become 1, etc. Then, for each column \(i\) and symbol \(j\), we add a plain, one-dimensional LHS of size \(p\) to the subarray where \(A^i = j\). The resulting matrix is finally divided by \(p\).

References

1

Mckay et al., “A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code.” Technometrics, 1979.

2

M. Stein, “Large sample properties of simulations using Latin hypercube sampling.” Technometrics 29, no. 2: 143-151, 1987.

3

A. B. Owen, “Monte Carlo variance of scrambled net quadrature.” SIAM Journal on Numerical Analysis 34, no. 5: 1884-1910, 1997

4

Loh, W.-L. “On Latin hypercube sampling.” The annals of statistics 24, no. 5: 2058-2080, 1996.

5

Fang et al. “Design and modeling for computer experiments”. Computer Science and Data Analysis Series, 2006.

6

Damblin et al., “Numerical studies of space filling designs: optimization of Latin Hypercube Samples and subprojection properties.” Journal of Simulation, 2013.

7(1,2)

A. B. Owen , “Orthogonal arrays for computer experiments, integration and visualization.” Statistica Sinica, 1992.

8

B. Tang, “Orthogonal Array-Based Latin Hypercubes.” Journal of the American Statistical Association, 1993.

Examples

Generate samples from a Latin hypercube generator.

>>> from scipy.stats import qmc
>>> sampler = qmc.LatinHypercube(d=2)
>>> sample = sampler.random(n=5)
>>> sample
array([[0.1545328 , 0.53664833],  # random
       [0.84052691, 0.06474907],
       [0.52177809, 0.93343721],
       [0.68033825, 0.36265316],
       [0.26544879, 0.61163943]])

Compute the quality of the sample using the discrepancy criterion.

>>> qmc.discrepancy(sample)
0.0196...  # random

Samples can be scaled to bounds.

>>> l_bounds = [0, 2]
>>> u_bounds = [10, 5]
>>> qmc.scale(sample, l_bounds, u_bounds)
array([[1.54532796, 3.609945  ],  # random
       [8.40526909, 2.1942472 ],
       [5.2177809 , 4.80031164],
       [6.80338249, 3.08795949],
       [2.65448791, 3.83491828]])

Use the optimization keyword argument to produce a LHS with lower discrepancy at higher computational cost.

>>> sampler = qmc.LatinHypercube(d=2, optimization="random-cd")
>>> sample = sampler.random(n=5)
>>> qmc.discrepancy(sample)
0.0176...  # random

Use the strength keyword argument to produce an orthogonal array based LHS of strength 2. In this case, the number of sample points must be the square of a prime number.

>>> sampler = qmc.LatinHypercube(d=2, strength=2)
>>> sample = sampler.random(n=9)
>>> qmc.discrepancy(sample)
0.00526...  # random

Options could be combined to produce an optimized centered orthogonal array based LHS. After optimization, the result would not be guaranteed to be of strength 2.

Methods

fast_forward(n)

Fast-forward the sequence by n positions.

random([n])

Draw n in the half-open interval [0, 1).

reset()

Reset the engine to base state.