scipy.stats.qmc.discrepancy

scipy.stats.qmc.discrepancy(sample, *, iterative=False, method='CD', workers=1)

Discrepancy of a given sample.
- Parameters
- sample : array_like (n, d)
The sample to compute the discrepancy from.
- iterative : bool, optional
Set to True only when the result will be used to update the discrepancy with update_discrepancy; otherwise it must be False. Default is False. Refer to the notes for more details.
- method : str, optional
Type of discrepancy, can be CD, WD, MD or L2-star. Refer to the notes for more details. Default is CD.
- workers : int, optional
Number of workers to use for parallel processing. If -1 is given, all CPU threads are used. Default is 1.
- Returns
- discrepancy : float
Discrepancy.
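As a rough illustration of the workers parameter, the sketch below computes the same discrepancy serially and with two workers. The sample (256 unscrambled Sobol' points in 3 dimensions), the choice of MD and the worker count are arbitrary assumptions made for this example; the two results should agree up to floating-point rounding.

```python
import numpy as np
from scipy.stats import qmc

# Illustrative sample: 2**8 = 256 unscrambled Sobol' points in 3 dimensions.
sample = qmc.Sobol(d=3, scramble=False).random_base2(m=8)

# Same computation, first with the default single worker, then with two.
disc_serial = qmc.discrepancy(sample, method="MD")
disc_parallel = qmc.discrepancy(sample, method="MD", workers=2)

print(disc_serial, disc_parallel)
```

Whether extra workers pay off depends on the sample size and the machine, so timing both variants on representative data is advisable.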
Notes
The discrepancy is a uniformity criterion used to assess the space filling of a number of samples in a hypercube. A discrepancy quantifies the distance between the continuous uniform distribution on a hypercube and the discrete uniform distribution on \(n\) distinct sample points.
The lower the value is, the better the coverage of the parameter space is.
For a collection of subsets of the hypercube, the discrepancy is the greatest absolute difference between the fraction of sample points in one of those subsets and the volume of that subset. There are different definitions of discrepancy corresponding to different collections of subsets. Some versions take a root mean square difference over subsets instead of a maximum.
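To make the "lower is better" reading concrete, the following sketch compares a pseudo-random sample with a Sobol' sample of the same size. The sample size, dimension and seed are arbitrary choices for illustration, not part of the original documentation.

```python
import numpy as np
from scipy.stats import qmc

n, d = 128, 2

# Pseudo-random points (seed chosen arbitrarily for reproducibility).
rng = np.random.default_rng(0)
random_sample = rng.random((n, d))

# Low-discrepancy points: 2**7 = 128 unscrambled Sobol' points.
sobol_sample = qmc.Sobol(d=d, scramble=False).random_base2(m=7)

disc_random = qmc.discrepancy(random_sample)
disc_sobol = qmc.discrepancy(sobol_sample)

print("random:", disc_random)
print("Sobol':", disc_sobol)
```

The Sobol' sample is expected to report the smaller value, reflecting its more even coverage of the unit square.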
A measure of uniformity is reasonable if it satisfies the following criteria [1]:
- It is invariant under permuting factors and/or runs.
- It is invariant under rotation of the coordinates.
- It can measure not only the uniformity of the sample over the hypercube, but also the projection uniformity of the sample over non-empty subsets of lower-dimensional hypercubes.
- There is some reasonable geometric meaning.
- It is easy to compute.
- It satisfies the Koksma-Hlawka-like inequality.
- It is consistent with other criteria in experimental design.
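The first two criteria can be checked numerically for the default CD method. This is a minimal sketch under two assumptions: "runs" are taken to be rows and "factors" columns of the sample, and "rotation" is read as reflecting every coordinate about the center of the hypercube (x becomes 1 - x). The random sample and seed are arbitrary.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)  # arbitrary seed
sample = rng.random((20, 3))

base = qmc.discrepancy(sample)

# Permute the runs (rows) and the factors (columns).
runs_permuted = qmc.discrepancy(sample[rng.permutation(20)])
factors_permuted = qmc.discrepancy(sample[:, [2, 0, 1]])

# Reflect all coordinates about the center of the unit hypercube.
reflected = qmc.discrepancy(1.0 - sample)

print(base, runs_permuted, factors_permuted, reflected)
```

All four values should agree up to floating-point rounding.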
Four methods are available:
- CD: Centered Discrepancy, where the subspace involves a corner of the hypercube.
- WD: Wrap-around Discrepancy, where the subspace can wrap around bounds.
- MD: Mixture Discrepancy, a mix between CD and WD covering more criteria.
- L2-star: L2-star discrepancy, like CD but not invariant under rotation.
See [2] for precise definitions of each method.
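The four methods can be compared on a single sample by looping over the method argument. The sketch below reuses the six-point design from the Examples section; the dictionary name results is only illustrative.

```python
import numpy as np
from scipy.stats import qmc

# Six-point design from the Examples section, scaled to the unit square.
space = np.array([[1, 3], [2, 6], [3, 2], [4, 5], [5, 1], [6, 4]])
sample = qmc.scale(space, [0.5, 0.5], [6.5, 6.5], reverse=True)

# Evaluate every available discrepancy on the same sample.
results = {
    method: qmc.discrepancy(sample, method=method)
    for method in ("CD", "WD", "MD", "L2-star")
}

for method, value in results.items():
    print(f"{method}: {value}")
```

Values from different methods are not on a common scale, so comparisons are only meaningful between samples evaluated with the same method.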
Lastly, using iterative=True, it is possible to compute the discrepancy as if we had \(n+1\) samples. This is useful when adding a point to a sample and looking for the candidate that would give the lowest discrepancy: the discrepancy can then be updated for each candidate using update_discrepancy. This is faster than recomputing the full discrepancy for each of a large number of candidates.
References
[1] Fang et al. "Design and modeling for computer experiments". Computer Science and Data Analysis Series, 2006.
[2] Zhou Y.-D. et al. "Mixture discrepancy for quasi-random point sets". Journal of Complexity, 29 (3-4), pp. 283-301, 2013.
[3] T. T. Warnock. "Computational investigations of low discrepancy point sets". Applications of Number Theory to Numerical Analysis, Academic Press, pp. 319-343, 1972.
Examples
Calculate the quality of the sample using the discrepancy:
>>> import numpy as np
>>> from scipy.stats import qmc
>>> space = np.array([[1, 3], [2, 6], [3, 2], [4, 5], [5, 1], [6, 4]])
>>> l_bounds = [0.5, 0.5]
>>> u_bounds = [6.5, 6.5]
>>> space = qmc.scale(space, l_bounds, u_bounds, reverse=True)
>>> space
array([[0.08333333, 0.41666667],
       [0.25      , 0.91666667],
       [0.41666667, 0.25      ],
       [0.58333333, 0.75      ],
       [0.75      , 0.08333333],
       [0.91666667, 0.58333333]])
>>> qmc.discrepancy(space)
0.008142039609053464
We can also compute the CD discrepancy iteratively by using iterative=True.
>>> disc_init = qmc.discrepancy(space[:-1], iterative=True)
>>> disc_init
0.04769081147119336
>>> qmc.update_discrepancy(space[-1], space[:-1], disc_init)
0.008142039609053513
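The candidate-selection use case described in the notes could be sketched as follows. The base sample, the number of candidates and the seed are arbitrary assumptions made for illustration; only discrepancy and update_discrepancy come from the documentation above.

```python
import numpy as np
from scipy.stats import qmc

# Existing design: 2**4 = 16 unscrambled Sobol' points in 2 dimensions.
sample = qmc.Sobol(d=2, scramble=False).random_base2(m=4)

# Hypothetical pool of candidate points (arbitrary seed).
rng = np.random.default_rng(0)
candidates = rng.random((50, 2))

# Compute the initial value once, as if one more point were to be added.
disc_init = qmc.discrepancy(sample, iterative=True)

# Update the centered discrepancy for each candidate and keep the best one.
updated = [qmc.update_discrepancy(c, sample, disc_init) for c in candidates]
best = int(np.argmin(updated))
best_candidate = candidates[best]

# Cross-check against a full recomputation on the augmented sample.
full = qmc.discrepancy(np.vstack([sample, best_candidate]))
print(best_candidate, updated[best], full)
```

The updated value for the selected candidate should match the full recomputation up to floating-point rounding, as in the two-step example above.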