scipy.stats.qmc.discrepancy

scipy.stats.qmc.discrepancy(sample, *, iterative=False, method='CD', workers=1)

Discrepancy of a given sample.

Parameters

sample : array_like (n, d)
    The sample to compute the discrepancy from.
iterative : bool, optional
    Must be False unless the result is used to update the discrepancy with update_discrepancy. Default is False. Refer to the notes for more details.
method : str, optional
    Type of discrepancy, can be CD, WD, MD or L2-star. Refer to the notes for more details. Default is CD.
workers : int, optional
    Number of workers to use for parallel processing. If -1 is given, all CPU threads are used. Default is 1.

Returns

discrepancy : float
    Discrepancy.

Notes

The discrepancy is a uniformity criterion used to assess the space filling of a number of samples in a hypercube. A discrepancy quantifies the distance between the continuous uniform distribution on a hypercube and the discrete uniform distribution on \(n\) distinct sample points.

The lower the value is, the better the coverage of the parameter space is.
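As a rough illustration of "lower is better", the sketch below compares a pseudo-random sample with an unscrambled Halton sample of the same size. The sample size, dimension, and seed are arbitrary choices made for this illustration, and the exact values will depend on them.

```python
import numpy as np
from scipy.stats import qmc

n, d = 256, 2  # illustrative sample size and dimension (assumed values)

# Pseudo-random points in the unit hypercube.
rng = np.random.default_rng(0)
random_sample = rng.random((n, d))

# Unscrambled Halton points: deterministic, so no seed is needed.
halton_sample = qmc.Halton(d=d, scramble=False).random(n)

disc_random = qmc.discrepancy(random_sample)
disc_halton = qmc.discrepancy(halton_sample)

print(f"random: {disc_random:.3e}")
print(f"Halton: {disc_halton:.3e}")
```

For these settings, the low-discrepancy sequence should report the smaller value, reflecting a more even coverage of the unit square.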

For a collection of subsets of the hypercube, the discrepancy is the largest difference between the fraction of sample points in one of those subsets and the volume of that subset. There are different definitions of discrepancy corresponding to different collections of subsets. Some versions take a root mean square difference over subsets instead of a maximum.
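For the root-mean-square variants, closed-form expressions exist. A commonly cited form of the squared centered discrepancy for points \(x_i \in [0, 1]^d\) is

\[
CD^2 = \left(\tfrac{13}{12}\right)^d
- \frac{2}{n} \sum_{i=1}^{n} \prod_{k=1}^{d} \left(1 + \tfrac{1}{2}\left|x_{ik} - \tfrac{1}{2}\right| - \tfrac{1}{2}\left|x_{ik} - \tfrac{1}{2}\right|^2\right)
+ \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \prod_{k=1}^{d} \left(1 + \tfrac{1}{2}\left|x_{ik} - \tfrac{1}{2}\right| + \tfrac{1}{2}\left|x_{jk} - \tfrac{1}{2}\right| - \tfrac{1}{2}\left|x_{ik} - x_{jk}\right|\right).
\]

The NumPy sketch below evaluates this expression directly and compares it with method='CD'. The helper name centered_discrepancy_sketch is hypothetical, and the claim that it matches the value returned by SciPy is an assumption that the comparison is meant to check, not a description of the library's internals.

```python
import numpy as np
from scipy.stats import qmc


def centered_discrepancy_sketch(sample):
    """Squared centered L2-discrepancy (hypothetical helper, O(n^2 d) memory)."""
    x = np.asarray(sample, dtype=float)
    n, d = x.shape
    dist_center = np.abs(x - 0.5)  # |x_ik - 1/2|, shape (n, d)

    # Single sum over points of the per-dimension product.
    single = np.prod(1 + 0.5 * dist_center - 0.5 * dist_center**2, axis=1).sum()

    # Double sum over all pairs (i, j), built by broadcasting to (n, n, d).
    pair = np.prod(
        1
        + 0.5 * dist_center[:, None, :]
        + 0.5 * dist_center[None, :, :]
        - 0.5 * np.abs(x[:, None, :] - x[None, :, :]),
        axis=2,
    ).sum()

    return (13 / 12) ** d - 2 / n * single + pair / n**2


sample = qmc.Halton(d=3, scramble=False).random(32)
manual = centered_discrepancy_sketch(sample)
reference = qmc.discrepancy(sample, method="CD")
print(manual, reference)
```

The broadcasted pairwise term keeps the code short but allocates an (n, n, d) array, so it is only suitable for small samples.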

A measure of uniformity is reasonable if it satisfies the following criteria [1]:

It is invariant under permuting factors and/or runs.
It is invariant under rotation of the coordinates.
It can measure not only the uniformity of the sample over the hypercube, but also the projection uniformity of the sample over non-empty subsets of lower-dimensional hypercubes.
There is some reasonable geometric meaning.
It is easy to compute.
It satisfies the Koksma-Hlawka-like inequality.
It is consistent with other criteria in experimental design.

Four methods are available:

CD: Centered Discrepancy - subspace involves a corner of the hypercube
WD: Wrap-around Discrepancy - subspace can wrap around bounds
MD: Mixture Discrepancy - mix between CD/WD covering more criteria
L2-star: L2-star discrepancy - like CD but not invariant under rotation

See [2] for precise definitions of each method.
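A minimal sketch of switching between the four methods on one sample is shown below. It reuses the six-point design from the Examples section, already scaled to the unit hypercube. Values from different methods are not meant to be compared with each other; each is only meaningful relative to other samples scored with the same method.

```python
import numpy as np
from scipy.stats import qmc

# Six-point design on a 6 x 6 grid, scaled to the unit hypercube.
space = np.array([[1, 3], [2, 6], [3, 2], [4, 5], [5, 1], [6, 4]])
sample = qmc.scale(space, [0.5, 0.5], [6.5, 6.5], reverse=True)

# Evaluate every available discrepancy type on the same sample.
results = {
    method: qmc.discrepancy(sample, method=method)
    for method in ("CD", "WD", "MD", "L2-star")
}

for method, value in results.items():
    print(f"{method:>7}: {value:.6f}")
```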

Lastly, with iterative=True the discrepancy is computed as if the sample contained \(n+1\) points. This is useful when adding a point to an existing sample and looking for the candidate that gives the lowest discrepancy: compute the initial value once, then call update_discrepancy for each candidate. This is faster than recomputing the full discrepancy for every candidate.

References

1: Fang et al. “Design and modeling for computer experiments”. Computer Science and Data Analysis Series, 2006.
2: Zhou Y.-D. et al. “Mixture discrepancy for quasi-random point sets.” Journal of Complexity, 29 (3-4) , pp. 283-301, 2013.
3: T. T. Warnock. “Computational investigations of low discrepancy point sets.” Applications of Number Theory to Numerical Analysis, Academic Press, pp. 319-343, 1972.

Examples

Calculate the quality of the sample using the discrepancy:

>>> import numpy as np
>>> from scipy.stats import qmc
>>> space = np.array([[1, 3], [2, 6], [3, 2], [4, 5], [5, 1], [6, 4]])
>>> l_bounds = [0.5, 0.5]
>>> u_bounds = [6.5, 6.5]
>>> space = qmc.scale(space, l_bounds, u_bounds, reverse=True)
>>> space
array([[0.08333333, 0.41666667],
       [0.25      , 0.91666667],
       [0.41666667, 0.25      ],
       [0.58333333, 0.75      ],
       [0.75      , 0.08333333],
       [0.91666667, 0.58333333]])
>>> qmc.discrepancy(space)
0.008142039609053464

The CD discrepancy can also be computed iteratively by passing iterative=True and then calling update_discrepancy:

>>> disc_init = qmc.discrepancy(space[:-1], iterative=True)
>>> disc_init
0.04769081147119336
>>> qmc.update_discrepancy(space[-1], space[:-1], disc_init)
0.008142039609053513
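The candidate search described in the notes can be sketched as a loop over update_discrepancy. The starting sample, the number of candidates, and the seed below are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import qmc

# Existing sample to extend (an unscrambled Halton set, chosen for illustration).
sample = qmc.Halton(d=2, scramble=False).random(16)

# Hypothetical pool of candidate points in the unit hypercube.
rng = np.random.default_rng(0)
candidates = rng.random((50, 2))

# Compute the initial discrepancy once, as if the sample had n + 1 points.
disc_init = qmc.discrepancy(sample, iterative=True)

# Score each candidate by the discrepancy of the sample extended with it.
scores = [
    qmc.update_discrepancy(x_new, sample, disc_init) for x_new in candidates
]

best_index = int(np.argmin(scores))
best_candidate = candidates[best_index]
print(best_candidate, scores[best_index])
```

Each score should agree, up to floating-point rounding, with calling qmc.discrepancy on the sample with that candidate appended, as in the example above.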