scipy.linalg.clarkson_woodruff_transform
scipy.linalg.clarkson_woodruff_transform(input_matrix, sketch_size, seed=None)

Applies a Clarkson-Woodruff Transform/sketch to the input matrix.
Given an input_matrix A of size (n, d), compute a matrix A' of size (sketch_size, d) so that

\[\|Ax\| \approx \|A'x\|\]

with high probability via the Clarkson-Woodruff Transform, otherwise known as the CountSketch matrix.
Parameters
- input_matrix : array_like
    Input matrix, of shape (n, d).
- sketch_size : int
    Number of rows for the sketch.
- seed : None or int or numpy.random.RandomState instance, optional
    This parameter defines the RandomState object to use for drawing random
    variates. If None (or np.random), the global np.random state is used. If
    an integer, it is used to seed the local RandomState instance. Default is
    None.

Returns
- A' : array_like
    Sketch of the input matrix A, of size (sketch_size, d).
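For intuition, the CountSketch matrix S has exactly one nonzero entry, a random ±1, in each column, and the sketch is A' = SA. A minimal dense construction is sketched below; this is only an illustration (the actual implementation exploits sparsity and never forms S as a dense matrix):

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> n, k = 6, 3
>>> S = np.zeros((k, n))
>>> rows = rng.randint(k, size=n)            # hash each input row to a sketch row
>>> signs = rng.choice([-1.0, 1.0], size=n)  # random sign per input row
>>> S[rows, np.arange(n)] = signs            # one +/-1 per column of S
>>> A = rng.randn(n, 4)
>>> SA = S @ A                               # the CountSketch of A
>>> SA.shape
(3, 4)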
Notes
To make the statement

\[\|Ax\| \approx \|A'x\|\]

precise, observe the following result, which is adapted from the proof of Theorem 14 of [2] via Markov's inequality. If we have a sketch size sketch_size = k which is at least

\[k \geq \frac{2}{\epsilon^2\delta}\]

then for any fixed vector x,

\[\|Ax\| = (1\pm\epsilon)\|A'x\|\]

with probability at least \(1 - \delta\).
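A rough empirical check of this bound is sketched below; the constants and matrix sizes are illustrative choices, not part of the API, and the observed failure rate is itself random:

>>> import numpy as np
>>> from scipy import linalg
>>> rng = np.random.RandomState(0)
>>> eps, delta = 0.3, 0.2
>>> k = int(np.ceil(2 / (eps**2 * delta)))   # k >= 2 / (eps^2 delta)
>>> A = rng.randn(2000, 50)
>>> x = rng.randn(50)
>>> failures = 0
>>> for trial in range(100):
...     sketch_Ax = linalg.clarkson_woodruff_transform(A, k) @ x
...     err = abs(np.linalg.norm(A @ x) - np.linalg.norm(sketch_Ax))
...     if err > eps * np.linalg.norm(sketch_Ax):
...         failures += 1
>>> failure_rate = failures / 100            # typically at most delta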
This implementation takes advantage of sparsity: computing a sketch takes time proportional to A.nnz. Data A which is in scipy.sparse.csc_matrix format gives the quickest computation time for sparse input.

>>> import numpy as np
>>> from scipy import linalg
>>> from scipy import sparse
>>> n_rows, n_columns, density, sketch_n_rows = 15000, 100, 0.01, 200
>>> A = sparse.rand(n_rows, n_columns, density=density, format='csc')
>>> B = sparse.rand(n_rows, n_columns, density=density, format='csr')
>>> C = sparse.rand(n_rows, n_columns, density=density, format='coo')
>>> D = np.random.randn(n_rows, n_columns)
>>> SA = linalg.clarkson_woodruff_transform(A, sketch_n_rows) # fastest
>>> SB = linalg.clarkson_woodruff_transform(B, sketch_n_rows) # fast
>>> SC = linalg.clarkson_woodruff_transform(C, sketch_n_rows) # slower
>>> SD = linalg.clarkson_woodruff_transform(D, sketch_n_rows) # slowest
That said, this method does perform well on dense inputs; it is just slower on a relative scale.
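Continuing the snippet above, one rough way to observe the difference is to time the calls directly (timings are machine-dependent; this is only an illustration):

>>> import timeit
>>> t_sparse = timeit.timeit(
...     lambda: linalg.clarkson_woodruff_transform(A, sketch_n_rows), number=10)
>>> t_dense = timeit.timeit(
...     lambda: linalg.clarkson_woodruff_transform(D, sketch_n_rows), number=10)
>>> # t_sparse is typically much smaller than t_dense here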
References
[1] Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. In STOC, 2013.
[2] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 2014.
Examples
Given a big dense matrix A:

>>> import numpy as np
>>> from scipy import linalg
>>> n_rows, n_columns, sketch_n_rows = 15000, 100, 200
>>> A = np.random.randn(n_rows, n_columns)
>>> sketch = linalg.clarkson_woodruff_transform(A, sketch_n_rows)
>>> sketch.shape
(200, 100)
>>> norm_A = np.linalg.norm(A)
>>> norm_sketch = np.linalg.norm(sketch)
Now with high probability, the true norm norm_A is close to the sketched norm norm_sketch in absolute value.
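Continuing the example, the relative error can be inspected directly (the comment below is an illustrative expectation, not a guarantee):

>>> rel_err = abs(norm_A - norm_sketch) / norm_A   # typically small, e.g. < 0.1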
Similarly, applying our sketch preserves the solution to a linear regression of \(\min \|Ax - b\|\):

>>> import numpy as np
>>> from scipy import linalg
>>> n_rows, n_columns, sketch_n_rows = 15000, 100, 200
>>> A = np.random.randn(n_rows, n_columns)
>>> b = np.random.randn(n_rows)
>>> x = np.linalg.lstsq(A, b, rcond=None)[0]
>>> Ab = np.hstack((A, b.reshape(-1, 1)))
>>> SAb = linalg.clarkson_woodruff_transform(Ab, sketch_n_rows)
>>> SA, Sb = SAb[:, :-1], SAb[:, -1]
>>> x_sketched = np.linalg.lstsq(SA, Sb, rcond=None)[0]
As with the matrix norm example, np.linalg.norm(A @ x - b) is close to np.linalg.norm(A @ x_sketched - b) with high probability.
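Continuing the example, the two residual norms can be compared directly (again, the closeness is probabilistic, not guaranteed):

>>> resid = np.linalg.norm(A @ x - b)
>>> resid_sketched = np.linalg.norm(A @ x_sketched - b)
>>> rel_gap = abs(resid - resid_sketched) / resid   # typically small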
