scipy.linalg.clarkson_woodruff_transform#
- scipy.linalg.clarkson_woodruff_transform(input_matrix, sketch_size, seed=None)[source]#
- Applies a Clarkson-Woodruff Transform/sketch to the input matrix. - Given an input_matrix - Aof size- (n, d), compute a matrix- A'of size (sketch_size, d) so that\[\|Ax\| \approx \|A'x\|\]- with high probability via the Clarkson-Woodruff Transform, otherwise known as the CountSketch matrix. - Parameters
- input_matrixarray_like
- Input matrix, of shape - (n, d).
- sketch_sizeint
- Number of rows for the sketch. 
- seed{None, int, numpy.random.Generator,
- numpy.random.RandomState}, optional- If seed is None (or np.random), the - numpy.random.RandomStatesingleton is used. If seed is an int, a new- RandomStateinstance is used, seeded with seed. If seed is already a- Generatoror- RandomStateinstance then that instance is used.
 
- Returns
- A’array_like
- Sketch of the input matrix - A, of size- (sketch_size, d).
 
 - Notes - To make the statement \[\|Ax\| \approx \|A'x\|\]- precise, observe the following result which is adapted from the proof of Theorem 14 of [2] via Markov’s Inequality. If we have a sketch size - sketch_size=kwhich is at least\[k \geq \frac{2}{\epsilon^2\delta}\]- Then for any fixed vector - x,\[\|Ax\| = (1\pm\epsilon)\|A'x\|\]- with probability at least one minus delta. - This implementation takes advantage of sparsity: computing a sketch takes time proportional to - A.nnz. Data- Awhich is in- scipy.sparse.csc_matrixformat gives the quickest computation time for sparse input.- >>> from scipy import linalg >>> from scipy import sparse >>> rng = np.random.default_rng() >>> n_rows, n_columns, density, sketch_n_rows = 15000, 100, 0.01, 200 >>> A = sparse.rand(n_rows, n_columns, density=density, format='csc') >>> B = sparse.rand(n_rows, n_columns, density=density, format='csr') >>> C = sparse.rand(n_rows, n_columns, density=density, format='coo') >>> D = rng.standard_normal((n_rows, n_columns)) >>> SA = linalg.clarkson_woodruff_transform(A, sketch_n_rows) # fastest >>> SB = linalg.clarkson_woodruff_transform(B, sketch_n_rows) # fast >>> SC = linalg.clarkson_woodruff_transform(C, sketch_n_rows) # slower >>> SD = linalg.clarkson_woodruff_transform(D, sketch_n_rows) # slowest - That said, this method does perform well on dense inputs, just slower on a relative scale. - References - 1
- Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. In STOC, 2013. 
- 2
- David P. Woodruff. Sketching as a tool for numerical linear algebra. In Foundations and Trends in Theoretical Computer Science, 2014. 
 - Examples - Given a big dense matrix - A:- >>> from scipy import linalg >>> n_rows, n_columns, sketch_n_rows = 15000, 100, 200 >>> rng = np.random.default_rng() >>> A = rng.standard_normal((n_rows, n_columns)) >>> sketch = linalg.clarkson_woodruff_transform(A, sketch_n_rows) >>> sketch.shape (200, 100) >>> norm_A = np.linalg.norm(A) >>> norm_sketch = np.linalg.norm(sketch) - Now with high probability, the true norm - norm_Ais close to the sketched norm- norm_sketchin absolute value.- Similarly, applying our sketch preserves the solution to a linear regression of \(\min \|Ax - b\|\). - >>> from scipy import linalg >>> n_rows, n_columns, sketch_n_rows = 15000, 100, 200 >>> rng = np.random.default_rng() >>> A = rng.standard_normal((n_rows, n_columns)) >>> b = rng.standard_normal(n_rows) >>> x = np.linalg.lstsq(A, b, rcond=None) >>> Ab = np.hstack((A, b.reshape(-1,1))) >>> SAb = linalg.clarkson_woodruff_transform(Ab, sketch_n_rows) >>> SA, Sb = SAb[:,:-1], SAb[:,-1] >>> x_sketched = np.linalg.lstsq(SA, Sb, rcond=None) - As with the matrix norm example, - np.linalg.norm(A @ x - b)is close to- np.linalg.norm(A @ x_sketched - b)with high probability.