scipy.linalg.clarkson_woodruff_transform
scipy.linalg.clarkson_woodruff_transform(input_matrix, sketch_size, seed=None)

Find a low-rank matrix approximation via the Clarkson-Woodruff Transform.
Given an input_matrix A of size (n, d), compute a matrix A' of size (sketch_size, d) which holds:

\[\|Ax\| = (1 \pm \epsilon)\|A'x\|\]

with high probability.
The error is related to the number of rows of the sketch and is bounded by

\[poly(r(\epsilon^{-1}))\]

Parameters:
- input_matrix : array_like
  Input matrix, of shape (n, d).
- sketch_size : int
  Number of rows for the sketch.
- seed : None or int or numpy.random.RandomState instance, optional
  This parameter defines the RandomState object to use for drawing random variates. If None (or np.random), the global np.random state is used. If an integer, it is used to seed the local RandomState instance. Default is None.
Returns:
- A' : array_like
  Sketch of the input matrix A, of size (sketch_size, d).
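As a small illustration of the seed parameter described above (a minimal sketch; the input matrix here is arbitrary), passing the same integer seed twice should reproduce the same sketch, since the function then seeds a local RandomState:

>>> import numpy as np
>>> from scipy import linalg
>>> A = np.random.rand(50, 10)
>>> s1 = linalg.clarkson_woodruff_transform(A, 20, seed=123)
>>> s2 = linalg.clarkson_woodruff_transform(A, 20, seed=123)
>>> np.allclose(s1, s2)
True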
Notes
This is an implementation of the Clarkson-Woodruff Transform (CountSketch) [1]. A' can in principle be computed in O(nnz(A)) time (with nnz meaning the number of nonzero entries); however, this implementation does not take advantage of sparse matrices.
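To make the CountSketch idea concrete, here is a rough, illustrative dense version of the construction (the helper name countsketch_dense is hypothetical and not part of SciPy): each row of A is multiplied by a random sign and added to one uniformly chosen row of the sketch, which is why the overall work is proportional to the number of nonzero entries of A.

import numpy as np

def countsketch_dense(A, sketch_size, seed=None):
    # Illustrative only: computes the sketch A' = S @ A without forming S,
    # where each column of S has a single +/-1 entry in a random row.
    rng = np.random.RandomState(seed)
    n, d = A.shape
    rows = rng.randint(0, sketch_size, size=n)   # target sketch row for each input row
    signs = rng.choice([-1.0, 1.0], size=n)      # random sign flip for each input row
    sketch = np.zeros((sketch_size, d))
    for i in range(n):
        # For dense A this loop costs O(n*d); with sparse rows it would be O(nnz(A)).
        sketch[rows[i]] += signs[i] * A[i]
    return sketch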
References

[1] Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. In STOC, 2013.

Examples
Given a big dense matrix A:

>>> import numpy as np
>>> from scipy import linalg
>>> n_rows, n_columns, sketch_n_rows = (2000, 100, 100)
>>> threshold = 0.1
>>> tmp = np.random.normal(0, 0.1, n_rows*n_columns)
>>> A = np.reshape(tmp, (n_rows, n_columns))
>>> sketch = linalg.clarkson_woodruff_transform(A, sketch_n_rows)
>>> sketch.shape
(100, 100)
>>> normA = linalg.norm(A)
>>> norm_sketch = linalg.norm(sketch)
Now, with high probability, the condition abs(normA - norm_sketch) < threshold holds.
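Continuing the example, this condition can be evaluated directly; note that the sketch is random, so the comparison is only expected to hold with high probability, not on every run:

>>> abs(normA - norm_sketch) < threshold  # True with high probability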
