skerch

Sketched linear operations for PyTorch.

Consider a matrix or linear operator \(A \in \mathbb{C}^{M \times N}\), typically of intractable size and/or very costly measurements \(v \to Av\).

In many cases, such large operators feature a much smaller but hidden sub-structure (such as low-rank or banded), which allows for an approximation \(\hat{A}\) of scalable size. Typical examples of this are kernel matrices for large datasets, Hessian matrices for deep learning, large-scale datasets and the throughput of high-resolution simulations.

But obtaining \(\hat{A}\) through traditional compression methods is not feasible, since we would need to fully store or scan \(\hat{A}\) first. Instead, we directly obtain \(\hat{A}\) from just a few random \(y_i = A v_i\) measurements, or sketches (i.e. \(v_i\) follows some random distribution). Luckily, this is possible for a variety of \(\hat{A}\) structures, and the \(A v_i\) measurements are typically parallelizable, allowing us to work at large scales.

From an operational point of view, sketched methods only require the ability to draw a few matrix-vector measurements in the form \(Av, vA\). In Python, and for finite dimensions, this means providing an A.shape attribute and implementing the matrix-multiplication @ operation.

One core advantage of skerch is that this is the only requirement that \(A\) needs to fulfill (unlike other libraries which require A to implement more attributes and/or operations). In code, we just need to ensure that A satisfies the following interface:

class MyLinOp:
 def __init__(self, shape):
     self.shape = shape

 def __matmul__(self, x):
     return "... implement A @ x ..."

 def __rmatmul__(self, x):
     return "... implement x @ A ..."

Any operator implementing this interface will run on skerch routines such as diagonalizations, operator norms and triangular approximations. Other advantages of skerch:

  • Built on top of PyTorch, naturally supports CPU and CUDA, as well as complex datatypes. Very few dependencies otherwise

  • Rich API for matrix-free linear operators, including matrix-free noise sources (Rademacher, Gaussian, SSRFT…)

  • Efficient parallelized and distributed computations

  • Support for out-of-core operations via HDF5

  • A-posteriori verification tools to test accuracy of sketched approximations modular and extendible design, for easy adaption to new settings and operations

  • Modular and extendible design

See the API docs and examples for illustrations of the above points.

See also

  • [HMT2009]: Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp. 2011. “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions”. SIAM Review, 53 (2): 217-288.

  • [BWZ2016]: Christos Boutsidis, David P. Woodruff, and Peilin Zhong. 2016. “Optimal Principal Component Analysis in Distributed and Streaming Models”. ACM Symposium on Theory of Computing.

  • [TYUC2018]: Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher. 2018. “Practical Sketching Algorithms for Low-Rank Matrix Approximation”. SIAM Journal on Matrix Analysis and Applications 38 (4).

  • [TYUC2019]: Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher. 2019. “Streaming Low-rank Matrix Approximation with an Application to Scientific Simulation”. SIAM Journal on Scientific Computing 41 (4): A2430–63.

  • [BN2022]: Robert A. Baston, Yuji Nakatsukasa. 2022. “Stochastic diagonal estimation: probabilistic bounds and an improved algorithm”. CoRR abs/2201.10684.

  • [ETW2024]: Ethan N. Epperly, Joel A. Tropp, Robert J. Webber. 2024. “XTrace: Making the most of every sample in stochastic trace estimation”. SIAM Journal on Matrix Analysis and Applications.

  • [FSMH2025] Andres Fernandez, Frank Schneider, Maren Mahsereci, Philipp Hennig. 2025. “Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods”. Transactions on Machine Learning Research.

  • [DEOFTK2025] Felix Dangel, Runa Eschenhagen, Weronika Ormaniec, Andres Fernandez, Lukas Tatzel, Agustinus Kristiadi. 2025. “Position: Curvature Matrices Should Be Democratized via Linear Operators”. arXiv 2501.19183.

Examples