skerch package

skerch: Sketched linear operations for PyTorch.

skerch Subpackages

skerch.cli package

skerch Submodules

skerch.a_posteriori module

A-posteriori utilities for sketched decompositions.

This module implements scalable utilities to assess the quality of a sketched approximation, once it has been obtained.

skerch.a_posteriori.apost_error(lop1, lop2, device, dtype, num_meas=5, seed=479915, noise_type='gaussian', meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, adj_meas=None)[source]

A-posteriori sketched error estimation.

This function implements the error estimation procedure discussed in [TYUC2019, 6]. Given two linear operators \(A, \hat{A}\), it performs num_measurements random vector multiplications \(y_i = Av_i, \hat{y}_i = \hat{A}v_i\), and the \(y\) outputs can then be used to estimate the norms \(\lVert A \rVert_F^2, \lVert \hat{A} \rVert_F^2\), as well as the error \(\lVert A - \hat{A} \rVert_F\).

Note

The test measurements performed here are assumed to be random and independent of how either lop was obtained. For this reason, it is important to pick a combination of noise_type and seed that does not overlap with any other noise generation procedure used before.

Parameters:

lop1 – Linear operator with a shape=(h, w) attribute and implementing a left (x @ mat) or right (mat @ x) matmul op. It is assumed to match the given device and dtype.
lop2 – Linear operator to compare lop1 against. It must match in shape, device and dtype.
num_meas – Number of test measurements that will be performed on lop1 and lop2.
adjoint – If true, left-matmul is used, otherwise right-matmul.

Returns:

A tuple ((f1, f2, e), (F1, F2, E)), where f1, f2 are the estimated Frobenius norms squared of lop1, lop2 and e is the estimated Frobenius error squared. The uppercase counterparts are lists containing all num_meas individual measurements (which are averaged to obtain the estimates). To get confidence bounds on this estimate, use apost_error_bounds().

skerch.a_posteriori.apost_error_bounds(num_measurements, rel_err, is_complex=False)[source]

Probabilistic bounds for a-posteriori sketch error estimation.

The Frobenius error between any two linear operators can be estimated from sketches using apost_error(). But this estimation is random and subject to error.

This function retrieves the probabilistic error bounds presented in [TYUC2019, 6.4], which tell us the probability that the estimated error is wrong by a given amount. Interestingly, these bounds only depend on the number of measurements and data type, and not on the operator dimension.

Note

This function does not perform the error estimation itself, it merely informs about the probabilistic bounds associated with a given error. For the error estimation, see apost_error().

Parameters:

num_measurements (int) – How many measurements (assumed Gaussian iid) will be performed for the a-posteriori error estimation. More measurements means tighter error bounds (i.e. the output of our error estimation is more reliable).
rel_err (float) – A float between 0 and 1, indicating the relative error that we want to consider. If x is the actual error we want to estimate, and y is our estimation, this function will return the probability that y is in the range [(1-rel_err)x, (1+rel_err)x].
is_complex – The returned probabilities will be tighter if the error is measured on complex linear operators, using complex Gaussian noise. This boolean flag allows to specify this (assumed real-valued if false).

Returns:

Probabilities that, given the indicated number of measurements, the a-posteriori method yields values outside of actual_error * (1 +/- rel_err). Ideally, we want a sufficient number of measurements, such that the returned probabilities for small rel_err are themselves small (this means that the corresponding error estimation is tight).

skerch.a_posteriori.scree_bounds(S, err_frob)[source]

A-posteriori scree bounds for low-rank approximations.

Reference: [TYUC2019, 6.5].

When we do a sketched eigen- or singular-decomposition of a linear operator, we often don’t know what is the effective rank of said operator, and whether we took enough sketched measurements to cover it. A scree analysis is a tool to estimate the smallest rank that captures a sufficient amount of the actual matrix, typically by looking at inflection points in the curve associated with sharp decays in spectral norm.

A common issue with sketched methods is that we don’t have access to the actual spectrum, and thus we can’t perform a scree analysis. Instead, it is possible to use estimates from apost_error() to retrieve upper and lower scree bounds, which ideally would still inform us about the effective rank of the original operator.

Parameters:

S – Vector of the estimated eigen/singular values, expected to contain entries in non-ascending magnitude.
err_frob – A-posteriori estimate of frob_norm(M - M'), where M' is the recovery of M with singular values S, as returned by apost_error() (note that here the norm is not squared).

Usage example:

U, S, Vh = ssvd(A)
Ahat = CompositeLinOp((("US", U * S), ("Vh", Vh)))
(f1, _, err), _ = apost_error(A, Ahat, "...")
scree_lo, scree_hi = scree_bounds(S, err**0.5)

skerch.algorithms module

In-core sketched algorithms.

class skerch.algorithms.SketchedAlgorithmDispatcher[source]

Bases: object

Provides sketched algos with measurement linops and recovery algorithms.

Sketched methods are generally flexible in the kind of noise or recovery method they use. In order to accommodate for this flexibility, skerch algorithms call this dispatcher with their requested configuration (e.g. "gaussian" for noise and "nystrom" for recovery).

This modular and extendible framework makes it easy to change between existing options, and also to add new options without having to modify the sketched algorithms. For example, if users want to run ssvd() with a new type of noise linop called MyNoise, they just need to extend mop() into a new dispatcher class, and provide the dispatcher to SSVD.

Detailed examples on how to do this can be found in the documentation.

static mop(mop_type, hw, seed, dtype, blocksize=1, register=False)[source]

Returns measurement linop with given specs.

The returned linop must follow the same interface as skerch.linops.ByBlockLinOp, i.e. must support:

Left and right matmul to matrices via @
A get_blocks(dtype, device) method that returns pairs in the form (block, idxs), where block is a subset of vectors, assumed to be columns of the returned linop, and corresponding to the given idxs.

Parameters:

mop_type – A string defining the measurement linop type (e.g. "rademacher", "gaussian", "ssrft", "phase").
hw – Tuple (height, width) with the desired linop shape.
seed – Random seed for the measurement linop.
blocksize – How many measurements should be done at once. Ideally, as many as it fits in memory. See skerch.linops.ByBlockLinOp for more details.
register – Whether to register this linop in a global tracker that will raise an exception if other linops with overlapping seeds have been already created. This is useful to debug new methods if they have e.g. issues due to the use of correlated noise. See skerch.measurements.RademacherNoiseLinOp for an example.

static recovery(recovery_type, hermitian=False)[source]

Returns recovery funtion with given specs.

Parameters:: recovery_type – String specifying which recovery type to be used (e.g. "singlepass", "nystrom", "oversampled_500").
Returns:: A tuple (fn, inner_dims), where fn is a recovery function. If inner_dims is None, fn should have an interface like skerch.recovery.singlepass(). If inner_dims is a positive integer, fn should behave like skerch.recovery.oversampled().

static unitnorm_lop_entries(lop_type)[source]

True if all lop_type entries are supposed to have unit norm.

E.g. Rademacher or Phase noise entries return True, and Gaussian returns False.

class skerch.algorithms.TriangularLinOp(lop, stair_width=None, num_gh_meas=0, lower=True, with_main_diagonal=False, use_fft=False, seed=479915, noise_type='rademacher', max_mp_workers=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, meas_blocksize=None)[source]

Bases: BaseLinOp

Given a square linop, compute products with its lower/upper triangle.

The triangle of a linear operator can be approximated from the full operator via a “staircase pattern” of exact measurements. For example, given an operator of shape (1000, 1000), and stairs of size 100, we could obtain 9 exact measurements strictly under the diagonal, the first one covering lop[100:, :100], the next one lop[200:, 100:200], and so on. And we would leave out only 9 small triangles near the diagonal in a sort of “serrated” pattern. The smaller the stair size, the more measurements we do and the more closely the full triangle is approximated (a stair size of 1 results in an exact method, doing full measurements).

The interesting thing is that the “serrated” part can be estimated with a variation of the Girard-Hutchinson estimator:

\[f(A) = \mathbb{E}_{v \sim \mathcal{R}} \big[ \varphi(\bar{v}) \odot Av \big]\]

Where \(\varphi\) follows a skerch.utils.serrated_hadamard_pattern(). By linearity, it can be shown that this expectation samples entries of \(A\) from the block-triangles near the diagonal.

Finally, we add the exact step-wise measurements and the serrated Girard-Hutchinson estimation (see e.g. hutchpp()), resulting in a triangular estimation.

Usage example:

lop = MyLinOp("...")
tril = TriangularLinOp(lop, lower=True, stair_width=10, num_gh_meas=1000)
w = tril @ v

Parameters:

lop – A square linear operator. It must implement a lop.shape = (dims, dims) attribute as well as the left- and right- matmul operator @, interacting with torch tensors.
stair_width – Width of each step in the staircase pattern. If it is 1, a total of dims exact measurements will be performed and the triangular products will be exact. If it equals dims, no exact measurements will be performed (since the staircase pattern would cover the full triangle). The step size regulates this trade-off: Ideally, we want as many exact measurements as possible, but not too many. If no value is provided, dims // 2 is chosen by default, such that only 1 exact measurement is performed.
num_hutch_measurements – Number of measurements for the serrated estimation that complements the staircase estimation. This estimator generally requires many measurements to be informative, and it can even be counter- productive if not enough measurements are given. If lop is not diagonally dominant, consider setting this to 0 for a sufficiently good approximation via deterministic staircase_measurements. Otherwise, make sure to provide a sufficiently high number of measurements.
lower – If true, lower triangular matmuls will be computed. Otherwise, upper triangular.
with_main_diagonal – If true, the main diagonal will be included in the triangle, otherwise excluded. If you already have precomuted the diagonal elsewhere, consider excluding it from this approximation, and adding it separately.
use_fft – Whether to use FFT for the Hutchinson estimation. See skerch.utils.subdiag_hadamard_pattern() for more details.
seed – Seed for the random measurements used in the serrated estimator.
noise_type – String indicating noise type to be used in the serrated estimator. Must be supported by the given dispatcher.
meas_blocksize – How many serrated measurements should be done at once. Ideally, as many as it fits in memory. See skerch.linops.ByBlockLinOp for more details.

LOP_REPR_CHARS = 30

skerch.algorithms.hutch(lop, lop_device, lop_dtype, num_meas, seed=479915, noise_type='rademacher', meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, return_diag=True, defl_Q=None)[source]

Girard-Hutchinson trace and diagonal estimator.

Given a square linear operator \(A\), and random vectors \(v \sim \mathcal{R}\) with \(\mathbb{E}[v v^H] = I\), consider the Girard-Hutchinson diagonal estimator:

\[\mathbb{E}_{v \sim \mathcal{R}} \big[ \bar{v} \odot Av \big]_i = \sum_j A_{ij} \mathbb{E}\big[ \bar{v}_i v_j \big] = A_{ii}\]

The convergence of this estimator can be accelerated greatly if we also incorporate rank deflation \(A = (Q Q^H) A + (I - (Q Q^H)) A\), where \(Q\) is an orthogonal matrix spanning the top space of \(A\). Similarly to ssvd(), it turns out that we can obtain \(Q\) efficiently from defl_dims random measurements. Then, the diagonal of \((Q Q^H) A\) can be obtained exactly, and \(diag((I - (Q Q^H)) A)\) is then estimated via Girard-Hutchinson using extra_gh_meas (uncorrelated) measurements. This is the Hutch++ algorithm (see [BN2022]).

A very similar logic applies for the estimation of the trace, which is the sum of the diagonal entries. In fact, most computations can be recycled, and this algorithm computes and returns both quantities jointly with minimal overhead. The Hutchinson estimator for the trace is:

\[tr(A) = \langle A, I \rangle = \langle A, \mathbb{E}_{v \sim \mathcal{R}}[v v^H] \rangle = \mathbb{E}_{v \sim \mathcal{R}}[v^H A v]\]

Note

As it can be seen in Table 1 from [BN2022], Gaussian noise is generally less efficient than Rademacher for the Girard-Hutchinson step. In general, we observe that noise entries that are more or less of the same magnitud help with stabiltiy. We also see that the number of required G-H samples for good recovery is in the millions, i.e. this is not efficient for smaller matrices.

Parameters:

lop – The \(A\) operator whose diagonal we wish to estimate.
lop_device – The device where \(A\) runs.
lop_dtype – The datatype \(A\) interacts with.
num_meas – How many measurements will be used.
seed – Overall random seed for the algorithm.
noise_type – Which noise to use. Must be supported by the given dispatcher (see SketchedAlgorithmDispatcher.mop()).
meas_blocksize – How many sketched measurements should be done at once. Ideally, as many as it fits in memory. See skerch.linops.ByBlockLinOp for more details.
return_diag – If true, diagonal estimation is also returned. Otherwise just the trace.
defl_Q – The tall \(Q\) matrix used for optional deflation (see explanation). It should be composed of orthonormal columns and have shape (dims, defl_dims). Ideally it is aligned with the top-space of lop. Also, it should be obtained from measurements that are uncorrelated to this estimator, so make sure to use a different source of random noise (e.g. a different seed) here.

Returns:

A dictionary in the form {"tr": tr, "diag": diag} with the trace and diagonal (if return_diag is true) estimations.

skerch.algorithms.seigh(lop, lop_device, lop_dtype, outer_dims, seed=479915, noise_type='rademacher', recovery_type='hmt', lstsq_rcond=1e-06, meas_blocksize=None, by_mag=True, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>)[source]

In-core Sketched Hermitian eigendecomposition (EIGH).

This function is the Hermitian version of ssvd(). It behaves largely the same, with the following differences:

It assumes that the provided lop is Hermitian
Less measurements are performed due to Hermitian symmetry
Unlike the SVD, the returned value is a pair \((\Lambda, Q)\), where \(A \approx Q diag(\Lambda) Q^H\). Since \(A\) is Hermitian, this is an eigendecomposition: \(\Lambda\) may contain negative values, and both left and right matrices are identical.

Refer to the ssvd() docs for more details.

skerch.algorithms.snorm(lop, lop_device, lop_dtype, num_meas=5, seed=479915, noise_type='gaussian', meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, adj_meas=None, norm_types=('fro', 'op'))[source]

Sketched norm estimation.

It is possible to estimate the norm of a linear operator from a few random measurements. In [TYUC2019, 6.2], it is shown that the Frobenius norm is proportional to the norm of the measurements themselves (akin to a Gram trace estimation). For \(q\) random measurements \(\Omega\) and beta = 2 if complex_dtype else 1 we have:

\[\lVert A \rVert_F^2 = \frac{1}{\beta q} \mathbb{E} \big[ A \Omega \big]\]

From the same measurements, we can also estimate the operator norm (largest singular value) as follows. Consider the linear operator \(A\), and its top-k approximation \(\hat{A} = Q Q^H A\) for orthogonal \(Q\). Then:

\[\lVert A \rVert_2 = \lVert \hat{A} \rVert_2 \approx \lVert Q Q^H A \rVert_2 = \lVert Q^H A \rVert_2\]

And, as discussed in ssvd(), \(Q\) can be efficiently estimated from a few random measurements \(A \Omega\).

Parameters:

lop – The \(A\) operator whose norms we wish to estimate.
lop_device – The device where \(A\) runs
lop_dtype – The datatype \(A\) interacts with
num_meas – How many measurements will be used to estimate the norms (\(q\) in the above description)
seed – Overall random seed for the algorithm
noise_type – Which noise to use for \(\Omega\). Must be supported by the given dispatcher (see SketchedAlgorithmDispatcher.mop())
meas_blocksize – How many sketched measurements should be done at once. Ideally, as many as it fits in memory. See skerch.linops.ByBlockLinOp for more details.
adj_meas – If true, measurements to obtain \(Q\) are adjoint. This can affect accuracy somewhat.
norm_types – Collection with norm types to be returned. Currently supported: "fro" (Frobenius), "op" (operator norm)

Returns:

The tuple (result, (Q, R)), where (Q, R) is the QR decomposition of the sketched measurements, and result is a dictionary in the form {norm_type: value}.

skerch.algorithms.ssvd(lop, lop_device, lop_dtype, outer_dims, seed=479915, noise_type='rademacher', recovery_type='hmt', lstsq_rcond=1e-06, meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>)[source]

In-core Sketched SVD.

The core idea behind sketched SVDs, introduced in [HMT2009] and outlined in skerch.recovery.hmt(), relies on the existence of orthogonal thin matrices \(P, Q\), such that the following approximation holds:

\[A \approx P P^H A Q Q^H = P C Q^H\]

The crucial point here is that it is possible to efficiently obtain \(P, Q\) from random measurements, or sketches. Then, it only remains to decompose the small “core” matrix \(C\), which can be done with traditional methods.

Depending on which recovery_type we use, there are several strategies to solve this sketch, each with their nuances and tradeoffs (see e.g. note below). Available recovery methods are implemented and documented in skerch.recovery.

Note

The straightforward recovery method from [HMT2009] (implemented in skerch.recovery.hmt()) requires us to first obtain \(Q\), and then do a second round of measurements \(A Q\) to obtain the core matrix. Further developments have shown that it is possible to estimate \(A Q\) without having to run a second pass of measurements by solving a least-squares problem, which can lead to significant speedup if we leverage parallelization (see e.g. [TYUC2018]). These single-pass recovery methods have also been implemented in skerch.recovery. Check the corresponding docs and the recovery_type parameter for more info.

Parameters:

lop – The linear operator \(A\) to be decomposed
lop_device – The device where \(A\) runs
lop_dtype – The datatype \(A\) interacts with
outer_dims – How many measurements will be used to obtain \(P\) and \(Q\) (respectively)
seed – Overall random seed for the algorithm
noise_type – Which noise to use. Must be supported by the given dispatcher (see SketchedAlgorithmDispatcher.mop())
recovery_type – Which recovery method to use. Must be supported by the given dispatcher (see SketchedAlgorithmDispatcher.recovery() and skerch.recovery).
lstsq_rcond – Least-squares condition threshold used in the recovery, see SketchedAlgorithmDispatcher.recovery() and skerch.recovery.
meas_blocksize – How many sketched measurements should be done at once. Ideally, as many as it fits in memory. See skerch.linops.ByBlockLinOp for more details.

Returns:

The singular value decomposition U, S, Vh where \(A \approx U diag(S) V^H\).

skerch.algorithms.xhutchpp(lop, lop_device, lop_dtype, x_dims=0, gh_meas=0, seed=479915, noise_type='rademacher', meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, return_diag=True, cache_xmop=True)[source]

Diagonal and trace sketched approximation via Hutch/XDiag.

In hutch() we see how to estimate the trace and diagonal via Girard-Hutchinson/Hutch++. This function extends this functionality with XTrace/XDiag [ETW2024], which allow us to perform x_dims dimensional deflation, and then recycle the x_dims measurements for the subsequent deflated Girard-Hutchinson estimator, which we can also further enrich with gh_meas extra measurements. The memory cost of this function is dominated by the deflation: we need to store a matrix of (dims, x_dims) size. The runtime cost is typically dominated by the total number of measurements: 2 * x_dims + gh_meas.

Note

This function is equivalent to plain Girard-Hutchinson for x_dims=0, and equivalent to XDiag if gh_meas=0. For the Hutch++ estimator, run hutch() providing the desired Q_defl deflation matrix.

See also

This blogpost provides derivations and elaborates on the relationship between Hutch++ and XDiag, and how this can lead to a common implementation such as this one.

Parameters:

defl_dims – How many measurements will be used to obtain the \(Q\) deflation matrix
lop – The \(A\) operator whose diagonal we wish to estimate.
lop_device – The device where \(A\) runs
lop_dtype – The datatype \(A\) interacts with
x_dims – How many measurements will be used to obtain the \(Q\) deflation matrix, and the subsequent deflated estimation (the k above).
seed – Overall random seed for the algorithm
noise_type – Which noise to use. Must be supported by the given dispatcher (see SketchedAlgorithmDispatcher.mop())
meas_blocksize – How many sketched measurements should be done at once. Ideally, as many as it fits in memory. See skerch.linops.ByBlockLinOp for more details.
cache_mop – If true, the measurement linear operator (which is used twice) is converted to an explicit matrix and kept around. This saves computation, since it does not need to be sampled twice, at the expense of the memory required to store its entries.

Returns:

A dictionary in the form {"tr": t, "diag": d, "Q": Q, "R": R, "Sh_k": S, "Psi": P} containing trace and diagonal (if return_diag is true) estimations, as well as the \(Q, R\) matrices corresponding to the QR decomposition of the deflation measurements, and the \(S_k^H, \Psi\) objects needed to compute the exchangeable estimator.

skerch.hdf5 module

Persistent and out-of-core tensor functionality via HDF5.

class skerch.hdf5.DistributedHDF5Tensor[source]

Bases: object

Class to create and manage distributed HDF5 tensors.

In general, multiple processes are not allowed to concurrently open and write to HDF5 files. In order to allow for distributed, but coherent writing, this class allows to create many individual HDF5 files, and then “virtually” merge them into a single coherent dataset. This allows us to distribute a (potentially large) tensor across processes and machines, and write to it concurrently.

Once we are done writing, we may want to access the result. Unfortunately, most OS don’t allow a single process to open many files at once. As a result, any files above the limit would be silently ignored. This class also solves this by providing a merge() method, to gather the distributed chunks back into a single, monolithic file.

See docs for more examples, also on how to create and merge HDF5 datasets directly from command line.

DATA_NAME = 'data'

FLAG_DTYPE = dtype('O')

FLAG_NAME = 'flags'

INITIAL_FLAG = 'initialized'

MAIN_PATH = 'ALL'

classmethod create(basepath_fmt, shape, partition_size, dtype, compression='lzf')[source]

Create a distributed HDF5 measurement dataset.

Parameters:

basepath_fmt – Format string with the directory and dataset name to store created HDF5 files, in the form <DIR>/my_dataset_{}.h5. Directory must be empty.
shape – Shape of the global tensor corresponding to the whole HDF5 dataset.
partition_size – The HDF5 dataset will be partitioned across its first axis on sub-files. E.g. a shape of (10, 20) with a partition size of 6 will result in 2 files of shapes (6, 20) and (4, 20).
dtype – Datatype of the HDF5 arrays to be created.

Returns:

The tuple (all_path, subpaths, begs_ends), where all_path is the path of the virtual dataset encompassing the global tensor, subpaths are the paths to the respective chunk HDF5 files, and begs_ends are the begining (included) and end (excluded) indices that were used to partition the global tensor on subfiles, across its first axis.

static get_idxs_format(max_idx)[source]: Helper method, retrieves format strings to index HDF5 chunks.

static iter_partition_idxs(max_idx, partition_size)[source]

Iterate from 0 to max_idx by partition_size.

Returns:: Pairs of (beg, end) where beg is included and begins with 0, and end is exluded and equals at most max_idx.

classmethod load(path, filemode='r+')[source]

Load the given HDF5 dataset.

Load an individual dataset, such as the virtual ones created via create() or the monolithic ones merged via merge().

Parameters:

path – Path to the HDF5 file.
filemode – Default is ‘r+’, read/write, file must preexist. See documentation of h5py.File for more details.

Returns:

(data, flag, h5f), where data is the dataset for the numerical measurements, flag is the dataset for state tracking, and h5f is the (open) HDF5 file handle.

Note

Remember to h5f.close() once done with this file.

classmethod merge(all_path, out_path=None, compression='lzf', check_success_flag=None, delete_subfiles_while_merging=False)[source]

Merges distributed HDF5 dataset into a single, monolithic HDF5 file.

Parameters:

all_path – The all_path of a virtual HDF5 dataset like the ones created via create(). It must be a “virtual” dataset, i.e. composed of chunks that are distributed across other files.
out_path – If None, merged dataset will be written over the given all_path, i.e. it will be converted from virtual into monolithic in-place. Otherwise, path for a new HDF5 monolithic file where the contents will be written into.
check_success_flag – If given, this method will check that all HDF5 flags equal this value, raise an error otherwise.
delete_subfiles_while_merging (bool) – If true, each distributed HDF5 file that is visited will be deleted right after it is merged onto the monolithic HDF5 file. Useful to avoid large memory overhead.

Returns:

Path of the merged HDF5 file.

skerch.hdf5.create_hdf5_layout_lop(root, lop_shape, dtype, partition_size, lo_meas=None, ro_meas=None, inner_meas=None, lo_fmt='leftouter_{}.h5', ro_fmt='rightouter_{}.h5', inner_fmt='inner_{}.h5')[source]

Creation of persistent HDF5 files to store linop sketches.

This convenience method prepaers the HDF5 placeholders that can be used to store sketches from a linop of shape lop_shape. It supports independent creation of left-, right- and inner measurements, thus supporting most use cases involving linear sketches.

Parameters:

root – Where to store the created HDF5 files. Must be an empty directory.
lop_shape – Shape of linear operator to sketch from, in the form (height, width).
dtype – Torch dtype of the operator, e.g. torch.float32. The HDF5 arrays will be of same type.
partition_size – Each created HDF5 will be split into chunks of this many vectors (see DistributedHDF5Tensor.create() for more details).
lo_meas – If given, a dataset of shape (lo_meas, w) will be created under the name given by lo_fmt.
ro_meas – If given, a dataset of shape (h, ro_meas) will be created under the name given by ro_fmt.
inner_meas – If given, a dataset of shape (inner_meas, inner_meas) will be created under the name given by inner_fmt.

skerch.linops module

Basic utilities for (matrix-free) linear operators.

class skerch.linops.BandedLinOp(indexed_diags, symmetric=False)[source]

Bases: BaseLinOp

Banded matrix-free linear operator.

Given a collection of \(k\) vectors, this class implements a banded linear operator, where each given vector is a (sub)-diagonal. It is composed by \(k\) DiagonalLinOp operators, thus its memory and runtime is equivalent to storing and running the individual diagonals.

Note

The shape of this linear operator is implicitly given by the diagonal lengths and indices. Any inconsistent input will result in a BadShapeError. In particular, symmetric banded matrices must also be square. For example, a square tridiagonal matrix of shape (n, n) has a main diagonal at index 0 with length n, and two subdiagonals at indices 1, -1 with length n - 1. Still, this linop can also be non-square (unless it is symmetric), as long as all given diagonals fit in the implicit shape.

Usage example:

diags = {0: some_diag, 1: some_superdiag, -1, some_subdiag}
B = BandedLinOp(diags, symmetric=False)
w = B @ v

Parameters:

indexed_diags – Dictionary in the form {idx: diag, ...} where diag is a torch vector containing a diagonal, and idx indicates the location of the diagonal: 0 is the main diagonal, 1 the superdiagonal (lop[i, i+1]), -1 the subdiagonal, and so on.
symmetric – If true, only diagonals with nonnegative indices are admitted. Each positive index will be replicated as a negative one.

MAX_PRINT_ENTRIES = 20

to_matrix()[source]

Convert this linear operator into a matrix.

Datatype and device are borrowed from the first diagonal that was passed to the constructor.

class skerch.linops.BaseLinOp(shape, batch=None)[source]

Bases: object

Base class for linear operators.

Implements the .shape attribute and basic matmul functionality with vectors and matrices (also via the @ operator). Intended to be extended with further functionality via matmul() and rmatmul(). See documentation and codebase for examples.

Note

Inputs to matmul, rmatmul can be vectors or matrices, but implementations are responsibe to handle matrix-matrix parallelization.

Parameters:

shape – (height, width) of linear operator.
batch – When calling self @ x against a matrix with several vectors, the default batch=None will call matmul() with the whole x matrix at once. If a different batch is provided, x will be split in chunks of batch vectors, which will be fed to matmul() sequentially. This is useful to eg. prevent out-of-memory errors due to processing too large chunks at once.

matmul(x)[source]: Implement with the desired functionality. See class docstring.

rmatmul(x)[source]: Implement with the desired functionality. See class docstring.

t()[source]

Return (Hermitian) transposition of this lop.

Equivalent to TransposedLinOp(self), see TransposedLinOp.

class skerch.linops.ByBlockLinOp(shape, by_row=False, batch=None, blocksize=1)[source]

Bases: BaseLinOp

Matrix-free operator computed by blocks of submatrices.

Consider a large matrix that does not fit in memory. Still, we can perform matrix multiplications by sequentially loading smaller blocks in memory, and then aggregating the result. This is the main motivation for this by-block linear operator:

At instantiation, users determine the by_row and blocksize parameters, which determine how many rows/columns will be internally used at once.
At runtime, this class calls sequentially the get_block() method, performs the partial matrix-multiplications and aggregates them.
At development, extensions of this class only have to implement get_block() accordingly to return blocks of the right shape.

Note

The by_row flag has implications in terms of memory and runtime. If true, for a lop of shape (h, w) and block size 1, the lop @ x matrix-vector multiplication will call get_vector() h times, and perform h dot products of dimension w. If false, it will perform w scalar products of dimension h. In the case of x @ lop, the number of scalar and dot products are swapped.

Therefore, developers need to override get_block() taking this flag into account, and users should set it to the scenario that is most efficient (e.g. by-column is generally more efficient when h > w). See skerch.measurements for examples.

Parameters:

by_row – If true, blocks are groups of rows, otherwise columns.
blocksize – If integer, determines how many rows/columns each block has, and a row will be a matrix.
batch – If the input to matmul is a matrix itself, this determines how many vectors are computed at once. Note that this is different to blocksize, which refers to the blocks of this linop.

get_block(block_idx, input_dtype, input_device)[source]

Method to gather a block (matrix) from this linear operator.

Override this method with the desired behaviour. For a shape of (h, w), it should return matrices of shape (block, w) if self.by_row is true, and (h, block) otherwise.

Note

If blocksize==1, returning vectors may work in some cases, but it is recommended to still return matrices (where one of the dimensions equals 1).

Parameters:

block_idx – Index of the block to be returned. Use the auxiliary method get_vector_idxs() if you need to know which vector indices are associated to this block index. The attributes self.num_vecs, self.num_blocks may also be helpful.
input_dtype – The dtype of the input tensor that this linop was called on. The output of this method should generally be in the same device.
input_device – The device of the input tensor that this linop was called on. The output of this method should generally be in the same device.

get_blocks(dtype, device='cpu')[source]: Yields all blocks in ascending order with the corresponding idxs.

get_idx_coords(idx)[source]

Retrieve vector index in block coordinates.

Useful if we want to retrieve a given vector from this linop: since this linop is defined in a by-block fashion, we first need to know which block to retrieve, and then which vector index within the retrieved block.

Parameters:: idx – Integer between 0 and N - 1, where N is the number of rows, if by_row is true, or columns otherwise.
Returns:: Pair of ints (b, v), such that vec_idx is the v-th element of the b-th block, e.g. in a by-row linop, lop[idx] = lop.get_block(idxs, dtype, device)[idx].

get_vector_idxs(block_idx)[source]

Retrieves a range with vector indices corresponding to a block.

Parameters:: block_idx – Integer between 0 and self.num_blocks - 1.
Returns:: range(beg, end) with the vector indices corresponding to the block_idx block.

matmul(x)[source]: Performs right matrix-multiplication.

rmatmul(x)[source]: Performs left (adjoint) matrix-multiplication.

to_matrix(dtype, device='cpu')[source]: Converts this linop to a matrix of given dtype, device.

class skerch.linops.CompositeLinOp(named_operators)[source]

Bases: object

Matrix-free composite linear operator.

This class composes an ordered collection of operators [A, B, C, ...] into A @ B @ C ... in a matrix-free fashion.

Warning

Using this class could be more inefficient than directly computing the composed operator, e.g. if A.shape = (1, 1000) and B.shape = (1000, 1), then computing the scalar C = A @ B and then applying it is more efficient than keeping a CompositeLinearoperator with A, B (in terms of both memory and computation). This class does not check for such cases, users are encouraged to take this into account. Note that composite linops can also be nested.

Parameters:: named_operators – Ordered collection in the form [(n_1, o_1), ...] where each n_i is a string with the name of operator o_i. Each o_i operator must implement __matmul__ and __rmatmul__ as well as the shape = (h, w) attribute. This object will then become the composite operator o_1 @ o_2 @ ...

class skerch.linops.DiagonalLinOp(diag)[source]

Bases: BaseLinOp

Diagonal matrix-free linear operator.

Given a vector v of d dimensions, this class implements a diagonal linear operator of shape (d, d) via left- and right-matrix multiplication, as well as the shape attribute, only requiring linear (\(\mathcal{O}(d)\)) memory and runtime.

Parameters:: diag – Vector to be casted as diagonal linop.

MAX_PRINT_ENTRIES = 20

to_matrix()[source]: Efficiently convert this linear operator into a matrix.

class skerch.linops.SumLinOp(named_signed_operators)[source]

Bases: BaseLinOp

Matrix-free sum of linear operators.

Given a collection of same-shape linear operators A, B, C, D ..., this class implements the sum A + B + C - D ... in a matrix-free fashion.

Parameters:: named_signed_operators – Collection in the form {(n_1, s_i, o_1), ...} where each n_i is a string with the name of operator o_i, and s_i is a boolean: if true, this operator is to be added, otherwise subtracted. Each o_i operator must implement __matmul__ and __rmatmul__ as well as the shape = (h, w) attribute. This object will then become the operator o_1 + o_2 - .... All operators must also have same shape.

class skerch.linops.TorchLinOpWrapper[source]

Bases: object

Linear operator that always accepts and produces PyTorch tensors.

Since skerch is built on top of PyTorch, but some other useful LinOps interface with e.g. NumPy arrays instead, this mixin class acts as a wraper on the __matmul__ and __rmatmul__ operators, so that the operator expects and returns torch tensors, even when the wrapped operator interfaces with NumPy/HDF5.

This facilitates interoperability between skerch and other python linops. Usage example:

# extend NumPy linear operator via multiple inheritance
class TorchWrappedNumpyLinOp(TorchLinOpWrapper, NumpyLinOp):
    pass
lop = TorchWrappedNumpyLinOp(...)  # instantiate normally
w = lop @ v  # now v can be a PyTorch tensor on any device

class skerch.linops.TransposedLinOp(lop)[source]

Bases: object

Hermitian transposition of a linear operator.

Given a linear operator \(A\), real or complex, this class wraps its functionality, such that TransposedLinOp(lop) behaves line the (Hermitian) transposition \(A^H\). This is done by swapping dimensions and matmul methods leveraging the following identity:

\(A^H v = ((A^H v)^H)^H = (v^H A)^H\).

Usage example:

lopT = TransposedLinOp(lop)

Parameters:: lop – Any object supporting a shape (h, w) attribute as well as the right- and left- matmul operator @.

t()[source]: Undo transposition: returns original lop.

skerch.linops.check_linop_input(x, lop_shape, adjoint)[source]

Checks that input is a matrix or vector of the right shape.

Parameters:

x – The input to this linear operator. It should be either a vector or a matrix of matching shape to lop_shape.
adjoint (bool) – If true, x @ lop is assumed, otherwise lop @ x.

Raises:

BadShapeError if there is any shape mismatch.

skerch.linops.linop_to_matrix(lop, dtype=torch.float32, device='cpu', adjoint=False)[source]

Convert a linop to a matrix via one-hot matrix-vector products.

Parameters:: adjoint – If true, one-hot products are done via v_i @ lop. Otherwise lop @ v_i.
Returns:: Matrix equivalent to the given lop, on the given dtype and device.

skerch.measurements module

Functionality to perform sketched measurements.

class skerch.measurements.GaussianNoiseLinOp(shape, seed, by_row=False, batch=None, blocksize=1, register=True, mean=0.0, std=1.0)[source]

Bases: RademacherNoiseLinOp

Random linear operator with i.i.d. Gaussian entries.

Like RademacherNoiseLinOp, but with Gaussian noise.

REGISTER = {'default': [(9876531, 9877531), (12345, 12395)]}

get_block(block_idx, input_dtype, input_device)[source]

Samples a vector with Gaussian i.i.d. noise.

See base class for details.

class skerch.measurements.PhaseNoiseLinOp(shape, seed, by_row=False, batch=None, blocksize=1, register=True, conj=False)[source]

Bases: RademacherNoiseLinOp

Random linear operator with i.i.d. complex entries in the unit circle.

Like RademacherNoiseLinOp, but with phase noise. Must be of complex datatype.

Parameters:: conj – For the same seed, the linear operators with true and false conj values are complex conjugates of each other.

REGISTER = {'default': [(12345, 12395)]}

get_block(block_idx, input_dtype, input_device)[source]

Samples a vector with i.i.d. phase noise.

See base class definition for details.

class skerch.measurements.RademacherNoiseLinOp(shape, seed, by_row=False, batch=None, blocksize=1, register=True)[source]

Bases: ByBlockLinOp

Random linear operator with i.i.d. Rademacher entries.

Warning

Since this linop uses random generators and seeds to fetch the blocks, it is important that two different instances do not overlap in seeds, to prevent correlated noise. Use sufficiently far away seeds and register=True to test this behaviour.

Parameters:

shape – Shape of the linop as (h, w).
seed – Random seed used in get_block() to sample random blocks.
by_row – See skerch.linops.ByBlockLinOp.
register – If true, when the linop is created, its seed range (going from seed to seed + max(h, w)) is added to a class-wide register, which raises a skerch.utils.BadSeedError if there are any other instances of this class with overlapping ranges. If false, this behaviour is disabled.

REGISTER = {'default': [(12345, 12395)]}

classmethod check_register()[source]: Checks if two different-seeded linops have overlapping seeds.

get_block(block_idx, input_dtype, input_device)[source]

Samples a vector with Rademacher i.i.d. noise.

See base class definition for details.

class skerch.measurements.SSRFT[source]

Bases: object

Scrambled Subsampled Randomized Fourier Transform (SSRFT).

This static class implements the forward and adjoint SSRFT, as described in [TYUC2019, 3.2]:

\[\text{SSRFT} = R\,\mathcal{F}\,\Pi\,\mathcal{F}\,\Pi'\]

Where \(R\) is a random index-picker, mathcal{F} is either a DCT or a FFT (if x is complex), and \(\Pi, \Pi'\) are random permutations which also multiply entries by Rademacher or phase noise (if x is complex).

static issrft(x, out_dims, seed=479915, norm='ortho')[source]

Adjoint SSRFT (see class docstring for definition).

Inversion of the SSRFT, such that for a square SSRFT, x == issrft(ssrft(x)) holds.

Note that this means that, for complex x, the adjoint operation involves complex conjugation as well. See class docstring and ssrft() for more details.

Parameters:: out_dims – In this case, instead of random index-picker, which reduces dimension, we have an index embedding, which increases dimension by placing the x entries in the corresponding indices (and leaving the rest to zeros). For this reason, out_dims >= len(x) is required.

static ssrft(x, out_dims, seed=479915, norm='ortho')[source]

Forward SSRFT (see class docstring for definition).

Parameters:

x – Matrix to be projected, such that y = SSRFT @ x
out_dims – Number of rows in y with rows(y) <= rows(x)
seed – Random seed for the SSRFT.
norm – Norm for the FFT and DCT. Currently only ortho is supported to ensure orthogonality.

class skerch.measurements.SsrftNoiseLinOp(shape, seed, batch=None, blocksize=1, norm='ortho', register=True)[source]

Bases: ByBlockLinOp

Linop for the Scrambled Subsampled Randomized Fourier Transform (SSRFT).

This class encapsulates the forward and adjoint SSRFT transforms into a single linear operator with fixed shape and orthonormal columns, which is deterministic for the same dtype, shape and seed (also across different torch devices).

See SSRFT for more details.

Note

This linop can either be square or tall, but never fat (i.e. width must be less or equal than height). Since the SSRFT cannot increase the dimensionality of its input, the forward matmul of this linop is actually the inverse SSRFT, and the adjoint matmul is the forward SSRFT. This slight change in format that doesn’t really affect the semantics of the SSRFT, and it makes it more compatible with other noise linops, which are typically also tall instead of fat. It is also more common to think about orthogonal columns than rows. To make it fat, skerch.linops.TransposedLinOp can still be used.

Note

Unlike classes extending skerch.linops.ByBlockLinOp, in this case it is not efficient to apply this operator by row/column. Instead, this implementation applies the SSRFT directly to the input, by vector, but it also implements get_vector via one-hot vecmul to facilitate parallel measurements and fit the standard interface for skerch measurement linops.

REGISTER = {'default': [(12345, 12395)]}

classmethod check_register()[source]: Checks if two different-seeded linops have overlapping seeds.

get_block(block_idx, input_dtype, input_device)[source]

Samples a SSRFT block.

See base class definition for details.

skerch.recovery module

Functionality to recover linop approximations from their sketches.

skerch.recovery.hmt(sketch_right, lop, as_svd=True)[source]

HMT sketched low-rank recovery.

The core idea behind this method is outlined in [HMT2009]: Given a linear operator \(A\), we assume that there exists a thin, orthogonal matrix \(Q\) such that \(A \approx Q Q^H A\). Crucially, it is possible to efficiently obtain \(Q\) from random measurements, or sketches, followed by QR orthogonalization.

Then, to produce an SVD, it remains to decompose the “thin” matrix \(B^H = P^H A = U \Sigma V^H\), yielding the final SVD: \(A \approx (P U) \Sigma V^H\).

Parameters:

sketch_right – Sketches A @ mop_right (typically a tall matrix).
lop – Our target linear operator A.

Returns:

If as_svd, a triple U, S, Vh with \(A \approx U diag(S) V^H\) being a thin SVD. Otherwise, \((Q, B^H)\).

skerch.recovery.hmt_h(sketch_right, lop, as_eigh=True, by_mag=True)[source]

HMT sketched low-rank recovery (Hermitian).

Hermitian version of hmt(). Since \(A = A^H\), we can further compact the core matrix into a Hermitian, “small” matrix: \(A \approx Q Q^H A Q Q^H \Rightarrow C = Q^H A Q\). Then we can obtain the eigendecomposition of \(C\).

Parameters:: by_mag – see skerch.utils.eigh().
Returns:: If as_eigh, the pair \(\Lambda, X\) corresponding to the eigendecomposition \(A \approx X diag(\Lambda) X^H\). Otherwise, the pair \(C, Q\) (see derivation above).

skerch.recovery.nystrom(sketch_right, sketch_left, mop_right, rcond=1e-06, as_svd=True)[source]

Generalized Nyström recovery of linop from left and right sketches.

Single-pass recovery from [Naka2020]. Here, we assume the low-rank approximation \(A \approx A \Omega C \Psi^H A\), with core matrix in the form \(C = (\Psi^H A \Omega)^\dagger\).

Given sketches \(Y = A \Omega\), \(W^H = \Psi^H A\), this method obtains the approximation without requiring any further measurements and by solving a single, compact least-squares system in the form \(A \approx Y (W^H \Omega)^\dagger W^H\).

To obtain the SVD approximation, Y, W are further orthogonalized.

Parameters:

sketch_right – Sketches A @ mop_right (typically a tall matrix).
sketch_left – Sketches (mop_left @ A) (typically a fat matrix).
mop_right – Linop used to obtain sketch_right.
rcond – Singular value threshold using during the least square solving process. See skerch.utils.lstsq().

Returns:

If as_svd, the triple U, S, Vh with \(A \approx U diag(S) V^H\) being a thin SVD. Otherwise, the pair Y, Bh with \(A \approx Y B^H\).

skerch.recovery.nystrom_h(sketch_right, mop_right, rcond=1e-06, as_eigh=True, by_mag=True)[source]

Generalized Nyström recovery of Hermitian linop from right sketch.

Hermitian version of nystrom(). Since \(A = A^H\), we only need sketches from one side: \(A \approx A \Omega C \Omega^H A\), with Hermitian core matrix in the form \(C = (\Omega^H A \Omega)^\dagger\).

Parameters:: by_mag – see skerch.utils.eigh().
Returns:: If as_eigh, the pair \(\Lambda, X\) corresponding to the eigendecomposition \(A \approx X diag(\Lambda) X^H\). Otherwise, the pair \(C, Q\) (see derivation above).

skerch.recovery.oversampled(sketch_right, sketch_left, sketch_inner, lilop, rilop, rcond=1e-06, as_svd=True)[source]

Oversampled recovery of linop from left, right and inner sketches.

Single-pass, oversampled recovery from [BWZ2016]:

Assuming \(A \approx P (P^H A @ Q) @ Q^H = P C Q^H\), this method aims to obtain a core matrix \(C\) via an independent, oversampled sketch. The above approximation is satisfied by the following core matrix: \(C = (\Psi^H P)^\dagger \Psi^H A \Omega (Q^H \Omega)^\dagger\), and \(P, Q\) can be been obtained via left and right outer sketches and subsequent orthogonalization.

The key observation here is that \(C\) can be oversampled, i.e. the number of measurements taken by \(\Psi, \Omega\) can be larger than the number of columns in \(P, Q\). This helps conditioning the pseudoinverses and has been shown in [TYUC2018, 4.1] to yield better recoveries when the singular values of \(A\) decay slowly and the number of outer measurements doesn’t sufficiently cover for that.

The trade-off here is that this method requires the extra (independent) inner measurements, plus the extra least-squares steps involved in solving the two pseudoinverses. To return output as SVD, the core matrix is further diagonalized.

Parameters:

sketch_right – Sketches A @ mop_right (typically a tall matrix).
sketch_left – Sketches (mop_left @ A) (typically a fat matrix).
sketch_inner – Sketches (lilop^H @ A @ rilop) (typically a small, square matrix).
lilop – Left inner linop used to obtain sketch_inner.
rilop – Right inner linop used to obtain sketch_inner.
rcond – Singular value threshold using during the least square solving process. See skerch.utils.lstsq().

Returns:

If as_svd, the triple U, S, Vh with \(A \approx U diag(S) V^H\) being a thin SVD. Otherwise, the pair Y, Bh with \(A \approx Y B^H\).

skerch.recovery.oversampled_h(sketch_right, sketch_inner, lilop, rilop, rcond=1e-06, as_eigh=True, by_mag=True)[source]

Oversampled recovery of linop from right and inner sketches.

Hermitian version of oversampled(). Since \(A = A^H\), we only need outer sketches from one side. But part of the strength in the oversampled method is that the inner measurement matrices are uncorrelated. To allow for this possibility, this function still admits separate parameters for lilop and rilop.

(see [FSMH2025] for more discussion).

Parameters:: by_mag – see skerch.utils.eigh().
Returns:: If as_eigh, the pair \(\Lambda, X\) corresponding to the eigendecomposition \(A \approx X diag(\Lambda) X^H\). Otherwise, the pair \(C, Q\) (see derivation above).

skerch.recovery.singlepass(sketch_right, sketch_left, mop_right, rcond=1e-06, as_svd=True)[source]

Single-pass recovery of linop from left and right sketches.

Single-pass recovery from [TYUC2018, 4.1]: Assuming \(A \approx A Q Q^H\), and given Y, W.H, Omega, where \(Y = A \Omega\) and \(W^H = \Psi^H A\) are our random sketches, we can recover A without any further measurements, thus being a single-pass method.

First, we obtain Q via QR decomposition of W. Then, it holds

\[\begin{split}\begin{align} Y (Q^H \Omega)^{-1} Q^H &= (A \Omega) (Q^H \Omega)^{-1} Q^H\\ &\approx (A Q Q^H \Omega) (Q^H \Omega)^{-1} Q^H \\ &= (A Q) Q^H \approx A \\ \end{align}\end{split}\]

Thus, we just need to solve a well-conditioned least-squares problem to approximate A. To obtain the full SVD, we further need to compute the SVD of Y @ pinv(Q.H @ Omega) and recombine.

Parameters:

sketch_right – Sketches A @ mop_right (typically a tall matrix).
sketch_left – Sketches (mop_left @ A) (typically a fat matrix).
mop_right – Linop used to obtain sketch_right.
rcond – Singular value threshold using during the least square solving process. See skerch.utils.lstsq().

Returns:

If as_svd, the triple U, S, Vh with \(A \approx U diag(S) V^H\) being a thin SVD. Otherwise, the pair Y, Bh with \(A \approx Y B^H\).

skerch.recovery.singlepass_h(sketch_right, mop_right, rcond=1e-06, as_eigh=True, by_mag=True)[source]

Single-pass recovery of Hermitian linop from right sketch.

Hermitian version of singlepass(). Since \(A = A^H\), we only need sketches from one side, because in the Hermitian case Q can also be obtained by orthogonalizing sketch_right, and the method remains unchanged.

Parameters:: by_mag – see skerch.utils.eigh().
Returns:: If as_eigh, the pair \(\Lambda, X\) corresponding to the eigendecomposition \(A \approx X diag(\Lambda) X^H\). Otherwise, the pair \(C, Q\) (see derivation above).

skerch.synthmat module

Utilities for synthetic (random) matrices.

This module allows us to sample random (synthetic) matrices, following a variety of structures, such as approximately low-rank plus diagonal.

class skerch.synthmat.RandomLordMatrix[source]

Bases: object

Static class to sample random lowrank + diagonal matrices.

Inspired by the following publications:

This class allows to sample random matrices with the following structures:

Low rank + noise: dampened Gaussian noise matrix in the form G @ G.H, plus a (low-rank) subset of the identity matrix.
Exponential: Singular values have a low-rank unit section followed by exponential decay
Polynomial: Singular values have a low-rank unit section followed by polynomial decay

Optionally, a random diagonal can be added to all the above. Matrices can also be Hermitian and non-PSD.

classmethod exp(shape=(100, 100), rank=10, decay=0.5, diag_ratio=0.0, symmetric=True, seed=479915, dtype=torch.float64, device='cpu', psd=True)[source]

Random matrix with exponential singular value decay plus diagonal.

Like poly(), but the singular value decay of the low-rank component follows \(10^{-d}, 10^{-2d}, \dots\) for decay \(d\). In [TYUC2019, 7.3.1] the following values are used: 0.01 for slow decay, 0.1 for medium, 0.5 for fast.

static get_decay_svals(dims, rank, decay_type, decay, dtype, device)[source]

Singular values with a particular decay pattern.

Returns:: A vector of size dims and given dtype, device. The vector contains rank unit entries, which then decay towards zero following the given decay_type and decay intensity.
Decay_type:: Can be exp (exponentially fast decay), and poly (polynomially fast). Check exp() and poly() for details.

static mix_matrix_and_diag(mat, diag, diag_ratio=1.0, inplace=True)[source]

Adding a (normalized) diagonal to a matrix.

Parameters:

mat – Nonzero matrix of shape (h, w).
diag – Vector of shape min(h, w).
diag_ratio – Nonnegative scalar indicating the relative strength of the additive diagonal component. A ratio of 0 means no diagonal is added. A ratio of 1 means the diagonal is normalized to have the average row norm (resp. column, whichever dimension is smaller).
inplace – If true, diag will be scaled to the given ratio in-place, and mat will be added the scaled diagonal also in-place. Otherwise, copies will be returned.

Returns:

The triple (mat + ratio * diag, ratio * diag, ratio).

classmethod noise(shape=(100, 100), rank=10, snr=0.0001, diag_ratio=0.0, seed=479915, dtype=torch.float64, device='cpu')[source]

Low rank and symmetric noise plus a real-valued noisy diagonal.

If diag_ratio is 0, produces a lowrank+noise matrix, as follows: First, A Gaussian IID noise matrix G is sampled for the provided (square) shape. Then, the noisy matrix (snr / dims) * G@G.H is computed. Finally, the first rank diagonal entries are incremented by 1. As a result, we have a symmetric (resp. Hermitian) noisy matrix with strong diagonal for the first rank entries, and with a signal-to-noise ratio of snr.

If diag_ratio != 0, a real-valued Gaussian IID diagonal is added via mix_matrix_and_diag().

Parameters:

rank (int) – How many diagonal entries should be incremented by 1.
snr (float) –
Signal-to-noise ratio for the symmetric noise. In [TYUC2019, 7.3.1] the following values are used: 1e-4 for low noise, 1e-2 for mid noise, and 1e-1 for high noise.

Returns:

The pair (mat + diag, diag) where mat is (approximately) low-rank following the above recipe.

classmethod poly(shape=(100, 100), rank=10, decay=2.0, diag_ratio=0.0, symmetric=True, seed=479915, dtype=torch.float64, device='cpu', psd=True)[source]

Random matrix with polynomial singular value decay plus diagonal.

Produces a matrix in the form U @ S @ V.H + D, where U and V are random orthogonal matrices, S has entries with polynomially decaying magnitude, analogous to the ones described in [TYUC2019, 7.3.1], and D has random Gaussian IID entries with intensity given by diag_ratio: If diag_ratio != 0, a real-valued Gaussian IID diagonal is added via mix_matrix_and_diag().

Parameters:

rank (int) – Entries in S[:rank] will have a magnitude of 1.
decay (float) –
Parameter determining how quickly magnitudes in S[rank:] decay to zero, following \(2^d, 3^d, 4^d, \dots\) for decay \(d\). In [TYUC2019, 7.3.1] the following values are used: 0.5 for slow decay, 1 for medium, 2 for fast.
diag_ratio – Intensity of D. See mix_matrix_and_diag() for details.
symmetric (bool) – If true, U == V.
psd – If false, and matrix is symmetric, the singular values will be randomly sign-flipped to create a non-PSD matrix.

Returns:

The pair (mat + diag, diag) where mat is (approximately) low-rank following the above recipe.

skerch.utils module

General utilities for the skerch library.

exception skerch.utils.BadSeedError[source]

Bases: Exception

Error to be thrown when a random seed is not as it should.

exception skerch.utils.BadShapeError[source]

Bases: Exception

Error to be thrown when a shape is not as it should.

skerch.utils.complex_dtype_to_real(dtype)[source]

Given a complex datatype, returns its real counterpart.

Parameters:: dtype – Complex torch dtype, e.g. torch.complex128.
Returns:: Float torch dtype, e.g. torch.float64.

skerch.utils.eigh(A, by_descending_magnitude=True)[source]

Hermitian Eigendecomposition.

Parameters:: by_descending_magnitude – If true, eigenpairs are given by descending magnitude of eigenvalues (e.g. -4, 3, 0.1, -0.001, 0). If false, eigenpairs are given by descending value (3, 0.1, 0, -0.001, -4).
Returns:: The eigendecomposition (Lambda, Q) such that A = Q @ diag(Lambda) @ Q.H.

skerch.utils.gaussian_noise(shape, mean=0.0, std=1.0, seed=None, dtype=torch.float64, device='cpu')[source]

Reproducible torch.normal Gaussian noise.

Returns:: A tensor of given shape, dtype and device, containing gaussian noise with given mean and std (analogous to torch.normal), but with reproducible behaviour fixed to given random seed and device.

Warning

PyTorch does not ensure RNG reproducibility across devices. The device parameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.

skerch.utils.htr(x, in_place=False)[source]

Hermitian transposition wrapper.

This convenience wrapper exists for several reasons:

While torch supports .H, numpy does not.
In multiprocessing settings, .conj() seems to sometimes not work, which is likely related to in_place/view/copy behaviour.
Transposition of vectors via .T throws a warning since it is a no-op.

This function avoids all three issues, by returning the input as a conjugate, and also transposed if it is a matrix.

Parameters:

x – Numpy or Torch object, expected to be a vector or matrix (undefined behaviour otherwise).
in_place – If true, the imaginary entries are flipped in-place. Otherwise, a new copy of the input is always returned. No in-between View behaviour is possible (thus this function may be suboptimal in some circumstances, but avoids multiprocessing issues).

Returns:

The Hermitian transpose of x (if matrix), or the compex conjugate if vector. Undefined otherwise.

skerch.utils.lstsq(A, b, rcond=1e-06)[source]

Least-squares solver.

PyTorch/NumPy/HDF5 wrapper.

Parameters:: rcond – Cutoff value, any singular values of A smaller than rcond * largest_singular_value are considered zero.
Returns:: x such that frob(Ax - b) is minimized.

skerch.utils.phase_noise(shape, seed=None, dtype=torch.complex128, device='cpu', conj=False)[source]

Reproducible noise uniformly distributed on the complex unit circle.

Note

This function makes use of uniform_noise() to sample the phase noise. If x itself has been generated using uniform_noise, make sure to use a different seed to mitigate correlations.

Warning

PyTorch does not ensure RNG reproducibility across devices. The device parameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.

Parameters:: conj – If true, the generated noise is the complex conjugate of the noise if all other parameters were equal.
Returns:: A tensor of given shape, dtype and device, containing complex i.i.d. noisy entries uniformly distributed around the unit circle.

skerch.utils.phase_shift(x, seed=None, inplace=True, rng_device='cpu', conj=False)[source]

Reproducible phase shift using phase noise.

Like rademacher_flip(), but applying the complex counterpart of Rademacher noise, i.e. multiplies with phase noise on the complex unit circle.

skerch.utils.pinv(A)[source]

Pseudo-inversion of a given matrix.

PyTorch/NumPy/HDF5 wrapper.

Parameters:: A – matrix to pseudo-invert, of shape (h, w). It needs to be compatible with either scipy.linalg.pinv or torch.linalg.qr.
Returns:: Pseudoinverse of A with shape (w, h).

skerch.utils.qr(A, in_place_q=False, return_R=False)[source]

Thin QR-decomposition of given matrix.

PyTorch/NumPy/HDF5 wrapper.

Parameters:

A – Matrix to orthogonalize, needs to be compatible with either scipy.linalg.qr or torch.linalg.qr. It must be square or tall.
in_place_q – If true, A[:] = Q will be performed.

Returns:

If return_R is true, returns (Q, R) such that Q has orthonormal columns, R is upper triangular and A = Q @ R as per the QR decomposition. Otherwise, returns just Q.

skerch.utils.rademacher_flip(x, seed=None, inplace=True, rng_device='cpu')[source]

Reproducible random sign flip using Rademacher noise.

Variation of rademacher_noise(), where x can be any tensor of any type, and it gets multiplied with Rademacher noise.

Parameters:: inplace – Whether the input tensor gets multiplied in-place.
Returns:: The pair (x * rademacher_mask, rademacher_mask).

See also

rademacher_noise() for notes on reproducibility and more info.

skerch.utils.rademacher_noise(shape, seed=None, device='cpu')[source]

Reproducible Rademacher noise.

Note

This function makes use of uniform_noise() to sample the Rademacher noise. If x itself has been generated using uniform_noise, make sure to use a different seed to mitigate correlations.

Warning

PyTorch does not ensure RNG reproducibility across devices. The device parameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.

Parameters:

shape – Shape of the output tensor with Rademacher noise.
seed – Seed for the randomness.
device – Device of the output tensor and also source for the noise.

skerch.utils.randperm(n, seed=None, device='cpu', inverse=False)[source]

Reproducible randperm of n integers from 0 to (n-1), both included.

Parameters:: inverse (bool) – If False, a random permutation P is provided. If true, an inverse permutation Q is provided, such that both permutations are inverse to each other, i.e. v == v[P][Q] == v[Q][P].

Warning

PyTorch does not ensure RNG reproducibility across devices. The device parameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.

skerch.utils.serrated_hadamard_pattern(v, blocksize, with_main_diagonal=True, lower=True, use_fft=False)[source]

Shifted copies of vectors for block-triangular Girard-Hutchinson.

Parameters:

v – Torch tensor expected to contain zero-mean, uncorrelated entries. If vector, shift is applied directly. If tensor, shift is applied to last dimension.
with_main_diagonal – If true, the main diagonal will be included in the patterns, otherwise excluded.
lower – If true, the block-triangles will be below the diagonal, otherwise above.
use_fft – See subdiag_hadamard_pattern().

Returns:

A vector of same shape, type and device as v, composed of shifted copies of v following a block-triangular (serrated) pattern.

The shifted expectation mechanism introduced in subdiag_hadamard_pattern() can be modified to yield expectations of triangular segments around any diagonal. This function performs the shifts corresponding to triangles of a given block size.

For example, given a 10-dimensional vector, the corresponding serrated pattern with blocksize=3, with_main_diagonal=True, lower=True yields the following entries:

v1
v1 + v2
v1 + v2 + v3
v4
v4 + v5
v4 + v5 + v6
v7
v7 + v8
v7 + v8 + v9
v10

If main diagonal is excluded, it will look like this instead:

0
v1
v1 + v2
0
v4
v4 + v5
0
v7
v7 + v8
0

And if lower=False, it will look like this instead:

v1 + v2 + v3
v2 + v3
v3
v4 + v5 + v6
v5 + v6
v6
v7 + v8 + v9
v8 + v9
v9
v10

skerch.utils.subdiag_hadamard_pattern(v, diag_idxs, use_fft=False)[source]

Shifted copies of vectors for subdiagonal Hutchinson estimation.

Given a square linear operator \(A\), and random vectors \(v \sim \mathcal{R}\) with \(\mathbb{E}[v v^H] = I\), consider this generalized formulation of the Girard-Hutchinson diagonal estimator:

\[f(A) = \mathbb{E}_{v \sim \mathcal{R}} \big[ \varphi(\bar{v}) \odot Av \big]\]

If the \(\varphi\) function is the identity, then \(f(A)\) equals the main diagonal of \(A\). If e.g. \(\varphi\) shifts the entries in \(v\) downwards by k positions, we get the k-th subdiagonal.

This function performs the corresponding (non-circular) shift.

See also

[BN2022]: Robert A. Baston and Yuji Nakatsukasa. 2022. “Stochastic diagonal estimation: probabilistic bounds and an improved algorithm”. CoRR abs/2201.10684.

Parameters:

v – Torch tensor expected to contain zero-mean, uncorrelated entries. If vector, shift is applied directly. If tensor, shift is applied to last dimension.
diag_idxs – Iterator with integers corresponding to the subdiagonal indices to include, e.g. 0 corresponds to the main diagonal, 1 to the diagonal above, -1 to the diagonal below, and so on.
use_fft – If false, shifted copies of v are pasted on the result. This requires only len(v) memory, but has len(v) * len(diag_idxs) time complexity. If this argument is true, an FFT convolution is used instead. This requires at least 4 * len(v) memory, but the arithmetic has a complexity of len(v) * log(len(v)), which can be advantageous whenever len(diag_idxs) becomes very large.

Returns:

A vector of same shape, type and device as v, composed of shifted copies of v as given by diag_idxs.

skerch.utils.svd(A)[source]

Singular Value Decomposition.

PyTorch/NumPy/HDF5 wrapper.

Returns:: The SVD (U, S, Vh) such that A = U @ diag(S) @ Vh.

skerch.utils.torch_dtype_as_str(dtype)[source]

Torch dtype to string.

Given a PyTorch datatype object, like torch.float32, returns the corresponding string, in this case ‘float32’.

skerch.utils.truncate_decomp(k, U=None, S=None, Vh=None, copy=False)[source]

Truncation of diagonal decomposition.

Parameters:: copy – If true, returns a hard copy of the truncation, otherwise returns a view of the given objects.
Returns:: For any given U, S, Vh, returns (respectively) U[:, :k], S[:k], Vh[:k, :], i.e. the k parameter represent the number of dimensions that we wish to keep.

If U, S, Vh is the SVD or eigendecomposition of a matrix, and the S values are in non-ascending magnitude, this truncation returns the top-k components.

skerch.utils.uniform_noise(shape, seed=None, dtype=torch.float64, device='cpu')[source]

Reproducible torch.rand uniform noise.

Returns:: A tensor of given shape, dtype and device, containing uniform random noise between 0 and 1 (analogous to torch.rand), but with reproducible behaviour fixed to given random seed and device.

Warning

PyTorch does not ensure RNG reproducibility across devices. The device parameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.