skerch package
skerch: Sketched linear operations for PyTorch.
skerch Subpackages
skerch Submodules
skerch.a_posteriori module
A-posteriori utilities for sketched decompositions.
This module implements scalable utilities to assess the quality of a sketched approximation, once it has been obtained.
- skerch.a_posteriori.apost_error(lop1, lop2, device, dtype, num_meas=5, seed=479915, noise_type='gaussian', meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, adj_meas=None)[source]
A-posteriori sketched error estimation.
This function implements the error estimation procedure discussed in [TYUC2019, 6]. Given two linear operators \(A, \hat{A}\), it performs
num_measurementsrandom vector multiplications \(y_i = Av_i, \hat{y}_i = \hat{A}v_i\), and the \(y\) outputs can then be used to estimate the norms \(\lVert A \rVert_F^2, \lVert \hat{A} \rVert_F^2\), as well as the error \(\lVert A - \hat{A} \rVert_F\).Note
The test measurements performed here are assumed to be random and independent of how either
lopwas obtained. For this reason, it is important to pick a combination ofnoise_typeandseedthat does not overlap with any other noise generation procedure used before.- Parameters:
lop1 – Linear operator with a
shape=(h, w)attribute and implementing a left (x @ mat) or right(mat @ x)matmul op. It is assumed to match the givendeviceanddtype.lop2 – Linear operator to compare
lop1against. It must match in shape, device and dtype.num_meas – Number of test measurements that will be performed on
lop1andlop2.adjoint – If true, left-matmul is used, otherwise right-matmul.
- Returns:
A tuple
((f1, f2, e), (F1, F2, E)), wheref1, f2are the estimated Frobenius norms squared oflop1, lop2andeis the estimated Frobenius error squared. The uppercase counterparts are lists containing allnum_measindividual measurements (which are averaged to obtain the estimates). To get confidence bounds on this estimate, useapost_error_bounds().
- skerch.a_posteriori.apost_error_bounds(num_measurements, rel_err, is_complex=False)[source]
Probabilistic bounds for a-posteriori sketch error estimation.
The Frobenius error between any two linear operators can be estimated from sketches using
apost_error(). But this estimation is random and subject to error.This function retrieves the probabilistic error bounds presented in [TYUC2019, 6.4], which tell us the probability that the estimated error is wrong by a given amount. Interestingly, these bounds only depend on the number of measurements and data type, and not on the operator dimension.
Note
This function does not perform the error estimation itself, it merely informs about the probabilistic bounds associated with a given error. For the error estimation, see
apost_error().- Parameters:
num_measurements (int) – How many measurements (assumed Gaussian iid) will be performed for the a-posteriori error estimation. More measurements means tighter error bounds (i.e. the output of our error estimation is more reliable).
rel_err (float) – A float between 0 and 1, indicating the relative error that we want to consider. If
xis the actual error we want to estimate, andyis our estimation, this function will return the probability thatyis in the range[(1-rel_err)x, (1+rel_err)x].is_complex – The returned probabilities will be tighter if the error is measured on complex linear operators, using complex Gaussian noise. This boolean flag allows to specify this (assumed real-valued if false).
- Returns:
Probabilities that, given the indicated number of measurements, the a-posteriori method yields values outside of
actual_error * (1 +/- rel_err). Ideally, we want a sufficient number of measurements, such that the returned probabilities for smallrel_errare themselves small (this means that the corresponding error estimation is tight).
- skerch.a_posteriori.scree_bounds(S, err_frob)[source]
A-posteriori scree bounds for low-rank approximations.
Reference: [TYUC2019, 6.5].
When we do a sketched eigen- or singular-decomposition of a linear operator, we often don’t know what is the effective rank of said operator, and whether we took enough sketched measurements to cover it. A scree analysis is a tool to estimate the smallest rank that captures a sufficient amount of the actual matrix, typically by looking at inflection points in the curve associated with sharp decays in spectral norm.
A common issue with sketched methods is that we don’t have access to the actual spectrum, and thus we can’t perform a scree analysis. Instead, it is possible to use estimates from
apost_error()to retrieve upper and lower scree bounds, which ideally would still inform us about the effective rank of the original operator.- Parameters:
S – Vector of the estimated eigen/singular values, expected to contain entries in non-ascending magnitude.
err_frob – A-posteriori estimate of
frob_norm(M - M'), whereM'is the recovery ofMwith singular valuesS, as returned byapost_error()(note that here the norm is not squared).
Usage example:
U, S, Vh = ssvd(A) Ahat = CompositeLinOp((("US", U * S), ("Vh", Vh))) (f1, _, err), _ = apost_error(A, Ahat, "...") scree_lo, scree_hi = scree_bounds(S, err**0.5)
skerch.algorithms module
In-core sketched algorithms.
- class skerch.algorithms.SketchedAlgorithmDispatcher[source]
Bases:
objectProvides sketched algos with measurement linops and recovery algorithms.
Sketched methods are generally flexible in the kind of noise or recovery method they use. In order to accommodate for this flexibility,
skerchalgorithms call this dispatcher with their requested configuration (e.g."gaussian"for noise and"nystrom"for recovery).This modular and extendible framework makes it easy to change between existing options, and also to add new options without having to modify the sketched algorithms. For example, if users want to run
ssvd()with a new type of noise linop calledMyNoise, they just need to extendmop()into a new dispatcher class, and provide the dispatcher to SSVD.Detailed examples on how to do this can be found in the documentation.
- static mop(mop_type, hw, seed, dtype, blocksize=1, register=False)[source]
Returns measurement linop with given specs.
The returned linop must follow the same interface as
skerch.linops.ByBlockLinOp, i.e. must support:Left and right matmul to matrices via
@A
get_blocks(dtype, device)method that returns pairs in the form(block, idxs), whereblockis a subset of vectors, assumed to be columns of the returned linop, and corresponding to the givenidxs.
- Parameters:
mop_type – A string defining the measurement linop type (e.g.
"rademacher","gaussian","ssrft","phase").hw – Tuple
(height, width)with the desired linop shape.seed – Random seed for the measurement linop.
blocksize – How many measurements should be done at once. Ideally, as many as it fits in memory. See
skerch.linops.ByBlockLinOpfor more details.register – Whether to register this linop in a global tracker that will raise an exception if other linops with overlapping seeds have been already created. This is useful to debug new methods if they have e.g. issues due to the use of correlated noise. See
skerch.measurements.RademacherNoiseLinOpfor an example.
- static recovery(recovery_type, hermitian=False)[source]
Returns recovery funtion with given specs.
- Parameters:
recovery_type – String specifying which recovery type to be used (e.g.
"singlepass","nystrom","oversampled_500").- Returns:
A tuple
(fn, inner_dims), wherefnis a recovery function. Ifinner_dimsis None,fnshould have an interface likeskerch.recovery.singlepass(). Ifinner_dimsis a positive integer,fnshould behave likeskerch.recovery.oversampled().
- class skerch.algorithms.TriangularLinOp(lop, stair_width=None, num_gh_meas=0, lower=True, with_main_diagonal=False, use_fft=False, seed=479915, noise_type='rademacher', max_mp_workers=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, meas_blocksize=None)[source]
Bases:
BaseLinOpGiven a square linop, compute products with its lower/upper triangle.
The triangle of a linear operator can be approximated from the full operator via a “staircase pattern” of exact measurements. For example, given an operator of shape
(1000, 1000), and stairs of size 100, we could obtain 9 exact measurements strictly under the diagonal, the first one coveringlop[100:, :100], the next onelop[200:, 100:200], and so on. And we would leave out only 9 small triangles near the diagonal in a sort of “serrated” pattern. The smaller the stair size, the more measurements we do and the more closely the full triangle is approximated (a stair size of 1 results in an exact method, doing full measurements).The interesting thing is that the “serrated” part can be estimated with a variation of the Girard-Hutchinson estimator:
\[f(A) = \mathbb{E}_{v \sim \mathcal{R}} \big[ \varphi(\bar{v}) \odot Av \big]\]Where \(\varphi\) follows a
skerch.utils.serrated_hadamard_pattern(). By linearity, it can be shown that this expectation samples entries of \(A\) from the block-triangles near the diagonal.Finally, we add the exact step-wise measurements and the serrated Girard-Hutchinson estimation (see e.g.
hutchpp()), resulting in a triangular estimation.Usage example:
lop = MyLinOp("...") tril = TriangularLinOp(lop, lower=True, stair_width=10, num_gh_meas=1000) w = tril @ v
- Parameters:
lop – A square linear operator. It must implement a
lop.shape = (dims, dims)attribute as well as the left- and right- matmul operator@, interacting with torch tensors.stair_width – Width of each step in the staircase pattern. If it is 1, a total of
dimsexact measurements will be performed and the triangular products will be exact. If it equalsdims, no exact measurements will be performed (since the staircase pattern would cover the full triangle). The step size regulates this trade-off: Ideally, we want as many exact measurements as possible, but not too many. If no value is provided,dims // 2is chosen by default, such that only 1 exact measurement is performed.num_hutch_measurements – Number of measurements for the serrated estimation that complements the staircase estimation. This estimator generally requires many measurements to be informative, and it can even be counter- productive if not enough measurements are given. If
lopis not diagonally dominant, consider setting this to 0 for a sufficiently good approximation via deterministicstaircase_measurements. Otherwise, make sure to provide a sufficiently high number of measurements.lower – If true, lower triangular matmuls will be computed. Otherwise, upper triangular.
with_main_diagonal – If true, the main diagonal will be included in the triangle, otherwise excluded. If you already have precomuted the diagonal elsewhere, consider excluding it from this approximation, and adding it separately.
use_fft – Whether to use FFT for the Hutchinson estimation. See
skerch.utils.subdiag_hadamard_pattern()for more details.seed – Seed for the random measurements used in the serrated estimator.
noise_type – String indicating noise type to be used in the serrated estimator. Must be supported by the given
dispatcher.meas_blocksize – How many serrated measurements should be done at once. Ideally, as many as it fits in memory. See
skerch.linops.ByBlockLinOpfor more details.
- LOP_REPR_CHARS = 30
- skerch.algorithms.hutch(lop, lop_device, lop_dtype, num_meas, seed=479915, noise_type='rademacher', meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, return_diag=True, defl_Q=None)[source]
Girard-Hutchinson trace and diagonal estimator.
Given a square linear operator \(A\), and random vectors \(v \sim \mathcal{R}\) with \(\mathbb{E}[v v^H] = I\), consider the Girard-Hutchinson diagonal estimator:
\[\mathbb{E}_{v \sim \mathcal{R}} \big[ \bar{v} \odot Av \big]_i = \sum_j A_{ij} \mathbb{E}\big[ \bar{v}_i v_j \big] = A_{ii}\]The convergence of this estimator can be accelerated greatly if we also incorporate rank deflation \(A = (Q Q^H) A + (I - (Q Q^H)) A\), where \(Q\) is an orthogonal matrix spanning the top space of \(A\). Similarly to
ssvd(), it turns out that we can obtain \(Q\) efficiently fromdefl_dimsrandom measurements. Then, the diagonal of \((Q Q^H) A\) can be obtained exactly, and \(diag((I - (Q Q^H)) A)\) is then estimated via Girard-Hutchinson usingextra_gh_meas(uncorrelated) measurements. This is theHutch++algorithm (see [BN2022]).A very similar logic applies for the estimation of the trace, which is the sum of the diagonal entries. In fact, most computations can be recycled, and this algorithm computes and returns both quantities jointly with minimal overhead. The Hutchinson estimator for the trace is:
\[tr(A) = \langle A, I \rangle = \langle A, \mathbb{E}_{v \sim \mathcal{R}}[v v^H] \rangle = \mathbb{E}_{v \sim \mathcal{R}}[v^H A v]\]Note
As it can be seen in Table 1 from [BN2022], Gaussian noise is generally less efficient than Rademacher for the Girard-Hutchinson step. In general, we observe that noise entries that are more or less of the same magnitud help with stabiltiy. We also see that the number of required G-H samples for good recovery is in the millions, i.e. this is not efficient for smaller matrices.
- Parameters:
lop – The \(A\) operator whose diagonal we wish to estimate.
lop_device – The device where \(A\) runs.
lop_dtype – The datatype \(A\) interacts with.
num_meas – How many measurements will be used.
seed – Overall random seed for the algorithm.
noise_type – Which noise to use. Must be supported by the given
dispatcher(seeSketchedAlgorithmDispatcher.mop()).meas_blocksize – How many sketched measurements should be done at once. Ideally, as many as it fits in memory. See
skerch.linops.ByBlockLinOpfor more details.return_diag – If true, diagonal estimation is also returned. Otherwise just the trace.
defl_Q – The tall \(Q\) matrix used for optional deflation (see explanation). It should be composed of orthonormal columns and have shape
(dims, defl_dims). Ideally it is aligned with the top-space oflop. Also, it should be obtained from measurements that are uncorrelated to this estimator, so make sure to use a different source of random noise (e.g. a different seed) here.
- Returns:
A dictionary in the form
{"tr": tr, "diag": diag}with the trace and diagonal (ifreturn_diagis true) estimations.
- skerch.algorithms.seigh(lop, lop_device, lop_dtype, outer_dims, seed=479915, noise_type='rademacher', recovery_type='hmt', lstsq_rcond=1e-06, meas_blocksize=None, by_mag=True, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>)[source]
In-core Sketched Hermitian eigendecomposition (EIGH).
This function is the Hermitian version of
ssvd(). It behaves largely the same, with the following differences:It assumes that the provided
lopis HermitianLess measurements are performed due to Hermitian symmetry
Unlike the SVD, the returned value is a pair \((\Lambda, Q)\), where \(A \approx Q diag(\Lambda) Q^H\). Since \(A\) is Hermitian, this is an eigendecomposition: \(\Lambda\) may contain negative values, and both left and right matrices are identical.
Refer to the
ssvd()docs for more details.
- skerch.algorithms.snorm(lop, lop_device, lop_dtype, num_meas=5, seed=479915, noise_type='gaussian', meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, adj_meas=None, norm_types=('fro', 'op'))[source]
Sketched norm estimation.
It is possible to estimate the norm of a linear operator from a few random measurements. In [TYUC2019, 6.2], it is shown that the Frobenius norm is proportional to the norm of the measurements themselves (akin to a Gram trace estimation). For \(q\) random measurements \(\Omega\) and
beta = 2 if complex_dtype else 1we have:\[\lVert A \rVert_F^2 = \frac{1}{\beta q} \mathbb{E} \big[ A \Omega \big]\]From the same measurements, we can also estimate the operator norm (largest singular value) as follows. Consider the linear operator \(A\), and its top-
kapproximation \(\hat{A} = Q Q^H A\) for orthogonal \(Q\). Then:\[\lVert A \rVert_2 = \lVert \hat{A} \rVert_2 \approx \lVert Q Q^H A \rVert_2 = \lVert Q^H A \rVert_2\]And, as discussed in
ssvd(), \(Q\) can be efficiently estimated from a few random measurements \(A \Omega\).- Parameters:
lop – The \(A\) operator whose norms we wish to estimate.
lop_device – The device where \(A\) runs
lop_dtype – The datatype \(A\) interacts with
num_meas – How many measurements will be used to estimate the norms (\(q\) in the above description)
seed – Overall random seed for the algorithm
noise_type – Which noise to use for \(\Omega\). Must be supported by the given
dispatcher(seeSketchedAlgorithmDispatcher.mop())meas_blocksize – How many sketched measurements should be done at once. Ideally, as many as it fits in memory. See
skerch.linops.ByBlockLinOpfor more details.adj_meas – If true, measurements to obtain \(Q\) are adjoint. This can affect accuracy somewhat.
norm_types – Collection with norm types to be returned. Currently supported:
"fro"(Frobenius),"op"(operator norm)
- Returns:
The tuple
(result, (Q, R)), where(Q, R)is the QR decomposition of the sketched measurements, andresultis a dictionary in the form{norm_type: value}.
- skerch.algorithms.ssvd(lop, lop_device, lop_dtype, outer_dims, seed=479915, noise_type='rademacher', recovery_type='hmt', lstsq_rcond=1e-06, meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>)[source]
In-core Sketched SVD.
The core idea behind sketched SVDs, introduced in [HMT2009] and outlined in
skerch.recovery.hmt(), relies on the existence of orthogonal thin matrices \(P, Q\), such that the following approximation holds:\[A \approx P P^H A Q Q^H = P C Q^H\]The crucial point here is that it is possible to efficiently obtain \(P, Q\) from random measurements, or sketches. Then, it only remains to decompose the small “core” matrix \(C\), which can be done with traditional methods.
Depending on which
recovery_typewe use, there are several strategies to solve this sketch, each with their nuances and tradeoffs (see e.g. note below). Available recovery methods are implemented and documented inskerch.recovery.Note
The straightforward recovery method from [HMT2009] (implemented in
skerch.recovery.hmt()) requires us to first obtain \(Q\), and then do a second round of measurements \(A Q\) to obtain the core matrix. Further developments have shown that it is possible to estimate \(A Q\) without having to run a second pass of measurements by solving a least-squares problem, which can lead to significant speedup if we leverage parallelization (see e.g. [TYUC2018]). These single-pass recovery methods have also been implemented inskerch.recovery. Check the corresponding docs and therecovery_typeparameter for more info.- Parameters:
lop – The linear operator \(A\) to be decomposed
lop_device – The device where \(A\) runs
lop_dtype – The datatype \(A\) interacts with
outer_dims – How many measurements will be used to obtain \(P\) and \(Q\) (respectively)
seed – Overall random seed for the algorithm
noise_type – Which noise to use. Must be supported by the given
dispatcher(seeSketchedAlgorithmDispatcher.mop())recovery_type – Which recovery method to use. Must be supported by the given
dispatcher(seeSketchedAlgorithmDispatcher.recovery()andskerch.recovery).lstsq_rcond – Least-squares condition threshold used in the recovery, see
SketchedAlgorithmDispatcher.recovery()andskerch.recovery.meas_blocksize – How many sketched measurements should be done at once. Ideally, as many as it fits in memory. See
skerch.linops.ByBlockLinOpfor more details.
- Returns:
The singular value decomposition
U, S, Vhwhere \(A \approx U diag(S) V^H\).
- skerch.algorithms.xhutchpp(lop, lop_device, lop_dtype, x_dims=0, gh_meas=0, seed=479915, noise_type='rademacher', meas_blocksize=None, dispatcher=<class 'skerch.algorithms.SketchedAlgorithmDispatcher'>, return_diag=True, cache_xmop=True)[source]
Diagonal and trace sketched approximation via Hutch/XDiag.
In
hutch()we see how to estimate the trace and diagonal via Girard-Hutchinson/Hutch++. This function extends this functionality withXTrace/XDiag[ETW2024], which allow us to performx_dimsdimensional deflation, and then recycle thex_dimsmeasurements for the subsequent deflated Girard-Hutchinson estimator, which we can also further enrich withgh_measextra measurements. The memory cost of this function is dominated by the deflation: we need to store a matrix of(dims, x_dims)size. The runtime cost is typically dominated by the total number of measurements:2 * x_dims + gh_meas.Note
This function is equivalent to plain Girard-Hutchinson for
x_dims=0, and equivalent toXDiagifgh_meas=0. For theHutch++estimator, runhutch()providing the desiredQ_defldeflation matrix.See also
This blogpost provides derivations and elaborates on the relationship between Hutch++ and XDiag, and how this can lead to a common implementation such as this one.
- Parameters:
defl_dims – How many measurements will be used to obtain the \(Q\) deflation matrix
lop – The \(A\) operator whose diagonal we wish to estimate.
lop_device – The device where \(A\) runs
lop_dtype – The datatype \(A\) interacts with
x_dims – How many measurements will be used to obtain the \(Q\) deflation matrix, and the subsequent deflated estimation (the
kabove).seed – Overall random seed for the algorithm
noise_type – Which noise to use. Must be supported by the given
dispatcher(seeSketchedAlgorithmDispatcher.mop())meas_blocksize – How many sketched measurements should be done at once. Ideally, as many as it fits in memory. See
skerch.linops.ByBlockLinOpfor more details.cache_mop – If true, the measurement linear operator (which is used twice) is converted to an explicit matrix and kept around. This saves computation, since it does not need to be sampled twice, at the expense of the memory required to store its entries.
- Returns:
A dictionary in the form
{"tr": t, "diag": d, "Q": Q, "R": R, "Sh_k": S, "Psi": P}containing trace and diagonal (ifreturn_diagis true) estimations, as well as the \(Q, R\) matrices corresponding to the QR decomposition of the deflation measurements, and the \(S_k^H, \Psi\) objects needed to compute the exchangeable estimator.
skerch.hdf5 module
Persistent and out-of-core tensor functionality via HDF5.
- class skerch.hdf5.DistributedHDF5Tensor[source]
Bases:
objectClass to create and manage distributed HDF5 tensors.
In general, multiple processes are not allowed to concurrently open and write to HDF5 files. In order to allow for distributed, but coherent writing, this class allows to create many individual HDF5 files, and then “virtually” merge them into a single coherent dataset. This allows us to distribute a (potentially large) tensor across processes and machines, and write to it concurrently.
Once we are done writing, we may want to access the result. Unfortunately, most OS don’t allow a single process to open many files at once. As a result, any files above the limit would be silently ignored. This class also solves this by providing a
merge()method, to gather the distributed chunks back into a single, monolithic file.See docs for more examples, also on how to create and merge HDF5 datasets directly from command line.
- DATA_NAME = 'data'
- FLAG_DTYPE = dtype('O')
- FLAG_NAME = 'flags'
- INITIAL_FLAG = 'initialized'
- MAIN_PATH = 'ALL'
- classmethod create(basepath_fmt, shape, partition_size, dtype, compression='lzf')[source]
Create a distributed HDF5 measurement dataset.
- Parameters:
basepath_fmt – Format string with the directory and dataset name to store created HDF5 files, in the form
<DIR>/my_dataset_{}.h5. Directory must be empty.shape – Shape of the global tensor corresponding to the whole HDF5 dataset.
partition_size – The HDF5 dataset will be partitioned across its first axis on sub-files. E.g. a shape of
(10, 20)with a partition size of 6 will result in 2 files of shapes(6, 20)and(4, 20).dtype – Datatype of the HDF5 arrays to be created.
- Returns:
The tuple
(all_path, subpaths, begs_ends), whereall_pathis the path of the virtual dataset encompassing the global tensor,subpathsare the paths to the respective chunk HDF5 files, andbegs_endsare the begining (included) and end (excluded) indices that were used to partition the global tensor on subfiles, across its first axis.
- static get_idxs_format(max_idx)[source]
Helper method, retrieves format strings to index HDF5 chunks.
- static iter_partition_idxs(max_idx, partition_size)[source]
Iterate from 0 to
max_idxbypartition_size.- Returns:
Pairs of
(beg, end)wherebegis included and begins with 0, andendis exluded and equals at mostmax_idx.
- classmethod load(path, filemode='r+')[source]
Load the given HDF5 dataset.
Load an individual dataset, such as the virtual ones created via
create()or the monolithic ones merged viamerge().- Parameters:
path – Path to the HDF5 file.
filemode – Default is ‘r+’, read/write, file must preexist. See documentation of
h5py.Filefor more details.
- Returns:
(data, flag, h5f), wheredatais the dataset for the numerical measurements,flagis the dataset for state tracking, andh5fis the (open) HDF5 file handle.
Note
Remember to
h5f.close()once done with this file.
- classmethod merge(all_path, out_path=None, compression='lzf', check_success_flag=None, delete_subfiles_while_merging=False)[source]
Merges distributed HDF5 dataset into a single, monolithic HDF5 file.
- Parameters:
all_path – The
all_pathof a virtual HDF5 dataset like the ones created viacreate(). It must be a “virtual” dataset, i.e. composed of chunks that are distributed across other files.out_path – If None, merged dataset will be written over the given
all_path, i.e. it will be converted from virtual into monolithic in-place. Otherwise, path for a new HDF5 monolithic file where the contents will be written into.check_success_flag – If given, this method will check that all HDF5 flags equal this value, raise an error otherwise.
delete_subfiles_while_merging (bool) – If true, each distributed HDF5 file that is visited will be deleted right after it is merged onto the monolithic HDF5 file. Useful to avoid large memory overhead.
- Returns:
Path of the merged HDF5 file.
- skerch.hdf5.create_hdf5_layout_lop(root, lop_shape, dtype, partition_size, lo_meas=None, ro_meas=None, inner_meas=None, lo_fmt='leftouter_{}.h5', ro_fmt='rightouter_{}.h5', inner_fmt='inner_{}.h5')[source]
Creation of persistent HDF5 files to store linop sketches.
This convenience method prepaers the HDF5 placeholders that can be used to store sketches from a linop of shape
lop_shape. It supports independent creation of left-, right- and inner measurements, thus supporting most use cases involving linear sketches.- Parameters:
root – Where to store the created HDF5 files. Must be an empty directory.
lop_shape – Shape of linear operator to sketch from, in the form
(height, width).dtype – Torch dtype of the operator, e.g.
torch.float32. The HDF5 arrays will be of same type.partition_size – Each created HDF5 will be split into chunks of this many vectors (see
DistributedHDF5Tensor.create()for more details).lo_meas – If given, a dataset of shape
(lo_meas, w)will be created under the name given bylo_fmt.ro_meas – If given, a dataset of shape
(h, ro_meas)will be created under the name given byro_fmt.inner_meas – If given, a dataset of shape
(inner_meas, inner_meas)will be created under the name given byinner_fmt.
skerch.linops module
Basic utilities for (matrix-free) linear operators.
- class skerch.linops.BandedLinOp(indexed_diags, symmetric=False)[source]
Bases:
BaseLinOpBanded matrix-free linear operator.
Given a collection of \(k\) vectors, this class implements a banded linear operator, where each given vector is a (sub)-diagonal. It is composed by \(k\)
DiagonalLinOpoperators, thus its memory and runtime is equivalent to storing and running the individual diagonals.Note
The shape of this linear operator is implicitly given by the diagonal lengths and indices. Any inconsistent input will result in a
BadShapeError. In particular, symmetric banded matrices must also be square. For example, a square tridiagonal matrix of shape(n, n)has a main diagonal at index 0 with lengthn, and two subdiagonals at indices 1, -1 with lengthn - 1. Still, this linop can also be non-square (unless it is symmetric), as long as all given diagonals fit in the implicit shape.Usage example:
diags = {0: some_diag, 1: some_superdiag, -1, some_subdiag} B = BandedLinOp(diags, symmetric=False) w = B @ v
- Parameters:
indexed_diags – Dictionary in the form
{idx: diag, ...}wherediagis a torch vector containing a diagonal, andidxindicates the location of the diagonal: 0 is the main diagonal, 1 the superdiagonal (lop[i, i+1]), -1 the subdiagonal, and so on.symmetric – If true, only diagonals with nonnegative indices are admitted. Each positive index will be replicated as a negative one.
- MAX_PRINT_ENTRIES = 20
- class skerch.linops.BaseLinOp(shape, batch=None)[source]
Bases:
objectBase class for linear operators.
Implements the
.shapeattribute and basic matmul functionality with vectors and matrices (also via the@operator). Intended to be extended with further functionality viamatmul()andrmatmul(). See documentation and codebase for examples.Note
Inputs to
matmul, rmatmulcan be vectors or matrices, but implementations are responsibe to handle matrix-matrix parallelization.- Parameters:
shape –
(height, width)of linear operator.batch – When calling
self @ xagainst a matrix with several vectors, the defaultbatch=Nonewill callmatmul()with the wholexmatrix at once. If a different batch is provided,xwill be split in chunks ofbatchvectors, which will be fed tomatmul()sequentially. This is useful to eg. prevent out-of-memory errors due to processing too large chunks at once.
- t()[source]
Return (Hermitian) transposition of this lop.
Equivalent to
TransposedLinOp(self), seeTransposedLinOp.
- class skerch.linops.ByBlockLinOp(shape, by_row=False, batch=None, blocksize=1)[source]
Bases:
BaseLinOpMatrix-free operator computed by blocks of submatrices.
Consider a large matrix that does not fit in memory. Still, we can perform matrix multiplications by sequentially loading smaller blocks in memory, and then aggregating the result. This is the main motivation for this by-block linear operator:
At instantiation, users determine the
by_rowandblocksizeparameters, which determine how many rows/columns will be internally used at once.At runtime, this class calls sequentially the
get_block()method, performs the partial matrix-multiplications and aggregates them.At development, extensions of this class only have to implement
get_block()accordingly to return blocks of the right shape.
Note
The
by_rowflag has implications in terms of memory and runtime. If true, for alopof shape(h, w)and block size 1, thelop @ xmatrix-vector multiplication will callget_vector()htimes, and performhdot products of dimensionw. If false, it will performwscalar products of dimensionh. In the case ofx @ lop, the number of scalar and dot products are swapped.Therefore, developers need to override
get_block()taking this flag into account, and users should set it to the scenario that is most efficient (e.g. by-column is generally more efficient whenh > w). Seeskerch.measurementsfor examples.- Parameters:
by_row – If true, blocks are groups of rows, otherwise columns.
blocksize – If integer, determines how many rows/columns each block has, and a row will be a matrix.
batch – If the input to matmul is a matrix itself, this determines how many vectors are computed at once. Note that this is different to
blocksize, which refers to the blocks of this linop.
- get_block(block_idx, input_dtype, input_device)[source]
Method to gather a block (matrix) from this linear operator.
Override this method with the desired behaviour. For a shape of
(h, w), it should return matrices of shape(block, w)ifself.by_rowis true, and(h, block)otherwise.Note
If
blocksize==1, returning vectors may work in some cases, but it is recommended to still return matrices (where one of the dimensions equals 1).- Parameters:
block_idx – Index of the block to be returned. Use the auxiliary method
get_vector_idxs()if you need to know which vector indices are associated to this block index. The attributesself.num_vecs, self.num_blocksmay also be helpful.input_dtype – The dtype of the input tensor that this linop was called on. The output of this method should generally be in the same device.
input_device – The device of the input tensor that this linop was called on. The output of this method should generally be in the same device.
- get_blocks(dtype, device='cpu')[source]
Yields all blocks in ascending order with the corresponding idxs.
- get_idx_coords(idx)[source]
Retrieve vector index in block coordinates.
Useful if we want to retrieve a given vector from this linop: since this linop is defined in a by-block fashion, we first need to know which block to retrieve, and then which vector index within the retrieved block.
- Parameters:
idx – Integer between
0andN - 1, whereNis the number of rows, ifby_rowis true, or columns otherwise.- Returns:
Pair of ints
(b, v), such thatvec_idxis thev-th element of theb-th block, e.g. in a by-row linop,lop[idx] = lop.get_block(idxs, dtype, device)[idx].
- class skerch.linops.CompositeLinOp(named_operators)[source]
Bases:
objectMatrix-free composite linear operator.
This class composes an ordered collection of operators
[A, B, C, ...]intoA @ B @ C ...in a matrix-free fashion.Warning
Using this class could be more inefficient than directly computing the composed operator, e.g. if
A.shape = (1, 1000)andB.shape = (1000, 1), then computing the scalarC = A @ Band then applying it is more efficient than keeping aCompositeLinearoperatorwithA, B(in terms of both memory and computation). This class does not check for such cases, users are encouraged to take this into account. Note that composite linops can also be nested.- Parameters:
named_operators – Ordered collection in the form
[(n_1, o_1), ...]where eachn_iis a string with the name of operatoro_i. Eacho_ioperator must implement__matmul__and__rmatmul__as well as theshape = (h, w)attribute. This object will then become the composite operatoro_1 @ o_2 @ ...
- class skerch.linops.DiagonalLinOp(diag)[source]
Bases:
BaseLinOpDiagonal matrix-free linear operator.
Given a vector
vofddimensions, this class implements a diagonal linear operator of shape(d, d)via left- and right-matrix multiplication, as well as theshapeattribute, only requiring linear (\(\mathcal{O}(d)\)) memory and runtime.- Parameters:
diag – Vector to be casted as diagonal linop.
- MAX_PRINT_ENTRIES = 20
- class skerch.linops.SumLinOp(named_signed_operators)[source]
Bases:
BaseLinOpMatrix-free sum of linear operators.
Given a collection of same-shape linear operators
A, B, C, D ..., this class implements the sumA + B + C - D ...in a matrix-free fashion.- Parameters:
named_signed_operators – Collection in the form
{(n_1, s_i, o_1), ...}where eachn_iis a string with the name of operatoro_i, ands_iis a boolean: if true, this operator is to be added, otherwise subtracted. Eacho_ioperator must implement__matmul__and__rmatmul__as well as theshape = (h, w)attribute. This object will then become the operatoro_1 + o_2 - .... All operators must also have same shape.
- class skerch.linops.TorchLinOpWrapper[source]
Bases:
objectLinear operator that always accepts and produces PyTorch tensors.
Since
skerchis built on top of PyTorch, but some other useful LinOps interface with e.g. NumPy arrays instead, this mixin class acts as a wraper on the__matmul__and__rmatmul__operators, so that the operator expects and returns torch tensors, even when the wrapped operator interfaces with NumPy/HDF5.This facilitates interoperability between
skerchand other python linops. Usage example:# extend NumPy linear operator via multiple inheritance class TorchWrappedNumpyLinOp(TorchLinOpWrapper, NumpyLinOp): pass lop = TorchWrappedNumpyLinOp(...) # instantiate normally w = lop @ v # now v can be a PyTorch tensor on any device
- class skerch.linops.TransposedLinOp(lop)[source]
Bases:
objectHermitian transposition of a linear operator.
Given a linear operator \(A\), real or complex, this class wraps its functionality, such that
TransposedLinOp(lop)behaves line the (Hermitian) transposition \(A^H\). This is done by swapping dimensions and matmul methods leveraging the following identity:\(A^H v = ((A^H v)^H)^H = (v^H A)^H\).
Usage example:
lopT = TransposedLinOp(lop)
- Parameters:
lop – Any object supporting a shape
(h, w)attribute as well as the right- and left- matmul operator@.
- skerch.linops.check_linop_input(x, lop_shape, adjoint)[source]
Checks that input is a matrix or vector of the right shape.
- Parameters:
x – The input to this linear operator. It should be either a vector or a matrix of matching shape to
lop_shape.adjoint (bool) – If true,
x @ lopis assumed, otherwiselop @ x.
- Raises:
BadShapeErrorif there is any shape mismatch.
- skerch.linops.linop_to_matrix(lop, dtype=torch.float32, device='cpu', adjoint=False)[source]
Convert a linop to a matrix via one-hot matrix-vector products.
- Parameters:
adjoint – If true, one-hot products are done via
v_i @ lop. Otherwiselop @ v_i.- Returns:
Matrix equivalent to the given
lop, on the givendtypeanddevice.
skerch.measurements module
Functionality to perform sketched measurements.
- class skerch.measurements.GaussianNoiseLinOp(shape, seed, by_row=False, batch=None, blocksize=1, register=True, mean=0.0, std=1.0)[source]
Bases:
RademacherNoiseLinOpRandom linear operator with i.i.d. Gaussian entries.
Like
RademacherNoiseLinOp, but with Gaussian noise.- REGISTER = {'default': [(9876531, 9877531), (12345, 12395)]}
- class skerch.measurements.PhaseNoiseLinOp(shape, seed, by_row=False, batch=None, blocksize=1, register=True, conj=False)[source]
Bases:
RademacherNoiseLinOpRandom linear operator with i.i.d. complex entries in the unit circle.
Like
RademacherNoiseLinOp, but with phase noise. Must be of complex datatype.- Parameters:
conj – For the same seed, the linear operators with true and false
conjvalues are complex conjugates of each other.
- REGISTER = {'default': [(12345, 12395)]}
- class skerch.measurements.RademacherNoiseLinOp(shape, seed, by_row=False, batch=None, blocksize=1, register=True)[source]
Bases:
ByBlockLinOpRandom linear operator with i.i.d. Rademacher entries.
Warning
Since this linop uses random generators and seeds to fetch the blocks, it is important that two different instances do not overlap in seeds, to prevent correlated noise. Use sufficiently far away seeds and
register=Trueto test this behaviour.- Parameters:
shape – Shape of the linop as
(h, w).seed – Random seed used in
get_block()to sample random blocks.by_row – See
skerch.linops.ByBlockLinOp.register – If true, when the linop is created, its seed range (going from
seedtoseed + max(h, w)) is added to a class-wide register, which raises askerch.utils.BadSeedErrorif there are any other instances of this class with overlapping ranges. If false, this behaviour is disabled.
- REGISTER = {'default': [(12345, 12395)]}
- class skerch.measurements.SSRFT[source]
Bases:
objectScrambled Subsampled Randomized Fourier Transform (SSRFT).
This static class implements the forward and adjoint SSRFT, as described in [TYUC2019, 3.2]:
\[\text{SSRFT} = R\,\mathcal{F}\,\Pi\,\mathcal{F}\,\Pi'\]Where \(R\) is a random index-picker, mathcal{F} is either a DCT or a FFT (if
xis complex), and \(\Pi, \Pi'\) are random permutations which also multiply entries by Rademacher or phase noise (ifxis complex).- static issrft(x, out_dims, seed=479915, norm='ortho')[source]
Adjoint SSRFT (see class docstring for definition).
Inversion of the SSRFT, such that for a square SSRFT,
x == issrft(ssrft(x))holds.Note that this means that, for complex
x, the adjoint operation involves complex conjugation as well. See class docstring andssrft()for more details.- Parameters:
out_dims – In this case, instead of random index-picker, which reduces dimension, we have an index embedding, which increases dimension by placing the
xentries in the corresponding indices (and leaving the rest to zeros). For this reason,out_dims >= len(x)is required.
- static ssrft(x, out_dims, seed=479915, norm='ortho')[source]
Forward SSRFT (see class docstring for definition).
- Parameters:
x – Matrix to be projected, such that
y = SSRFT @ xout_dims – Number of rows in
ywithrows(y) <= rows(x)seed – Random seed for the SSRFT.
norm – Norm for the FFT and DCT. Currently only
orthois supported to ensure orthogonality.
- class skerch.measurements.SsrftNoiseLinOp(shape, seed, batch=None, blocksize=1, norm='ortho', register=True)[source]
Bases:
ByBlockLinOpLinop for the Scrambled Subsampled Randomized Fourier Transform (SSRFT).
This class encapsulates the forward and adjoint SSRFT transforms into a single linear operator with fixed shape and orthonormal columns, which is deterministic for the same dtype, shape and seed (also across different torch devices).
See
SSRFTfor more details.Note
This linop can either be square or tall, but never fat (i.e. width must be less or equal than height). Since the SSRFT cannot increase the dimensionality of its input, the forward matmul of this linop is actually the inverse SSRFT, and the adjoint matmul is the forward SSRFT. This slight change in format that doesn’t really affect the semantics of the SSRFT, and it makes it more compatible with other noise linops, which are typically also tall instead of fat. It is also more common to think about orthogonal columns than rows. To make it fat,
skerch.linops.TransposedLinOpcan still be used.Note
Unlike classes extending
skerch.linops.ByBlockLinOp, in this case it is not efficient to apply this operator by row/column. Instead, this implementation applies the SSRFT directly to the input, by vector, but it also implementsget_vectorvia one-hot vecmul to facilitate parallel measurements and fit the standard interface forskerchmeasurement linops.- REGISTER = {'default': [(12345, 12395)]}
skerch.recovery module
Functionality to recover linop approximations from their sketches.
- skerch.recovery.hmt(sketch_right, lop, as_svd=True)[source]
HMT sketched low-rank recovery.
The core idea behind this method is outlined in [HMT2009]: Given a linear operator \(A\), we assume that there exists a thin, orthogonal matrix \(Q\) such that \(A \approx Q Q^H A\). Crucially, it is possible to efficiently obtain \(Q\) from random measurements, or sketches, followed by QR orthogonalization.
Then, to produce an SVD, it remains to decompose the “thin” matrix \(B^H = P^H A = U \Sigma V^H\), yielding the final SVD: \(A \approx (P U) \Sigma V^H\).
- Parameters:
sketch_right – Sketches
A @ mop_right(typically a tall matrix).lop – Our target linear operator
A.
- Returns:
If
as_svd, a tripleU, S, Vhwith \(A \approx U diag(S) V^H\) being a thin SVD. Otherwise, \((Q, B^H)\).
- skerch.recovery.hmt_h(sketch_right, lop, as_eigh=True, by_mag=True)[source]
HMT sketched low-rank recovery (Hermitian).
Hermitian version of
hmt(). Since \(A = A^H\), we can further compact the core matrix into a Hermitian, “small” matrix: \(A \approx Q Q^H A Q Q^H \Rightarrow C = Q^H A Q\). Then we can obtain the eigendecomposition of \(C\).- Parameters:
by_mag – see
skerch.utils.eigh().- Returns:
If
as_eigh, the pair \(\Lambda, X\) corresponding to the eigendecomposition \(A \approx X diag(\Lambda) X^H\). Otherwise, the pair \(C, Q\) (see derivation above).
- skerch.recovery.nystrom(sketch_right, sketch_left, mop_right, rcond=1e-06, as_svd=True)[source]
Generalized Nyström recovery of linop from left and right sketches.
Single-pass recovery from [Naka2020]. Here, we assume the low-rank approximation \(A \approx A \Omega C \Psi^H A\), with core matrix in the form \(C = (\Psi^H A \Omega)^\dagger\).
Given sketches \(Y = A \Omega\), \(W^H = \Psi^H A\), this method obtains the approximation without requiring any further measurements and by solving a single, compact least-squares system in the form \(A \approx Y (W^H \Omega)^\dagger W^H\).
To obtain the SVD approximation,
Y, Ware further orthogonalized.- Parameters:
sketch_right – Sketches
A @ mop_right(typically a tall matrix).sketch_left – Sketches
(mop_left @ A)(typically a fat matrix).mop_right – Linop used to obtain
sketch_right.rcond – Singular value threshold using during the least square solving process. See
skerch.utils.lstsq().
- Returns:
If
as_svd, the tripleU, S, Vhwith \(A \approx U diag(S) V^H\) being a thin SVD. Otherwise, the pairY, Bhwith \(A \approx Y B^H\).
- skerch.recovery.nystrom_h(sketch_right, mop_right, rcond=1e-06, as_eigh=True, by_mag=True)[source]
Generalized Nyström recovery of Hermitian linop from right sketch.
Hermitian version of
nystrom(). Since \(A = A^H\), we only need sketches from one side: \(A \approx A \Omega C \Omega^H A\), with Hermitian core matrix in the form \(C = (\Omega^H A \Omega)^\dagger\).- Parameters:
by_mag – see
skerch.utils.eigh().- Returns:
If
as_eigh, the pair \(\Lambda, X\) corresponding to the eigendecomposition \(A \approx X diag(\Lambda) X^H\). Otherwise, the pair \(C, Q\) (see derivation above).
- skerch.recovery.oversampled(sketch_right, sketch_left, sketch_inner, lilop, rilop, rcond=1e-06, as_svd=True)[source]
Oversampled recovery of linop from left, right and inner sketches.
Single-pass, oversampled recovery from [BWZ2016]:
Assuming \(A \approx P (P^H A @ Q) @ Q^H = P C Q^H\), this method aims to obtain a core matrix \(C\) via an independent, oversampled sketch. The above approximation is satisfied by the following core matrix: \(C = (\Psi^H P)^\dagger \Psi^H A \Omega (Q^H \Omega)^\dagger\), and \(P, Q\) can be been obtained via left and right outer sketches and subsequent orthogonalization.
The key observation here is that \(C\) can be oversampled, i.e. the number of measurements taken by \(\Psi, \Omega\) can be larger than the number of columns in \(P, Q\). This helps conditioning the pseudoinverses and has been shown in [TYUC2018, 4.1] to yield better recoveries when the singular values of \(A\) decay slowly and the number of outer measurements doesn’t sufficiently cover for that.
The trade-off here is that this method requires the extra (independent) inner measurements, plus the extra least-squares steps involved in solving the two pseudoinverses. To return output as SVD, the core matrix is further diagonalized.
- Parameters:
sketch_right – Sketches
A @ mop_right(typically a tall matrix).sketch_left – Sketches
(mop_left @ A)(typically a fat matrix).sketch_inner – Sketches
(lilop^H @ A @ rilop)(typically a small, square matrix).lilop – Left inner linop used to obtain
sketch_inner.rilop – Right inner linop used to obtain
sketch_inner.rcond – Singular value threshold using during the least square solving process. See
skerch.utils.lstsq().
- Returns:
If
as_svd, the tripleU, S, Vhwith \(A \approx U diag(S) V^H\) being a thin SVD. Otherwise, the pairY, Bhwith \(A \approx Y B^H\).
- skerch.recovery.oversampled_h(sketch_right, sketch_inner, lilop, rilop, rcond=1e-06, as_eigh=True, by_mag=True)[source]
Oversampled recovery of linop from right and inner sketches.
Hermitian version of
oversampled(). Since \(A = A^H\), we only need outer sketches from one side. But part of the strength in the oversampled method is that the inner measurement matrices are uncorrelated. To allow for this possibility, this function still admits separate parameters forlilopandrilop.(see [FSMH2025] for more discussion).
- Parameters:
by_mag – see
skerch.utils.eigh().- Returns:
If
as_eigh, the pair \(\Lambda, X\) corresponding to the eigendecomposition \(A \approx X diag(\Lambda) X^H\). Otherwise, the pair \(C, Q\) (see derivation above).
- skerch.recovery.singlepass(sketch_right, sketch_left, mop_right, rcond=1e-06, as_svd=True)[source]
Single-pass recovery of linop from left and right sketches.
Single-pass recovery from [TYUC2018, 4.1]: Assuming \(A \approx A Q Q^H\), and given
Y, W.H, Omega, where \(Y = A \Omega\) and \(W^H = \Psi^H A\) are our random sketches, we can recoverAwithout any further measurements, thus being a single-pass method.First, we obtain
Qvia QR decomposition ofW. Then, it holds\[\begin{split}\begin{align} Y (Q^H \Omega)^{-1} Q^H &= (A \Omega) (Q^H \Omega)^{-1} Q^H\\ &\approx (A Q Q^H \Omega) (Q^H \Omega)^{-1} Q^H \\ &= (A Q) Q^H \approx A \\ \end{align}\end{split}\]Thus, we just need to solve a well-conditioned least-squares problem to approximate
A. To obtain the full SVD, we further need to compute the SVD ofY @ pinv(Q.H @ Omega)and recombine.- Parameters:
sketch_right – Sketches
A @ mop_right(typically a tall matrix).sketch_left – Sketches
(mop_left @ A)(typically a fat matrix).mop_right – Linop used to obtain
sketch_right.rcond – Singular value threshold using during the least square solving process. See
skerch.utils.lstsq().
- Returns:
If
as_svd, the tripleU, S, Vhwith \(A \approx U diag(S) V^H\) being a thin SVD. Otherwise, the pairY, Bhwith \(A \approx Y B^H\).
- skerch.recovery.singlepass_h(sketch_right, mop_right, rcond=1e-06, as_eigh=True, by_mag=True)[source]
Single-pass recovery of Hermitian linop from right sketch.
Hermitian version of
singlepass(). Since \(A = A^H\), we only need sketches from one side, because in the Hermitian caseQcan also be obtained by orthogonalizingsketch_right, and the method remains unchanged.- Parameters:
by_mag – see
skerch.utils.eigh().- Returns:
If
as_eigh, the pair \(\Lambda, X\) corresponding to the eigendecomposition \(A \approx X diag(\Lambda) X^H\). Otherwise, the pair \(C, Q\) (see derivation above).
skerch.synthmat module
Utilities for synthetic (random) matrices.
This module allows us to sample random (synthetic) matrices, following a variety of structures, such as approximately low-rank plus diagonal.
- class skerch.synthmat.RandomLordMatrix[source]
Bases:
objectStatic class to sample random lowrank + diagonal matrices.
Inspired by the following publications:
This class allows to sample random matrices with the following structures:
Low rank + noise: dampened Gaussian noise matrix in the form
G @ G.H, plus a (low-rank) subset of the identity matrix.Exponential: Singular values have a low-rank unit section followed by exponential decay
Polynomial: Singular values have a low-rank unit section followed by polynomial decay
Optionally, a random diagonal can be added to all the above. Matrices can also be Hermitian and non-PSD.
- classmethod exp(shape=(100, 100), rank=10, decay=0.5, diag_ratio=0.0, symmetric=True, seed=479915, dtype=torch.float64, device='cpu', psd=True)[source]
Random matrix with exponential singular value decay plus diagonal.
Like
poly(), but the singular value decay of the low-rank component follows \(10^{-d}, 10^{-2d}, \dots\) for decay \(d\). In [TYUC2019, 7.3.1] the following values are used: 0.01 for slow decay, 0.1 for medium, 0.5 for fast.
- static get_decay_svals(dims, rank, decay_type, decay, dtype, device)[source]
Singular values with a particular decay pattern.
- static mix_matrix_and_diag(mat, diag, diag_ratio=1.0, inplace=True)[source]
Adding a (normalized) diagonal to a matrix.
- Parameters:
mat – Nonzero matrix of shape
(h, w).diag – Vector of shape
min(h, w).diag_ratio – Nonnegative scalar indicating the relative strength of the additive diagonal component. A ratio of 0 means no diagonal is added. A ratio of 1 means the diagonal is normalized to have the average row norm (resp. column, whichever dimension is smaller).
inplace – If true,
diagwill be scaled to the given ratio in-place, andmatwill be added the scaled diagonal also in-place. Otherwise, copies will be returned.
- Returns:
The triple
(mat + ratio * diag, ratio * diag, ratio).
- classmethod noise(shape=(100, 100), rank=10, snr=0.0001, diag_ratio=0.0, seed=479915, dtype=torch.float64, device='cpu')[source]
Low rank and symmetric noise plus a real-valued noisy diagonal.
If diag_ratio is 0, produces a lowrank+noise matrix, as follows: First, A Gaussian IID noise matrix
Gis sampled for the provided (square) shape. Then, the noisy matrix(snr / dims) * G@G.His computed. Finally, the firstrankdiagonal entries are incremented by 1. As a result, we have a symmetric (resp. Hermitian) noisy matrix with strong diagonal for the firstrankentries, and with a signal-to-noise ratio ofsnr.If
diag_ratio != 0, a real-valued Gaussian IID diagonal is added viamix_matrix_and_diag().- Parameters:
rank (int) – How many diagonal entries should be incremented by 1.
snr (float) –
Signal-to-noise ratio for the symmetric noise. In [TYUC2019, 7.3.1] the following values are used: 1e-4 for low noise, 1e-2 for mid noise, and 1e-1 for high noise.
- Returns:
The pair
(mat + diag, diag)wherematis (approximately) low-rank following the above recipe.
- classmethod poly(shape=(100, 100), rank=10, decay=2.0, diag_ratio=0.0, symmetric=True, seed=479915, dtype=torch.float64, device='cpu', psd=True)[source]
Random matrix with polynomial singular value decay plus diagonal.
Produces a matrix in the form
U @ S @ V.H + D, whereUandVare random orthogonal matrices,Shas entries with polynomially decaying magnitude, analogous to the ones described in [TYUC2019, 7.3.1], andDhas random Gaussian IID entries with intensity given bydiag_ratio: Ifdiag_ratio != 0, a real-valued Gaussian IID diagonal is added viamix_matrix_and_diag().- Parameters:
rank (int) – Entries in
S[:rank]will have a magnitude of 1.decay (float) –
Parameter determining how quickly magnitudes in
S[rank:]decay to zero, following \(2^d, 3^d, 4^d, \dots\) for decay \(d\). In [TYUC2019, 7.3.1] the following values are used: 0.5 for slow decay, 1 for medium, 2 for fast.diag_ratio – Intensity of
D. Seemix_matrix_and_diag()for details.symmetric (bool) – If true,
U == V.psd – If false, and matrix is symmetric, the singular values will be randomly sign-flipped to create a non-PSD matrix.
- Returns:
The pair
(mat + diag, diag)wherematis (approximately) low-rank following the above recipe.
skerch.utils module
General utilities for the skerch library.
- exception skerch.utils.BadSeedError[source]
Bases:
ExceptionError to be thrown when a random seed is not as it should.
- exception skerch.utils.BadShapeError[source]
Bases:
ExceptionError to be thrown when a shape is not as it should.
- skerch.utils.complex_dtype_to_real(dtype)[source]
Given a complex datatype, returns its real counterpart.
- Parameters:
dtype – Complex torch dtype, e.g.
torch.complex128.- Returns:
Float torch dtype, e.g.
torch.float64.
- skerch.utils.eigh(A, by_descending_magnitude=True)[source]
Hermitian Eigendecomposition.
- Parameters:
by_descending_magnitude – If true, eigenpairs are given by descending magnitude of eigenvalues (e.g.
-4, 3, 0.1, -0.001, 0). If false, eigenpairs are given by descending value (3, 0.1, 0, -0.001, -4).- Returns:
The eigendecomposition
(Lambda, Q)such thatA = Q @ diag(Lambda) @ Q.H.
- skerch.utils.gaussian_noise(shape, mean=0.0, std=1.0, seed=None, dtype=torch.float64, device='cpu')[source]
Reproducible
torch.normalGaussian noise.- Returns:
A tensor of given shape, dtype and device, containing gaussian noise with given mean and std (analogous to
torch.normal), but with reproducible behaviour fixed to given random seed and device.
Warning
PyTorch does not ensure RNG reproducibility across devices. The
deviceparameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.
- skerch.utils.htr(x, in_place=False)[source]
Hermitian transposition wrapper.
This convenience wrapper exists for several reasons:
While torch supports
.H, numpy does not.In multiprocessing settings,
.conj()seems to sometimes not work, which is likely related to in_place/view/copy behaviour.Transposition of vectors via .T throws a warning since it is a no-op.
This function avoids all three issues, by returning the input as a conjugate, and also transposed if it is a matrix.
- Parameters:
x – Numpy or Torch object, expected to be a vector or matrix (undefined behaviour otherwise).
in_place – If true, the imaginary entries are flipped in-place. Otherwise, a new copy of the input is always returned. No in-between View behaviour is possible (thus this function may be suboptimal in some circumstances, but avoids multiprocessing issues).
- Returns:
The Hermitian transpose of
x(if matrix), or the compex conjugate if vector. Undefined otherwise.
- skerch.utils.lstsq(A, b, rcond=1e-06)[source]
Least-squares solver.
PyTorch/NumPy/HDF5 wrapper.
- Parameters:
rcond – Cutoff value, any singular values of
Asmaller thanrcond * largest_singular_valueare considered zero.- Returns:
xsuch thatfrob(Ax - b)is minimized.
- skerch.utils.phase_noise(shape, seed=None, dtype=torch.complex128, device='cpu', conj=False)[source]
Reproducible noise uniformly distributed on the complex unit circle.
Note
This function makes use of
uniform_noise()to sample the phase noise. Ifxitself has been generated usinguniform_noise, make sure to use a different seed to mitigate correlations.Warning
PyTorch does not ensure RNG reproducibility across devices. The
deviceparameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.- Parameters:
conj – If true, the generated noise is the complex conjugate of the noise if all other parameters were equal.
- Returns:
A tensor of given shape, dtype and device, containing complex i.i.d. noisy entries uniformly distributed around the unit circle.
- skerch.utils.phase_shift(x, seed=None, inplace=True, rng_device='cpu', conj=False)[source]
Reproducible phase shift using phase noise.
Like
rademacher_flip(), but applying the complex counterpart of Rademacher noise, i.e. multiplies with phase noise on the complex unit circle.See also
- skerch.utils.pinv(A)[source]
Pseudo-inversion of a given matrix.
PyTorch/NumPy/HDF5 wrapper.
- Parameters:
A – matrix to pseudo-invert, of shape
(h, w). It needs to be compatible with eitherscipy.linalg.pinvortorch.linalg.qr.- Returns:
Pseudoinverse of
Awith shape(w, h).
- skerch.utils.qr(A, in_place_q=False, return_R=False)[source]
Thin QR-decomposition of given matrix.
PyTorch/NumPy/HDF5 wrapper.
- Parameters:
A – Matrix to orthogonalize, needs to be compatible with either
scipy.linalg.qrortorch.linalg.qr. It must be square or tall.in_place_q – If true,
A[:] = Qwill be performed.
- Returns:
If
return_Ris true, returns(Q, R)such thatQhas orthonormal columns,Ris upper triangular andA = Q @ Ras per the QR decomposition. Otherwise, returns justQ.
- skerch.utils.rademacher_flip(x, seed=None, inplace=True, rng_device='cpu')[source]
Reproducible random sign flip using Rademacher noise.
Variation of
rademacher_noise(), wherexcan be any tensor of any type, and it gets multiplied with Rademacher noise.- Parameters:
inplace – Whether the input tensor gets multiplied in-place.
- Returns:
The pair
(x * rademacher_mask, rademacher_mask).
See also
rademacher_noise()for notes on reproducibility and more info.
- skerch.utils.rademacher_noise(shape, seed=None, device='cpu')[source]
Reproducible Rademacher noise.
Note
This function makes use of
uniform_noise()to sample the Rademacher noise. Ifxitself has been generated usinguniform_noise, make sure to use a different seed to mitigate correlations.Warning
PyTorch does not ensure RNG reproducibility across devices. The
deviceparameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.- Parameters:
shape – Shape of the output tensor with Rademacher noise.
seed – Seed for the randomness.
device – Device of the output tensor and also source for the noise.
- skerch.utils.randperm(n, seed=None, device='cpu', inverse=False)[source]
Reproducible randperm of
nintegers from 0 to (n-1), both included.- Parameters:
inverse (bool) – If False, a random permutation
Pis provided. If true, an inverse permutationQis provided, such that both permutations are inverse to each other, i.e.v == v[P][Q] == v[Q][P].
Warning
PyTorch does not ensure RNG reproducibility across devices. The
deviceparameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.
- skerch.utils.serrated_hadamard_pattern(v, blocksize, with_main_diagonal=True, lower=True, use_fft=False)[source]
Shifted copies of vectors for block-triangular Girard-Hutchinson.
- Parameters:
v – Torch tensor expected to contain zero-mean, uncorrelated entries. If vector, shift is applied directly. If tensor, shift is applied to last dimension.
with_main_diagonal – If true, the main diagonal will be included in the patterns, otherwise excluded.
lower – If true, the block-triangles will be below the diagonal, otherwise above.
use_fft – See
subdiag_hadamard_pattern().
- Returns:
A vector of same shape, type and device as
v, composed of shifted copies ofvfollowing a block-triangular (serrated) pattern.
The shifted expectation mechanism introduced in
subdiag_hadamard_pattern()can be modified to yield expectations of triangular segments around any diagonal. This function performs the shifts corresponding to triangles of a given block size.For example, given a 10-dimensional vector, the corresponding serrated pattern with
blocksize=3, with_main_diagonal=True, lower=Trueyields the following entries:v1v1 + v2v1 + v2 + v3v4v4 + v5v4 + v5 + v6v7v7 + v8v7 + v8 + v9v10
If main diagonal is excluded, it will look like this instead:
0v1v1 + v20v4v4 + v50v7v7 + v80
And if
lower=False, it will look like this instead:v1 + v2 + v3v2 + v3v3v4 + v5 + v6v5 + v6v6v7 + v8 + v9v8 + v9v9v10
- skerch.utils.subdiag_hadamard_pattern(v, diag_idxs, use_fft=False)[source]
Shifted copies of vectors for subdiagonal Hutchinson estimation.
Given a square linear operator \(A\), and random vectors \(v \sim \mathcal{R}\) with \(\mathbb{E}[v v^H] = I\), consider this generalized formulation of the Girard-Hutchinson diagonal estimator:
\[f(A) = \mathbb{E}_{v \sim \mathcal{R}} \big[ \varphi(\bar{v}) \odot Av \big]\]If the \(\varphi\) function is the identity, then \(f(A)\) equals the main diagonal of \(A\). If e.g. \(\varphi\) shifts the entries in \(v\) downwards by
kpositions, we get thek-th subdiagonal.This function performs the corresponding (non-circular) shift.
See also
[BN2022]: Robert A. Baston and Yuji Nakatsukasa. 2022. “Stochastic diagonal estimation: probabilistic bounds and an improved algorithm”. CoRR abs/2201.10684.
- Parameters:
v – Torch tensor expected to contain zero-mean, uncorrelated entries. If vector, shift is applied directly. If tensor, shift is applied to last dimension.
diag_idxs – Iterator with integers corresponding to the subdiagonal indices to include, e.g. 0 corresponds to the main diagonal, 1 to the diagonal above, -1 to the diagonal below, and so on.
use_fft – If false, shifted copies of
vare pasted on the result. This requires onlylen(v)memory, but haslen(v) * len(diag_idxs)time complexity. If this argument is true, an FFT convolution is used instead. This requires at least4 * len(v)memory, but the arithmetic has a complexity oflen(v) * log(len(v)), which can be advantageous wheneverlen(diag_idxs)becomes very large.
- Returns:
A vector of same shape, type and device as
v, composed of shifted copies ofvas given bydiag_idxs.
- skerch.utils.svd(A)[source]
Singular Value Decomposition.
PyTorch/NumPy/HDF5 wrapper.
- Returns:
The SVD
(U, S, Vh)such thatA = U @ diag(S) @ Vh.
- skerch.utils.torch_dtype_as_str(dtype)[source]
Torch dtype to string.
Given a PyTorch datatype object, like
torch.float32, returns the corresponding string, in this case ‘float32’.
- skerch.utils.truncate_decomp(k, U=None, S=None, Vh=None, copy=False)[source]
Truncation of diagonal decomposition.
- Parameters:
copy – If true, returns a hard copy of the truncation, otherwise returns a view of the given objects.
- Returns:
For any given
U, S, Vh, returns (respectively)U[:, :k], S[:k], Vh[:k, :], i.e. thekparameter represent the number of dimensions that we wish to keep.
If
U, S, Vhis the SVD or eigendecomposition of a matrix, and theSvalues are in non-ascending magnitude, this truncation returns the top-k components.
- skerch.utils.uniform_noise(shape, seed=None, dtype=torch.float64, device='cpu')[source]
Reproducible
torch.randuniform noise.- Returns:
A tensor of given shape, dtype and device, containing uniform random noise between 0 and 1 (analogous to
torch.rand), but with reproducible behaviour fixed to given random seed and device.
Warning
PyTorch does not ensure RNG reproducibility across devices. The
deviceparameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.