skerch package
skerch
: Sketched matrix decompositions for PyTorch.
This implementation is based on the following publications, referenced throughout the documentation:
[TYUC2019]: Joel A. Tropp, Alp Yurtsever, Madeleine Udell, and Volkan Cevher. 2019. “Streaming Low-rank Matrix Approximation with an Application to Scientific Simulation”. SIAM Journal on Scientific Computing 41 (4): A2430–63.
[BN2022]: Robert A. Baston and Yuji Nakatsukasa. 2022. “Stochastic diagonal estimation: probabilistic bounds and an improved algorithm”. CoRR abs/2201.10684.
skerch Subpackages
skerch Submodules
skerch.a_posteriori module
A-posteriori utilities for sketched decompositions.
Given a (potentially matrix-free) linear operator, and our obtained low-rank decomposition, we would like to know how similar they are. In [TYUC2019], a method is presented to estimate the Frobenius residual error. Furthermore, a probabilistic bound to said estimator is presented, providing confidence bounds for the estimation. More measures increase said confidence.
Finally, a “scree plot” method is also introduced, providing upper and lower bounds to estimate the resudual Frobenius error as a function of the rank. This can be used to estimate the actual rank of our original linear operator, and to choose the rank of our recovery.
This module implements this functionality.
- skerch.a_posteriori.a_posteriori_error(lop1, lop2, num_measurements, seed=479915, dtype=torch.float64, device='cpu', adjoint=False)[source]
A-posteriori sketch error estimation.
This function implements the error estimation procedure presented in [TYUC2019, 6]. For that, it performs
num_measurements
gaussian vector multiplications on linear operatorslop1
andlop2
, and then compares the results. The \(\ell_2\) error between the obtained measurements is then an estimation of the Frobenius error betweenlop1
andlop2
.- Parameters:
lop1 – Linear operator with a
shape=(h, w)
attribute and implementing a left (x @ mat
) or right(mat @ x)
matmul op. It is assumed to be compatible with the given torchdtype
.lop2 – See
lop1
.adjoint – If true, left-matmul is used, otherwise right-matmul.
- Returns:
A tuple
((f1_mean, f2_mean, diff_mean), (f1, f2, diff))
, where the last 3 elements are lists with theL_2^2
norms of each random measurement for the first matrix (f1), second matrix (f2) and difference between matrices (diff). The first 3 elements are the averages over all given measurements. The final estimate for the error betweenlop1
andlop2
is thendiff_mean
. To get confidence bounds on this estimate, usea_posteriori_error_bounds
.
- skerch.a_posteriori.a_posteriori_error_bounds(num_measurements, rel_err)[source]
Probabilistic bounds for a-posteriori sketch error estimation.
Retrieves probabilistic error bounds presented in [TYUC2019, 6.4], see module docstring for details. Note that this function does not perform the error estimation, it merely informs about how noisy would be the estimation given the desired number of measurements.
- Parameters:
num_measurements (int) – How many Gaussian measurements will be performed for the a-posteriori error estimation. More measurements means tighter error bounds.
rel_err (float) – A float between 0 and 1, indicating the relative error that we want to consider. If
x
is the actual error we want to estimate, andy
is our estimation, this function will return the probability thaty
is in the range[(1-rel_err)x, (1+rel_err)x]
.
- Returns:
Probabilities that, given the indicated number of measurements, the a-posteriori method yields values outside of
actual_error * (1 +/- rel_err)
. Ideally, we want a sufficient number of measurements, such that the returned probabilities for smallrel_err
are themselves small (this means that the corresponding error estimation is tight).
- skerch.a_posteriori.scree_bounds(S, ori_frob, err_frob)[source]
A-posteriori scree bounds for low-rank approximations.
Reference: [TYUC2019, 6.5].
When we do a sketched eigen- or singular-decomposition of a linear operator, we don’t generally know a-priori where is the effective rank of said operator. A scree analysis is an a-posteriori tool to estimate the smallest rank that captures a sufficient amount of the actual matrix.
It does so by efficiently retrieving upper and lower bounds for the actual proportion of residual energy that remains, as we increase the rank of our estimated matrix. Typically we look for an initial sharp decay, followed by an “elbow” at a low point, after which residual energy stops decaying fast. The location of the elbow is a good indicator for the effective rank, but quantitative methods are also possible (see [TYUC2019]).
- Parameters:
S – Vector of the estimated spectrum/singular values, expected to contain entries in non-ascending magnitude.
ori_frob – Estimate (or exact if available)
frob_norm(M)
, whereM
is the original linop that we are decomposing. This quantity is estimated bya_posteriori_error()
, but note that it returnsfrob_norm^2
, while this function requires the Frobenius norm itself.err_frob – A-posteriori estimate of
frob_norm(M - M')
, whereM'
is the recovery ofM
with singular valuesS
.
Usage example:
f1_pow2, f2_pow2, res_pow2 = a_posteriori_error(M, M', ...) S = get_singular_values(M') # in non-ascending magnitude scree_lo, scree_hi = scree_bounds(S, f1_pow2**0.5, res_pow2**0.5) # plot the scree bounds to find an elbow as a cutting point for rank(M).
skerch.a_priori module
A-priori utilities for sketched decompositions.
Given a limited amount of memory, we would like to know how many inner and outer measurements we should perform to get the best results.
In [TYUC2019, 5.4.2], it is argued that the number of outer measurements should be maximized within memory constraints, subject to the number of inner measurements being twice. This yields a closed-form optimum for general matrices, presented in the paper and implemented here.
- skerch.a_priori.a_priori_hyperparams(matrix_shape, memory_budget, complex_data=False, hermitian=False)[source]
A-priori hyperparameter selection for sketched measurements.
Given the shape of a generic linear operator that we wish to decompose, and a memory budget, return the number of inner and outer measurements required, such that the budget is not exceeded,
inner >= 2 * outer
, and outer is maximized (see [TYUC2019, 5.4.2]).- Parameters:
memory_budget (int) – Memory available in number of matrix entries. E.g. if the matrix is in float32, a budget of
N
means4N
bytes.hermitian – If true, the matrix is assumed to be hermitian (and square), so measurements only need to be done on one side. This means that, for the same budget, we can do more measurements.
- Returns:
The pair
(k, s)
, where the first integer is the optimal number of outer sketch measurements, and the second one is the corresponding number of core measurements. Optimality is defined as: distribute the budget such that k is maximized, buts >= 2k
(see 5.4.2. in paper).
skerch.decompositions module
In-core, single-process sketched decompositions.
Note
Often, when the decomposition is small enough to fit into a single process
and core, the traditional and exact algorithms may be preferred.
For the cases that do not fit, see distributed_decompositions
.
- skerch.decompositions.seigh(op, op_device, op_dtype, outer_dim, inner_dim, seed=479915)[source]
Sketched Hermitian Eigendecomposition (EIGH).
Sketched EIGH of any given linear operator, modified from the SSVD from [TYUC2019, 2]. It leverages Hermitian symmetry of the operator to require half the measurements, memory and arithmetic. The operator must be square and Hermitian, but doesn’t have to be PSD.
- Parameters:
op – Hermitian linear operator that implements the left- and right matmul operators via
@
, as well as theshape=(d, d)
attribute, but it doesn’t need to be in explicit matrix form.outer_dim (int) – Number of outer measurements to be performed.
inner_dim (int) – Number of inner measurements to be performed. It should be larger than
outer_dim
, typically twice is reccommended.
- Returns:
(Q, U, S)
, soQ @ U @ diag(S) @ U.T @ Q.T
is a low-rank approximation ofop
. This is an EIGH sinceS
contains the eigenvalues (in descending magnitude), andQ, U
contain orthogonal columns withQ.shape = (h, outer_dim)
, andU.shape = (outer_dim, outer_dim)
.
- skerch.decompositions.ssvd(op, op_device, op_dtype, outer_dim, inner_dim, seed=479915)[source]
Sketched Singular Value Decomposition (SVD).
Sketched SVD of any given linear operator, as introduced in [TYUC2019, 2]. As with any SVD, the operator does not have to be square, symmetric, or PSD.
- Parameters:
op – Linear operator that implements the left- and right matmul operators via
@
, as well as theshape=(h, w)
attribute, but it doesn’t need to be in explicit matrix form.outer_dim (int) – Number of outer measurements to be performed.
inner_dim (int) – Number of inner measurements to be performed. It should be larger than
outer_dim
, typically twice is reccommended.
- Returns:
(Q, U, S, V.T, P.T)
, soQ @ U @ diag(S) @ V.T @ P.T
is a low-rank approximation ofop
. This is an SVD sinceS
contains the non-negative singular values (in non-ascending order), andQ, U
contain orthogonal columns withQ.shape = (h, outer_dim)
, andU.shape = (outer_dim, outer_dim)
.
- skerch.decompositions.truncate_core(k, core_U, core_S, core_Vt=None)[source]
Truncation of core sketches.
Truncates core matrices as returned by
ssvd()
orseigh()
down to rankk
. In SEIGH,core_Vt
is the same ascore_U.T
, so it can be omitted.- Parameters:
k (int) – Truncation rank of the output core elements.
core_U – Left core basis matrix of shape
(outer_dim, outer_dim)
.core_S – Core singular values of shape
(outer_dim,)
.core_Vt – If given, right core matrix analogous to
core_U
.
- Returns:
Truncated
(core_U, core_S, core_Vt)
such thatcore_S
retains the firstk
entries, and the core matrices are truncated accordingly. Ifcore_Vt
is none, only the first two elements are returned.
skerch.distributed_decompositions module
Sketched decompositions on distributed systems.
For in-core decompositions, see distributed_decompositions
.
- skerch.distributed_decompositions.core_pinv_solve(Q, randlop, target)[source]
Pseudo-inverse step as part of solving the core sketch matrix.
The core matrix can be expressed in terms of
pinv(randlop @ Q) @ target
operations (see [TYUC2019, 2.7]). Given thetarget
operator, this expression can be computed directly via least squares without having to explicitly compute thepinv
pseudo-inverse, which is typically faster and more stable.This function is used as a sub-routine in
solve_core_ssvd()
andsolve_core_seigh()
.- Parameters:
Q – Matrix of shape
(h, outer_measurements)
containing either the left-outer or right-outer measurements.randlop – Linear operator used to perform the inner measurements. If
Q
contains right-outer measurements, this must be the left-inner operator, and vice versa. Expected shape is(num_inner, h)
.
- Returns:
matrix equivalent to
pinv(randlop @ Q) @ target
- skerch.distributed_decompositions.create_hdf5_layout(dirpath, lop_shape, lop_dtype, num_outer_measurements, num_inner_measurements, lo_fmt='leftouter_{}.h5', ro_fmt='rightouter_{}.h5', inner_fmt='inner_{}.h5', with_ro=True)[source]
Creation of persistent HDF5 files to store sketches of a linear operator.
- Parameters:
dirpath (str) – Where to store the HDF5 files.
lop_shape – Shape of linear operator to sketch from, in the form
(height, width)
.lop_dtype – Torch dtype of the operator, e.g.
torch.float32
. The HDF5 arrays will be of same type.num_outer_measurements (int) – Left outer measurement layout contains
(width, outer)
entries, and right outer layout(height, outer)
.num_inner_measurements (int) – Inner measurement layout contains
(inner, inner)
entries.with_ro – If false, no right outer layout will be created (useful when working with symmetric matrices where only one side is needed).
- Lo_fmt:
Format string for the left-outer HDF5 filenames.
- Ro_fmt:
Format string for the right-outer HDF5 filenames.
- Inner_fmt:
Format string for the inner HDF5 filenames.
- skerch.distributed_decompositions.orthogonalize(matrix, overwrite=False)[source]
Orthogonalization of given matrix.
- Parameters:
matrix – matrix to orthogonalize, needs to be compatible with either
scipy.linalg.qr
ortorch.linalg.qr
.overwrite – If true,
matrix[:] = Q
will also be performed.
- Returns:
Orthogonal matrix
Q
such thatmatrix = Q @ R
as per the QR decomposition.
- skerch.distributed_decompositions.solve_core_seigh(Q, inner, li_randlop, ri_randlop)[source]
Solving the core sketch matrix for sketched Hermitian eigendecomposition.
Given the orthogonalized outer measurements, as well as the inner measurements and the linear operators used to perform said inner measurements in the context of a sketched EIGH, this function solves the core matrix, following […].
Note
This implementation is compatible with torch tensors, numpy arrays and HDF5 arrays.
- Parameters:
Q – outer measurement matrix (side doesn’t matter since matrix is assumed to be Hermitian). Expected to be in thin vertical shape, i.e.
(lop_dim, num_outer_measurements)
. Also expected to be orthogonal, i.e.Q.T @ Q = I
.inner – Inner measurement matrix of shape
(num_inner, num_inner)
.
- Parma li_randlop:
Torch linear operator used to perform inner measurements via
li_randlop @ lop @ ri_randlop.T
. Expected shape is(inner_measurements, lop_height)
.- Parma ri_randlop:
see
li_randlop
. Expected shape is(inner_measurements, lop_height)
.
- skerch.distributed_decompositions.solve_core_ssvd(ro_Q, lo_Q, inner_measurements, li_randlop, ri_randlop)[source]
Solving the core sketch matrix for sketched SVD.
Given the orthogonalized right- and left-outer measurements, as well as the inner measurements and the linear operators used to perform said random measurements in the context of a sketched SVD, this function solves the core matrix, following [TYUC2019, 2.7].
Note
This implementation is compatible with torch tensors, numpy arrays and HDF5 arrays.
- Parameters:
ro_Q – Right-outer measurement matrix. Expected to be in thin vertical shape, i.e.
(lop_height, num_outer_measurements)
. Also expected to be orthogonal, i.e.ro_Q.T @ ro_Q = I
.lo_Q – Left-outer measurement matrix. Expected to be in thin vertical shape, i.e.
(lop_width, num_outer_measurements)
. Also expected to be orthogonal, i.e.lo_Q.T @ lo_Q = I
.inner_measurements – Inner measurement matrix of shape
(num_inner, num_inner)
.
- Parma li_randlop:
Torch linear operator used to perform inner measurements via
li_randlop @ lop @ ri_randlop.T
. Expected shape is(inner_measurements, lop_height)
.- Parma ri_randlop:
see
li_randlop
. Expected shape is(inner_measurements, lop_height)
.
skerch.distributed_measurements module
Utilities for distributed sketch measurements.
One typically expensive step when we want to do sketched decompositions is taking measurements from the linear operator we want to decompose, either in terms of runtime, memory, or both. This module provides functionality to parallelize and distribute said measurements.
- class skerch.distributed_measurements.DistributedHDF5[source]
Bases:
object
Class to manage HDF5 files holding distributed sketch measurements.
In general, multiple processes are not allowed to concurrently open and write to HDF5 files. In order to allow for distributed, but coherent writing, this class allows to create many individual HDF5 files, and then ‘virtually’ merge them into a single coherent dataset.
Unfortunately, most OS don’t allow a single process to open many files at once, and each sub-file here counts as one. As a result, loading such distributed HDF5 datasets will reach a breaking point, after which contents are not really being opened, and reported as “empty” (e.g. full of zeros). For those cases, this class also provides a
merge_all
method, to efficiently merge the distributed dataset into a monolithic one.In the virtual mode, all files are created in the same directory following a consistent naming convention, and this is expected to remain unchanged. Since this class is naturally intended for multiple concurrent processes/devices, it is designed in the form of a static class.
Usage example (see
distributed_decompositions
for more examples):# create the empty separate HDF5 files on disk, and the "virtual" merger out_path = "/tmp/my_dataset_{}.h5" each_shape, num_files = (1000,), 5 h5_path, h5_subpaths = DistributedHDF5.create( out_path, num_files, each_shape, torch_dtype_as_str(op_dtype) ) # in a (possibly distributed & parallelized) loop, load and write parts for i in range(num_files): vals, flag, h5 = DistributedHDF5.load(h5_subpaths[i]) vals[:] += 1 flag[0] = 'done' h5.close() # remember to close file handles when done! # merged data can be used as a (1000, 50) matrix by a separate machine all_data, all_flags, all_h5 = DistributedHDF5.load_virtual(h5_path) print(scipy.diag(all_data)) print(all_flags[::2]) all_h5.close() # convert virtual database into single monolithic one DistributedHDF5.merge_all( h5_path, delete_subfiles_while_merging=True, ) all_data, all_flags, all_h5 = DistributedHDF5.load_virtual(h5_path) ...
- DATA_NAME = 'data'
- FLAG_DTYPE = dtype('O')
- FLAG_NAME = 'flags'
- INITIAL_FLAG = 'initialized'
- MAIN_PATH = 'ALL'
- SUBPATHS_FORMAT = '{0:010d}'
- classmethod analyze_virtual(all_path, check_success_flag=None)[source]
Analyze shapes and relations of files created via
create()
.Extract relevant information from a given virtual HDF5 dataset, which can be used e.g. to allow merging it into a monolithic one (see implementation of
merge_all()
for an example).- Parameters:
all_path – The
all_path
of a virtual HDF5 dataset like the ones created viacreate()
.check_success_flag – If given, this method will check that all HDF5 flags equal this value, raise an
AssertionError
otherwise.
- Returns:
the tuple
(data_shape, data_dtype, data_subshapes, data_map, flags_shape, flags_dtype, flag_subshapes, flag_map, filedim_idx)
.
- classmethod create(base_path, num_files, shape, dtype, compression='lzf', filedim_last=True)[source]
Create a distributed HDF5 measurement dataset.
- Parameters:
base_path (str) – Format string with the path to store created HFG5 files, in the form
<DIR>/my_dataset_{}.h5
.num_files – Number of HDF5 files to be created, each corresponding to one measurement.
shape – Shape of the array inside each individual file. For linear measurements, this is a vector, e.g.
(1000,)
.dtype – Datatype of the HDF5 arrays to be created.
filedim_last – If true, the virtual dataset result of merging all files will be of shape
shape + (num_files,)
. Otherwise,(num_files,) + shape
.
- Returns:
The pair
(all_path, subpaths)
, whereall_path
is the path of the virtual dataset encompassing all subpaths, which correspond to the individual files.
- classmethod load(path, filemode='r+')[source]
Load an individual dataset created via
create()
.- Parameters:
path – One of the subpaths returned by
create()
.filemode – Default is ‘r+’, read/write, file must preexist. See documentation of
h5py.File
for more details.
- Returns:
(data, flag, h5f)
, wheredata
is the dataset for the numerical measurements,flag
is the dataset for state tracking, andh5f
is the (open) HDF5 file handle.
Note
Remember to
h5f.close()
once done with this file.
- classmethod load_virtual(all_path, filemode='r+')[source]
Load a merged dataset created via
create()
.- Parameters:
path – The
all_path
returned bycreate()
.filemode – Default is ‘r+’, read/write, file must preexist. See documentation of
h5py.File
for more details.
- Returns:
(data, flag, h5f)
, wheredata
is the dataset for the numerical measurements,flag
is the dataset for state tracking, andh5f
is the (open) HDF5 file handle.
Note
Remember to
h5f.close()
once done with this file.
- classmethod merge_all(all_path, out_path=None, compression='lzf', check_success_flag=None, delete_subfiles_while_merging=False)[source]
Merges distributed HDF5 dataset into a single, monolithic HDF5 file.
- Parameters:
all_path – The
all_path
of a virtual HDF5 dataset like the ones created viacreate()
.out_path – If None, merged dataset will be written over the given
all_path
. Otherwise, path of the resulting HDF5 monolithic file.check_success_flag – If given, this method will check that all HDF5 flags equal this value, raise an
AssertionError
otherwise.delete_subfiles_while_merging (bool) – If true, each distributed HDF5 file that is visited will be deleted right after it is merged onto the monolithic HDF5 file. Useful to avoid large memory overhead.
- Returns:
out_path
.
- skerch.distributed_measurements.innermeas_idx_torch(idx, num_measurements, lop, lop_device, lop_dtype, out, left_seed, right_seed, flag=None, processing_flag='0', success_flag='1')[source]
Distributed-able SSRFT inner measurement for sketched decompositions.
Note
See
ssrft
for more details on the SSRFT and its implementation.Like
ssrft_idx_torch()
, but it applies the SSRFT on both sides oflop
, i.e. it computes:ssrft_l @ lop @ ssrft_r[idx, :].T
, wheressrft_l
hasleft_seed
and shape(num_measurements, lop.h)
, andssrft_r
hasright_seed
and shape(num_measurements, lop.w)
.- Parameters:
out – Array where the results will be writen in-place, of shape
(num_measurements, )
. It can be atorch
tensor or anumpy
or aHDF5
array, accepting in-place writing viaout[:] = ...
.flag – See
ssrft_idx_torch()
.
- Returns:
The in-between result
lop @ ssrft_r[idx, :].T
, as a torch tensor of shape(lop_height,)
. It will be on CUDA device iflop_device
is CUDA, otherwise CPU.
- skerch.distributed_measurements.linmeas_idx_torch(idx, lop, meas_lop, lop_device, lop_dtype, adjoint=False)[source]
Distributed-able linear measurement from given linear operators.
Convenience function to perform a single linear measurement from two (potentially matrix-free) linear operators. Specifically, it picks the
[idx]
row ofmeas_lop
, and applies it tolop
.- Parameters:
lop – Linear operator to measure from, typically matrix-free or very costly to store or compute. Assumed to implement right- and left matrix multiplication via the
@
operator, and also theshape = (h, w)
attribute, but no transposition.meas_lop – Linear operator to be applied to
lop
. It must also support@
operator andshape
attribute, but not transposition is required. For this reason, its shape is(*, h)
if adjoint is true, and(*, w)
otherwise, where*
is arbitrary.
- Returns:
If adjoint is True,
meas_lop[idx, :] @ lop
. Otherwiselop @ meas_lop[idx, :].T
.
- skerch.distributed_measurements.ssrft_idx_torch(idx, total_measurements, lop, lop_device, lop_dtype, out_vals, seed=479915, flag=None, processing_flag='-1', success_flag='1', adjoint=False)[source]
Distributed-able SSRFT measurement from given linear operators.
Note
See
ssrft
for more details on the SSRFT and its implementation.In-place SSRFT measurement, it projects the
idx
row of a SSRFT operator onto the givenlop
, writing results toout_vals
. If adjoint is true, computesssrft[idx, :] @ lop
, otherwiselop @ ssrft[idx, :].T
, wheressrft``is always of shape ``(total_measurements, *)
with*
being the fitting dimension oflop
.- Parameters:
out_vals – Array where the results will be writen in-place, of shape
(*, )
for either the height or width oflop
. It can be atorch
tensor or anumpy
or aHDF5
array, accepting in-place writing viaout_vals[:] = ...
.flag – An array-like object of shape (1,) and string datatype, optionally used by this function to keep track of current state by setting it to the
processing_flag, success_flag
values via theflag[0] = "..."
assignment.adjoint (bool) – See above.
- Returns:
the SSRFT matrix-free linear operator used for the measurement. Note that the measurement is not returned, but written to
out_vals
, and optionallyflag
is also written.
skerch.linops module
Utilities for linear operators.
This PyTorch library is intended to work with (potentially matrix-free) linear
operators that support left- and right- matrix multiplication (via e.g.
v @ lop
and lop @ v
) as well as the .shape
attribute.
This module implements basic support for said functionality, as well as some default linear operators (composite, diagonal…).
- class skerch.linops.BaseLinOp(shape)[source]
Bases:
object
Base class for linear operators.
Provides the
.shape
attribute and basic matmul functionality with vectors and matrices (also via the@
operator). Intended to be extended with further functionality.- check_input(x, adjoint)[source]
Checking that input has compatible shape.
- Parameters:
x – The input to this linear operator.
adjoint (bool) – If true,
x @ self
is assumed, otherwiseself @ x
.
- class skerch.linops.BaseRandomLinOp(shape, seed=479915)[source]
Bases:
BaseLinOp
Linear operators with pseudo-random behaviour.
Like the base LinOp, but with a
seed
attribute that can be used to control random behaviour.
- class skerch.linops.CompositeLinOp(named_operators)[source]
Bases:
object
Composite linear operator.
This class composes an ordered collection of operators
[A, B, C, ...]
intoA @ B @ C ...
.Note
Using this class could be more inefficient than directly computing the composed operator, e.g. if
A.shape = (1, 1000)
andB.shape = (1000, 1)
, then computing the scalarC = A @ B
and then applying it is more efficient than keeping aCompositeLinearoperator
withA, B
(in terms of both memory and computation). This class does not check for such cases, users are encouraged to take this into account.
- class skerch.linops.DiagonalLinOp(diag)[source]
Bases:
BaseLinOp
Diagonal linear operator.
Given a vector
v
ofd
dimensions, this class implements a diagonal linear operator of shape(d, d)
via left- and right-matrix multiplication, as well as theshape
attribute, only requiring linear (\(\mathcal{O}(d)\)) memory and runtime.- MAX_PRINT_ENTRIES = 20
- class skerch.linops.NegOrthProjLinOp(Q)[source]
Bases:
OrthProjLinOp
Linear operator for a negative orthogonal projector.
Given a “thin” matrix (i.e. height >= width) \(Q\) of orthonormal columns, this class implements the orthogonal projector \(Q Q^T\). Given a “thin” matrix (i.e. height >= width) \(Q\) of orthonormal columns, this class implements the orthogonal projector onto the space orthogonal to its span, i.e. \((I - Q Q^T)\).
- class skerch.linops.OrthProjLinOp(Q)[source]
Bases:
BaseLinOp
Linear operator for an orthogonal projector.
Given a “thin” matrix (i.e. height >= width) \(Q\) of orthonormal columns, this class implements the orthogonal projector onto its span, i.e. \(Q Q^T\).
- class skerch.linops.TorchLinOpWrapper[source]
Bases:
object
Linear operator that always accepts and produces PyTorch tensors.
Since this library operates mainly with PyTorch tensors, but some useful LinOps interface with e.g. NumPy arrays instead, this mixin class acts as a wraper on the
__matmul__
and__rmatmul__
operators, so that the operator expects and returns torch tensors, even when the wrapped operator interfaces with NumPy/HDF5. Usage example:# extend NumPy linear operator via multiple inheritance class TorchWrappedLinOp(TorchLinOpWrapper, LinOp): pass lop = TorchWrappedLinOp(...) # instantiate normally w = lop @ v # now v can be a PyTorch tensor
skerch.ssrft module
Scrambled Subsampled Randomized Fourier Transform (SSRFT).
PyTorch-compatible implementation of the SSRFT (as a matrix-free linear operator and other utilities), from [TYUC2019, 3.2].
- class skerch.ssrft.SSRFT(shape, seed=479915)[source]
Bases:
BaseRandomLinOp
Scrambled Subsampled Randomized Fourier Transform (SSRFT).
This class encapsulates the left- and right-SSRFT transforms into a single linear operator, which is deterministic for the same shape and seed (particularly, also across different torch devices).
- skerch.ssrft.ssrft(x, out_dims, seed=479915, dct_norm='ortho')[source]
Right (forward) matrix multiplication of the SSRFT.
This function implements a matrix-free, right-matmul operator of the Scrambled Subsampled Randomized Fourier Transform (SSRFT) for real-valued signals, from [TYUC2019, 3.2].
\[\text{SSRFT} = R\,\mathcal{F}\,\Pi\,\mathcal{F}\,\Pi'\]Where \(R\) is a random index-picker, mathcal{F} is a Discrete Cosine Transform, and \(\Pi, \Pi'\) are random permutations.
- Parameters:
x – Vector to be projected, such that
y = SSRFT @ x
out_dims – Dimensions of output
y
, must be less thandim(x)
seed – Random seed
- skerch.ssrft.ssrft_adjoint(x, out_dims, seed=479915, dct_norm='ortho')[source]
Left (adjoint) matrix multiplication of the SSRFT.
Adjoint operator of SSRFT, such that
x @ SSRFT = y
. Seessrft()
for more details. Note the following implementation detail:Permutations are orthogonal transforms
Rademacher transforms are also orthogonal (also diagonal and self-inverse)
DCT/DFT are also orthogonal transforms
The index-picker \(R\) is a subset of rows of I.
With orthogonal operators, transform and inverse are the same. Therefore, this adjoint operator takes the following form:
\[\begin{split}\text{SSRFT}^T =& (R\,\mathcal{F}\,\Pi\,\mathcal{F}\,\Pi')^T \\ =& \Pi'^T \, \mathcal{F}^T \, \Pi^T \, \mathcal{F}^T \, R^T \\ =& \Pi'^{-1} \, \mathcal{F}^{-1} \, \Pi^{-1} \, \mathcal{F}^{-1} \, R^T\end{split}\]So we can make use of the inverses, except for \(R^T\), which is a column-truncated identity, so we embed the entries picked by \(R\) into the corresponding indices, and leave the rest as zeros.
skerch.subdiagonals module
Functionality to support sketched estimation of any (sub-)diagonals.
Including:
Hadamard pattern to compute subdiagonals as
mean(hadamard(v) * lop @ v)
, forv_i = SSRFT[i, :]
.In-core routine to compute arbitrary subdiagonals of linops.
- skerch.subdiagonals.subdiag_hadamard_pattern(v, diag_idxs, use_fft=False)[source]
Shifted copies of vectors for subdiagonal Hutchinson estimation.
Given a square linear operator \(A\), and random vectors \(v \sim \mathcal{R}\) with \(\mathbb{E}[v v^T] = I\), consider this generalized formulation of the Hutchinson diagonal estimator:
\[f(A) = \mathbb{E}_{v \sim \mathcal{R}} \big[ \varphi(v) \odot Av \big]\]If the \(\varphi\) function is the identity, then \(f(A)\) equals the main diagonal of \(A\). If e.g. \(\varphi\) shifts the entries in \(v\) downwards by
k
positions, we get thek
-th subdiagonal.See also
[BN2022]: Robert A. Baston and Yuji Nakatsukasa. 2022. “Stochastic diagonal estimation: probabilistic bounds and an improved algorithm”. CoRR abs/2201.10684.
- Parameters:
v – Torch vector expected to contain zero-mean, uncorrelated entries.
diag_idxs – Iterator with integers corresponding to the subdiagonal indices to include, e.g. 0 corresponds to the main diagonal, 1 to the diagonal above, -1 to the diagonal below, and so on.
use_fft – If false, shifted copies of
v
are pasted on the result. This requires onlylen(v)
memory, but haslen(v) * len(diag_idxs)
time complexity. If this argument is true, an FFT convolution is used instead. This requires at least4 * len(v)
memory, but the arithmetic has a complexity oflen(v) * log(len(v))
, which can be advantageous wheneverlen(diag_idxs)
becomes very large.
- Returns:
A vector of same shape, type and device as
v
, composed of shifted copies ofv
as given bydiag_idxs
.
- skerch.subdiagonals.subdiagpp(lop, num_meas, lop_dtype, lop_device, seed=479915, deflation_rank=0, diag_idx=0)[source]
Matrix-free (sub-) diagonal sketched estimator via Diag++.
See also
Reference for this algorithm:
[BN2022]: Robert A. Baston and Yuji Nakatsukasa. 2022. “Stochastic diagonal estimation: probabilistic bounds and an improved algorithm”. CoRR abs/2201.10684.
Given a linear operator
lop
, implements an unbiased diagonal estimator, composed of orthogonal deflation followed by Hutchinson estimation.The (optional) deflation is based on efficiently obtaining a thin matrix of orthonormal columns
Q
, such that:lop = (Q @ Q.T @ lop) + ((I - Q@Q.T) @ lop)
Then, the (sub-) diagonal entries for the first component can be computed exactly and efficiently by picking rows of
Q
. The second, “deflated” term is then subject to Hutchinson estimation:diag_k(defl_lop) = expectation(hadamard_k(v) * defl_lop @ v)
,This is achieved by sampling random
v
vectors from a SSRFT operator, and shifting them usingsubdiag_hadamard_pattern()
to capture the desiredk
-th subdiagonal.The final estimation is then the sum of the exact first component plus the estimated, deflated component.
- Parameters:
lop – The linear operator to extract (sub-) diagonals from. It must implement a
lop.shape = (h, w)
attribute as well as the left- and right- matmul operator@
, interacting with torch tensors.num_meas – Number of samples
v
used to compute the expectation.lop_dtype – Datatype of
lop
.lop_device – Device of
lop
.seed – Random seed for the
v
random SSRFT vectors.deflation_rank – Rank of the deflation matrix
Q
to be computed.diag_idx – Position of the (sub-) diagonal to compute. 0 corresponds to the main diagonal, +1 to the diagonal above, -1 to the diagonal below, and so on. Note that if diagonals are too far away from the main diagonal (or if
lop
is small enough), it may be worth it to directly perform one exact measurement per diagonal entry and not useDiag++
at all.
- Returns:
The tuple
(result, defl, (top_norm, bottom_norm))
. The first element is the estimated diagonal. The second is the computed deflation matrix, orNone
ifdeflation_rank=0
. The last element is the Euclidean norm of the deflation and the deflated parts of the diagonal, respectively, which can be used to roughly diagnose the contribution of each part of the algorithm.
skerch.synthmat module
Utilities for synthetic (random) matrices.
This module implements a few families of synthetic random matrices, inspired by [TYUC2019, 7.3.1], but also expanded:
Low rank + noise: dampened Gaussian noise matrix in the form
G @ G.T
, plus a (low-rank) subset of the identity matrix.Exponential: Singular values have a low-rank unit section followed by exponential decay.
Polynomial: Singular values have a low-rank unit section followed by polynomial decay.
In [TYUC2019], the exponential and polynomial matrices are diagonal. here, we extend them as follows:
Random orthogonal matrices for the left and right singular spaces
Optionally, the matrix is square and symmetric so left and right vectors are adjoint
Optionally, the decaying spectrum is multiplied by Rademacher noise, to test also with non-PSD matrices
- class skerch.synthmat.SynthMat[source]
Bases:
object
Static class to produce different families of synthetic matrices.
- classmethod exp_decay(shape=(100, 100), rank=10, decay=0.5, symmetric=True, seed=479915, dtype=torch.float64, device='cpu', psd=True)[source]
Random matrix with exponential singular value decay.
Produces a matrix in the form
U @ S @ V
, whereU
andV
are random orthogonal matrices, andS
has entries with exponentially decaying magnitude, analogous to the ones described in [TYUC2019, 7.3.1].- Parameters:
rank (int) – Entries in
S[:rank]
will have a magnitude of 1.decay (float) –
Parameter determining how quickly magnitudes in
S[rank:]
decay to zero, following \(10^{-d}, 10^{-2d}, \dots\) for decay \(d\). In [TYUC2019, 7.3.1] the following values are used: 0.01 for slow decay, 0.1 for medium, 0.5 for fast.symmetric (bool) – If true,
V == U
.psd – If false, and matrix is symmetric, the “tail” singular values will be multiplied with Rademacher noise to create a non-PSD matrix.
- static lowrank_noise(shape=(100, 100), rank=10, snr=0.0001, seed=479915, dtype=torch.float64, device='cpu')[source]
Low rank + noise random matrix.
Produces a lowrank+noise matrix, as follows: First, A Gaussian IID noise matrix
G
is sampled for the provided (square) shape. Then, the noisy matrix(snr / dims) * G@G.T
is computed. Finally, the firstrank
diagonal entries are incremented by 1. As a result, we have a symmetric noisy matrix with strong diagonal for the firstrank
entries, and with a signal-to-noise ratio ofsnr
.- Parameters:
rank (int) – How many diagonal entries should be reinforced.
snr (float) –
Signal-to-noise ratio. In [TYUC2019, 7.3.1] the following values are used: 1e-4 for low noise, 1e-2 for mid noise, and 1e-1 for high noise.
- Returns:
A matrix of given shape and properties.
- classmethod poly_decay(shape=(100, 100), rank=10, decay=0.5, symmetric=True, seed=479915, dtype=torch.float64, device='cpu', psd=True)[source]
Random matrix with polynomial singular value decay.
Produces a matrix in the form
U @ S @ V.T
, whereU
andV
are random orthogonal matrices, andS
has entries with polynomially decaying magnitude, analogous to the ones described in [TYUC2019, 7.3.1].- Parameters:
rank (int) – Entries in
S[:rank]
will have a magnitude of 1.decay (float) –
Parameter determining how quickly magnitudes in
S[rank:]
decay to zero, following \(2^d, 3^d, 4^d, \dots\) for decay \(d\). In [TYUC2019, 7.3.1] the following values are used: 0.5 for slow decay, 1 for medium, 2 for fast.symmetric (bool) – If true,
V == U
.psd – If false, and matrix is symmetric, the “tail” singular values will be multiplied with Rademacher noise to create a non-PSD matrix.
skerch.triangles module
Functionality to support sketched estimation of upper/lower triangulars.
Given an arbitrary square linop
, this module implements matrix-free
functionality to compute left- and right matrix-multiplications of both
lower_triang(linop)
and upper_triang(linop)
.
This is done by first exactly computing a limited number of “main rectangles”,
a.k.a. “steps”, that fully belong to the triangular submatrix, and then
estimating the “serrated pattern” that they leave as residual via modified
Hutchinson. The addition of the steps plus the serrated pattern forms the full
triangle. See TriangularLinOp
and the
Examples section for more details.
Contents:
Hadamard pattern to estimate block-triangular matrix-vector products using Hutchinson’s method (used by
TriangularLinOp
).The
TriangularLinOp
linear operator implementing triangular matrix-vector products for arbitrary square linops.
- class skerch.triangles.TriangularLinOp(lop, stair_width=None, num_hutch_measurements=0, lower=True, with_main_diagonal=True, seed=479915, use_fft=True)[source]
Bases:
BaseLinOp
Given a square linop, compute products with one of its triangles.
The triangle of a linear operator can be approximated from the full operator via a “staircase pattern” of exact measurements, whose computation is exact and fast. For example, given an operator of shape
(1000, 1000)
, and stairs of size 100, yields 9 exact measurements strictly under the diagonal, the first one coveringlop[100:, :100]
, the next onelop[200, 100:200]
, and so on. The more measurements, the more closely the full triangle is approximated.Note that this staircase pattern leaves a block-triangular section of the linop untouched (near the main diagonal). This part can be then estimated with the help of
serrated_hadamard_pattern()
, completing the triangular approximation, as follows:Given a square linear operator \(A\), and random vectors \(v \sim \mathcal{R}\) with \(\mathbb{E}[v v^T] = I\), consider the generalized Hutchinson diagonal estimator:
\[f(A) = \mathbb{E}_{v \sim \mathcal{R}} \big[ \varphi(v) \odot Av \big]\]In this case, if the \(\varphi\) function follows a “serrated Hadamard pattern”, \(f(A)\) will equal a block-triangular subset of \(A\).
- skerch.triangles.serrated_hadamard_pattern(v, blocksize, with_main_diagonal=True, lower=True, use_fft=False)[source]
Shifted copies of vectors for block-triangular Hutchinson estimation.
- Parameters:
v – Torch vector expected to contain zero-mean, uncorrelated entries.
with_main_diagonal – If true, the main diagonal will be included in the patterns, otherwise excluded.
lower – If true, the block-triangles will be below the diagonal, otherwise above.
use_fft – See
subdiag_hadamard_pattern()
.
- Returns:
A vector of same shape, type and device as
v
, composed of shifted copies ofv
following a block-triangular pattern.
For example, given a 10-dimensional vector, the corresponding serrated pattern with
blocksize=3, with_main_diagonal=True, lower=True
yields the following entries:v1
v1 + v2
v1 + v2 + v3
v4
v4 + v5
v4 + v5 + v6
v7
v7 + v8
v7 + v8 + v9
v10
If main diagonal is excluded, it will look like this instead:
0
v1
v1 + v2
0
v4
v4 + v5
0
v7
v7 + v8
0
And if
lower=False
, it will look like this instead:v1
v4 + v3 + v2
v4 + v3
v4
v7 + v6 + v5
v7 + v6
v7
v10 + v9 + v8
v10 + v9
v10
skerch.utils module
General utilities for the skerch
library.
- exception skerch.utils.BadShapeError[source]
Bases:
Exception
Error to be thrown when a shape is not as it should.
- exception skerch.utils.NoFlatError[source]
Bases:
Exception
Error to be thrown when a tensor is not flat (i.e. not a vector).
- skerch.utils.gaussian_noise(shape, mean=0.0, std=1.0, seed=None, dtype=torch.float64, device='cpu')[source]
Reproducible
torch.normal
Gaussian noise.- Returns:
A tensor of given shape, dtype and device, containing gaussian noise with given mean and std (analogous to
torch.normal
), but with reproducible behaviour fixed to given random seed.
- skerch.utils.rademacher_flip(x, seed=None, inplace=True, rng_device='cpu')[source]
Reproducible random sign flip using Rademacher noise.
Note
This function makes use of
uniform_noise()
to sample the Rademacher noise. Ifx
itself has been generated usinguniform_noise
, make sure to use a different seed to mitigate correlations.Warning
See
rademacher_noise()
for notes on reproducibility and more info.
- skerch.utils.rademacher_noise(shape, seed=None, device='cpu')[source]
Reproducible Rademacher noise.
Note
This function makes use of
uniform_noise()
to sample the Rademacher noise. Ifx
itself has been generated usinguniform_noise
, make sure to use a different seed to mitigate correlations.Warning
PyTorch does not ensure RNG reproducibility across devices. This parameter determines the device to generate the noise from. If you want cross-device reproducibility, make sure that the noise is always generated from the same device.
- Parameters:
shape – Shape of the output tensor with Rademacher noise.
seed – Seed for the randomness.
device – Device of the output tensor and also source for the noise. See warning.
- skerch.utils.randperm(n, seed=None, device='cpu', inverse=False)[source]
Reproducible randperm of
n
integers from 0 to (n-1) (both included).- Parameters:
inverse (bool) – If False, a random permutation
P
is provided. If true, an inverse permutationQ
is provided, such that both permutations are inverse to each other, i.e.v == v[P][Q] == v[Q][P]
.
- skerch.utils.torch_dtype_as_str(dtype)[source]
Torch dtype to string.
Given a PyTorch datatype object, like
torch.float32
, returns the corresponding string, in this case ‘float32’.
- skerch.utils.uniform_noise(shape, seed=None, dtype=torch.float64, device='cpu')[source]
Reproducible
torch.rand
uniform noise.- Returns:
A tensor of given shape, dtype and device, containing uniform random noise between 0 and 1 (analogous to
torch.rand
), but with reproducible behaviour fixed to given random seed.