torch.linalg.svd

torch.linalg.svd(A, full_matrices=True, *, out=None)
Computes the singular value decomposition (SVD) of a matrix.
Letting $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$, the full SVD of a matrix $A \in \mathbb{K}^{m \times n}$, with $k = \min(m, n)$, is defined as
$A = U \operatorname{diag}(S) V^{\text{H}} \mathrlap{\qquad U \in \mathbb{K}^{m \times m}, S \in \mathbb{R}^k, V \in \mathbb{K}^{n \times n}}$
where $\operatorname{diag}(S) \in \mathbb{K}^{m \times n}$, $V^{\text{H}}$ is the conjugate transpose when $V$ is complex, and the transpose when $V$ is real-valued. The matrices $U$, $V$ (and thus $V^{\text{H}}$) are orthogonal in the real case, and unitary in the complex case.
When m > n (resp. m < n) we can drop the last m - n (resp. n - m) columns of U (resp. V) to form the reduced SVD:
$A = U \operatorname{diag}(S) V^{\text{H}} \mathrlap{\qquad U \in \mathbb{K}^{m \times k}, S \in \mathbb{R}^k, V \in \mathbb{K}^{n \times k}}$
where $\operatorname{diag}(S) \in \mathbb{K}^{k \times k}$. In this case, $U$ and $V$ also have orthonormal columns.
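The difference between the two forms can be checked numerically. The following is a minimal sketch, assuming a random real 5 x 3 matrix; the variable names (`Sigma`, `Ur`, `full_ok`, and so on) are illustrative and not part of the API. It pads $\operatorname{diag}(S)$ to m x n for the full SVD and uses the k x k form for the reduced SVD.

```python
import torch

torch.manual_seed(0)
m, n = 5, 3
k = min(m, n)
A = torch.randn(m, n, dtype=torch.float64)

# Full SVD: U is m x m, Vh is n x n, so diag(S) has to be padded to m x n.
U, S, Vh = torch.linalg.svd(A, full_matrices=True)
Sigma = torch.zeros(m, n, dtype=A.dtype)
Sigma[:k, :k] = torch.diag(S)
full_ok = torch.allclose(U @ Sigma @ Vh, A)

# Reduced SVD: U is m x k, Vh is k x n, and diag(S) is k x k.
Ur, Sr, Vhr = torch.linalg.svd(A, full_matrices=False)
reduced_ok = torch.allclose(Ur @ torch.diag(Sr) @ Vhr, A)

# The reduced U is not square, but its columns are still orthonormal.
cols_orthonormal = torch.allclose(Ur.T @ Ur, torch.eye(k, dtype=A.dtype))

print(U.shape, Ur.shape, full_ok, reduced_ok, cols_orthonormal)
```

For tall or wide matrices the reduced form is usually the one to reach for, since it avoids computing and storing the extra columns that multiply zeros in the padded $\operatorname{diag}(S)$.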
Supports input of float, double, cfloat and cdouble dtypes. Also supports batches of matrices, and if A is a batch of matrices then the output has the same batch dimensions.
The returned decomposition is a named tuple (U, S, Vh) which corresponds to $U$, $S$, $V^{\text{H}}$ above.
The singular values are returned in descending order.
The parameter full_matrices chooses between the full (default) and reduced SVD.
Differences with numpy.linalg.svd:
Unlike numpy.linalg.svd, this function always returns a tuple of three tensors and does not support the compute_uv argument. Use torch.linalg.svdvals(), which computes only the singular values, instead of compute_uv=False.
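As a short illustration of the replacement for compute_uv=False, the sketch below compares torch.linalg.svdvals() with the S returned by torch.linalg.svd() on an arbitrary random matrix. The names `S_only` and `S_full` are illustrative.

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 6, dtype=torch.float64)

# Singular values only, without forming U or Vh.
S_only = torch.linalg.svdvals(A)

# The same values as the middle element of the full decomposition.
_, S_full, _ = torch.linalg.svd(A, full_matrices=False)

same_values = torch.allclose(S_only, S_full)
is_descending = bool((S_only[:-1] >= S_only[1:]).all())
print(S_only.shape, same_values, is_descending)
```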
Note
When full_matrices=True, the gradients with respect to U[…, :, min(m, n):] and Vh[…, min(m, n):, :] will be ignored, as those vectors can be arbitrary bases of the corresponding subspaces.
Warning
The returned tensors U and V are not unique, nor are they continuous with respect to A. Due to this lack of uniqueness, different hardware and software may compute different singular vectors.
This non-uniqueness is caused by the fact that multiplying any pair of singular vectors $u_k, v_k$ by $-1$ in the real case or by $e^{i \phi}, \phi \in \mathbb{R}$ in the complex case produces another two valid singular vectors of the matrix. This non-uniqueness problem is even worse when the matrix has repeated singular values. In this case, one may multiply the associated singular vectors of U and V spanning the subspace by a rotation matrix and the resulting vectors will span the same subspace.
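The sign ambiguity in the real case can be seen directly. This is a minimal sketch, assuming a random real 4 x 3 matrix; `signs`, `U2` and `Vh2` are illustrative names. It negates one pair of singular vectors and checks that the product still reconstructs A.

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 3, dtype=torch.float64)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)

# Flip the sign of the first singular pair: column 0 of U and row 0 of Vh.
signs = torch.tensor([-1.0, 1.0, 1.0], dtype=A.dtype)
U2 = U * signs              # broadcasting scales the columns of U
Vh2 = signs[:, None] * Vh   # scales the rows of Vh

# (U2, S, Vh2) is a different but equally valid decomposition of A.
still_valid = torch.allclose(U2 @ torch.diag(S) @ Vh2, A)
factors_differ = not torch.allclose(U2, U)
print(still_valid, factors_differ)
```

One practical consequence is that tests comparing results across devices or libraries are more robust when they compare quantities that do not depend on this choice, such as S or the reconstructed product, rather than U or Vh entry by entry.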
Warning
Gradients computed using U or Vh will only be finite when A does not have zero as a singular value or repeated singular values. Furthermore, if the distance between any two singular values is close to zero, the gradient will be numerically unstable, as it depends on the singular values $\sigma_i$ through the computation of $\frac{1}{\min_{i \neq j} \sigma_i^2 - \sigma_j^2}$. The gradient will also be numerically unstable when A has small singular values, as it also depends on the computation of $\frac{1}{\sigma_i}$.
See also
torch.linalg.svdvals() computes only the singular values. Unlike torch.linalg.svd(), the gradients of svdvals() are always numerically stable.
torch.linalg.eig() for a function that computes another type of spectral decomposition of a matrix. The eigendecomposition works only on square matrices.
torch.linalg.eigh() for a (faster) function that computes the eigenvalue decomposition for Hermitian and symmetric matrices.
torch.linalg.qr() for another (much faster) decomposition that works on general matrices.
Parameters
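A (Tensor) – tensor of shape (*, m, n) where * is zero or more batch dimensions.
full_matrices (bool, optional) – controls whether to compute the full or reduced SVD, and consequently the shape of the returned tensors U and Vh. Default: True.
Keyword Arguments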
out (tuple, optional) – output tuple of three tensors. Ignored if None.
 Returns
A named tuple (U, S, Vh) which corresponds to $U$, $S$, $V^{\text{H}}$ above.
S will always be real-valued, even when A is complex. It will also be ordered in descending order.
U and Vh will have the same dtype as A. The left / right singular vectors will be given by the columns of U and the rows of Vh respectively.
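To make the role of the columns of U and the rows of Vh concrete, the sketch below checks the defining relation $A v_i = \sigma_i u_i$ for one singular triple. It assumes a random complex 5 x 3 input, and the names `u_i`, `v_i` and `relation_holds` are illustrative. In the complex case the rows of Vh are the conjugates of the columns of $V$, so the row is conjugated before use; for real input the conjugation has no effect.

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 3, dtype=torch.complex128)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)

i = 0
u_i = U[:, i]           # i-th left singular vector: a column of U
v_i = Vh[i, :].conj()   # i-th right singular vector: a conjugated row of Vh

# Defining relation of a singular triple: A v_i = sigma_i u_i.
relation_holds = torch.allclose(A @ v_i, S[i] * u_i)

# S stays real even though A, U and Vh are complex.
print(relation_holds, S.dtype, U.dtype, Vh.dtype)
```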
Examples:
>>> import torch
>>> A = torch.randn(5, 3)
>>> U, S, Vh = torch.linalg.svd(A, full_matrices=False)
>>> U.shape, S.shape, Vh.shape
(torch.Size([5, 3]), torch.Size([3]), torch.Size([3, 3]))
>>> torch.dist(A, U @ torch.diag(S) @ Vh)
tensor(1.0486e-06)

>>> U, S, Vh = torch.linalg.svd(A)
>>> U.shape, S.shape, Vh.shape
(torch.Size([5, 5]), torch.Size([3]), torch.Size([3, 3]))
>>> torch.dist(A, U[:, :3] @ torch.diag(S) @ Vh)
tensor(1.0486e-06)

>>> A = torch.randn(7, 5, 3)
>>> U, S, Vh = torch.linalg.svd(A, full_matrices=False)
>>> torch.dist(A, U @ torch.diag_embed(S) @ Vh)
tensor(3.0957e-06)