torch.nn.utils.parametrizations.orthogonal(module, name='weight', orthogonal_map=None, *, use_trivialization=True)

Applies an orthogonal or unitary parametrization to a matrix or a batch of matrices.

Letting $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$, the parametrized matrix $Q \in \mathbb{K}^{m \times n}$ is orthogonal as

\begin{align*}
Q^{\text{H}}Q &= \mathrm{I}_n \qquad \text{if } m \geq n \\
QQ^{\text{H}} &= \mathrm{I}_m \qquad \text{if } m < n
\end{align*}

where $Q^{\text{H}}$ is the conjugate transpose when $Q$ is complex and the transpose when $Q$ is real-valued, and $\mathrm{I}_n$ is the n-dimensional identity matrix. In plain words, $Q$ will have orthonormal columns whenever $m \geq n$ and orthonormal rows otherwise.

If the tensor has more than two dimensions, we consider it as a batch of matrices of shape (…, m, n).
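
As a rough illustration, the sketch below checks the two identities numerically for a tall weight, a wide weight, and a batch of matrices. The `BatchHolder` module is a hypothetical helper written for this example only; it is not part of PyTorch.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

torch.manual_seed(0)

# nn.Linear(20, 40) stores a weight of shape (40, 20), so m >= n:
# the columns should be orthonormal, i.e. Q^T Q = I_20.
tall = orthogonal(nn.Linear(20, 40)).weight
err_tall = torch.dist(tall.T @ tall, torch.eye(20)).item()

# nn.Linear(40, 20) stores a weight of shape (20, 40), so m < n:
# the rows should be orthonormal, i.e. Q Q^T = I_20.
wide = orthogonal(nn.Linear(40, 20)).weight
err_wide = torch.dist(wide @ wide.T, torch.eye(20)).item()


class BatchHolder(nn.Module):
    """Hypothetical module holding a batch of three 5 x 4 matrices."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3, 5, 4))


batch = orthogonal(BatchHolder()).weight  # shape (3, 5, 4)
# .mT transposes the last two dimensions of every matrix in the batch.
err_batch = torch.dist(batch.mT @ batch, torch.eye(4).expand(3, 4, 4)).item()

# All three distances should be close to zero, up to floating-point error.
print(err_tall, err_wide, err_batch)
```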

The matrix $Q$ may be parametrized via three different orthogonal_map values in terms of the original tensor:

  • "matrix_exp"/"cayley": the matrix_exp() $Q = \exp(A)$ and the Cayley map $Q = (\mathrm{I}_n + A/2)(\mathrm{I}_n - A/2)^{-1}$ are applied to a skew-symmetric $A$ to give an orthogonal matrix.

  • "householder": computes a product of Householder reflectors (householder_product()).
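
The three maps in the list above can be tried out on their own with plain tensor operations. This is a minimal standalone sketch, not the code the parametrization runs internally; in particular, the choice `tau = 2 / ||v||^2` for the Householder coefficients is an assumption made here so that every reflector is orthogonal.

```python
import torch

torch.manual_seed(0)
dtype = torch.float64

# Build a skew-symmetric matrix A (A^T = -A) from an arbitrary square matrix.
X = torch.randn(4, 4, dtype=dtype)
A = X - X.T
I = torch.eye(4, dtype=dtype)

# Matrix exponential of a skew-symmetric matrix.
Q_exp = torch.matrix_exp(A)

# Cayley map Q = (I + A/2)(I - A/2)^{-1}.
Q_cayley = (I + A / 2) @ torch.linalg.inv(I - A / 2)

# Product of Householder reflectors H_i = I - tau_i v_i v_i^T.
# householder_product reads v_i from the strictly lower-triangular part of
# column i of V, with an implicit 1 on the diagonal, so ||v_i||^2 = 1 + sum(V[:, i]^2).
V = torch.randn(6, 3, dtype=dtype).tril(diagonal=-1)
tau = 2.0 / (1.0 + (V * V).sum(dim=-2))
Q_house = torch.linalg.householder_product(V, tau)  # shape (6, 3)


def orth_error(Q):
    """Distance between Q^T Q and the identity of matching size."""
    return torch.dist(Q.T @ Q, torch.eye(Q.shape[-1], dtype=Q.dtype)).item()


print(orth_error(Q_exp), orth_error(Q_cayley), orth_error(Q_house))
```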

"matrix_exp"/"cayley" often make the parametrized weight converge faster than "householder", but they are slower to compute for very thin or very wide matrices.

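A short sketch of selecting each map explicitly through the orthogonal_map argument follows. The layer sizes are arbitrary choices for illustration, and the snippet only checks orthogonality; it makes no claim about speed or convergence.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

torch.manual_seed(0)

errors = {}
for map_name in ("matrix_exp", "cayley", "householder"):
    layer = orthogonal(nn.Linear(16, 64), orthogonal_map=map_name)
    Q = layer.weight  # shape (64, 16), so the columns are orthonormal
    errors[map_name] = torch.dist(Q.T @ Q, torch.eye(16)).item()

for map_name, err in errors.items():
    print(f"{map_name}: {err:.2e}")
```
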
If use_trivialization=True (default), the parametrization implements the “Dynamic Trivialization Framework”, where an extra matrix $B \in \mathbb{K}^{n \times n}$ is stored under module.parametrizations.weight[0].base. This helps the convergence of the parametrized layer at the expense of some extra memory use. See Trivializations for Gradient-Based Optimization on Manifolds.
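
The extra matrix can be inspected directly. The sketch below assumes the default use_trivialization=True and only checks properties that follow from the description above (a square, orthogonal matrix); the tolerance is an arbitrary choice.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

torch.manual_seed(0)

layer = orthogonal(nn.Linear(20, 40))  # use_trivialization=True by default

# The extra matrix B lives on the registered parametrization module.
base = layer.parametrizations.weight[0].base
print(tuple(base.shape))

# B is expected to be square and orthogonal itself.
err_base = torch.dist(base.T @ base, torch.eye(base.shape[-1])).item()
print(err_base)
```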

Initial value of $Q$: If the original tensor is not parametrized and use_trivialization=True (default), the initial value of $Q$ is that of the original tensor if it is orthogonal (or unitary in the complex case), and it is orthogonalized via the QR decomposition otherwise (see torch.linalg.qr()). The same happens when it is not parametrized and orthogonal_map="householder", even when use_trivialization=False. Otherwise, the initial value is the result of the composition of all the registered parametrizations applied to the original tensor.
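
The two default cases can be checked with a small experiment. This is a sketch under the default settings (unparametrized tensor, use_trivialization=True); the 8 x 8 size and the tolerances are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

torch.manual_seed(0)

# Case 1: the original weight is already orthogonal, so it should be kept.
Q0, _ = torch.linalg.qr(torch.randn(8, 8))
lin = nn.Linear(8, 8)
with torch.no_grad():
    lin.weight.copy_(Q0)
kept = orthogonal(lin).weight
diff_kept = torch.dist(kept, Q0).item()  # expected to be close to zero

# Case 2: the original weight is not orthogonal, so it should be orthogonalized.
lin2 = nn.Linear(8, 8)
W0 = lin2.weight.detach().clone()
projected = orthogonal(lin2).weight
err_projected = torch.dist(projected.T @ projected, torch.eye(8)).item()
changed = not torch.allclose(projected, W0, atol=1e-3)

print(diff_kept, err_projected, changed)
```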


This function is implemented using the parametrization functionality in register_parametrization().
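
Because the result is an ordinary registered parametrization, the helpers in torch.nn.utils.parametrize apply to it. The following sketch shows one possible workflow: inspecting the unconstrained tensor, then removing the parametrization while keeping the current orthogonal value as a plain parameter.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize
from torch.nn.utils.parametrizations import orthogonal

torch.manual_seed(0)

layer = orthogonal(nn.Linear(20, 40))
was_parametrized = parametrize.is_parametrized(layer, "weight")

# The unconstrained tensor that an optimizer would update.
original = layer.parametrizations.weight.original
original_shape = tuple(original.shape)

# Drop the parametrization but keep the current orthogonal value of the weight.
parametrize.remove_parametrizations(layer, "weight", leave_parametrized=True)
still_parametrized = parametrize.is_parametrized(layer)

Q = layer.weight
err = torch.dist(Q.T @ Q, torch.eye(20)).item()
print(was_parametrized, still_parametrized, err)
```

After removal the weight is no longer constrained, so later optimizer steps may move it away from orthogonality.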

  • module (nn.Module) – module on which to register the parametrization.

  • name (str, optional) – name of the tensor to make orthogonal. Default: "weight".

  • orthogonal_map (str, optional) – One of the following: "matrix_exp", "cayley", "householder". Default: "matrix_exp" if the matrix is square or complex, "householder" otherwise.

  • use_trivialization (bool, optional) – whether to use the dynamic trivialization framework. Default: True.


Returns the original module with an orthogonal parametrization registered to the specified weight.


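>>> import torch
>>> import torch.nn as nn
>>> from torch.nn.utils.parametrizations import orthogonal
>>> orth_linear = orthogonal(nn.Linear(20, 40))
>>> orth_linear
ParametrizedLinear(
  in_features=20, out_features=40, bias=True
  (parametrizations): ModuleDict(
    (weight): ParametrizationList(
      (0): _Orthogonal()
    )
  )
)
>>> Q = orth_linear.weight
>>> torch.dist(Q.T @ Q, torch.eye(20))  # close to zero, up to floating-point error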
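
The constraint is meant to hold throughout training, not only at initialization. The sketch below runs a few SGD steps on random data and re-checks orthogonality afterwards; the data, loss, learning rate, and step count are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

torch.manual_seed(0)

orth_linear = orthogonal(nn.Linear(20, 40))
optimizer = torch.optim.SGD(orth_linear.parameters(), lr=0.1)

x = torch.randn(32, 20)
target = torch.randn(32, 40)

for _ in range(5):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(orth_linear(x), target)
    loss.backward()
    optimizer.step()  # updates the unconstrained tensor, not the weight itself

# The weight is recomputed from the updated tensor and should still be orthogonal.
Q = orth_linear.weight
err_after = torch.dist(Q.T @ Q, torch.eye(20)).item()
print(err_after)
```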

