grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
Computes and returns the sum of gradients of outputs with respect to the inputs.
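A minimal sketch of the "sum of gradients" behaviour, assuming this is PyTorch's `torch.autograd.grad`. Two scalar outputs depend on the same input, and the single returned gradient is the gradient of their sum. The tensor values are illustrative only.

```python
import torch

# Two scalar outputs that both depend on the same leaf tensor x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
a = (x ** 2).sum()   # da/dx = 2x
b = (3 * x).sum()    # db/dx = 3

# grad returns a tuple with one entry per input. With several outputs,
# that entry is the sum of their gradients: d(a + b)/dx = 2x + 3.
(gx,) = torch.autograd.grad([a, b], [x])
print(gx)  # tensor([5., 7., 9.])
```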
grad_outputs should be a sequence of length matching outputs, containing the “vector” in the vector-Jacobian product, usually the pre-computed gradients w.r.t. each of the outputs. If an output doesn’t require_grad, then its gradient can be None.
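The sketch below illustrates `grad_outputs` for a non-scalar output, again assuming PyTorch's `torch.autograd.grad`. The vector `v` is an arbitrary illustrative choice; since `y = x ** 2` has a diagonal Jacobian `diag(2x)`, the result is simply `v * 2x`. It also shows that a scalar output can take `None` in `grad_outputs` (treated as 1), while a non-scalar output cannot.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                          # non-scalar output; Jacobian is diag(2x)
v = torch.tensor([1.0, 0.5, 0.0])   # the "vector" passed as grad_outputs

(vjp,) = torch.autograd.grad(y, x, grad_outputs=v)
print(vjp)  # tensor([2., 2., 0.])

# A scalar output may use None, which is treated as a gradient of 1.
s = x.sum()
(mixed,) = torch.autograd.grad([s, x ** 2], x, grad_outputs=[None, v])
print(mixed)  # tensor([3., 3., 1.])

# Omitting grad_outputs for a non-scalar output raises a RuntimeError.
needs_vector = False
try:
    torch.autograd.grad(x ** 2, x)
except RuntimeError:
    needs_vector = True
print(needs_vector)  # True
```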
If only_inputs is True, the function will only return a tuple of gradients w.r.t. the specified inputs. If it’s False, then gradients w.r.t. all remaining leaves will still be computed and will be accumulated into their .grad attribute.
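A small sketch of the default `only_inputs=True` behaviour, assuming PyTorch's `torch.autograd.grad`: only the requested gradient is returned, and no leaf's `.grad` attribute is populated. The sketch deliberately sticks to the default, since in recent PyTorch releases passing `only_inputs=False` is deprecated and ignored (use `torch.autograd.backward` to accumulate into `.grad`).

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
loss = w * 3.0 + b

# Request only d(loss)/dw; b is another leaf of the same graph.
(gw,) = torch.autograd.grad(loss, [w])
print(gw)      # tensor(3.)
print(w.grad)  # None: the gradient is returned, not accumulated into .grad
print(b.grad)  # None: nothing is accumulated for the other leaf either
```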
If you run any forward ops, create grad_outputs, and/or call grad in a user-specified CUDA stream context, see Stream semantics of backward passes.
outputs (sequence of Tensor) – outputs of the differentiated function.
inputs (sequence of Tensor) – Inputs w.r.t. which the gradient will be returned (and not accumulated into .grad).
grad_outputs (sequence of Tensor) – The “vector” in the vector-Jacobian product, usually gradients w.r.t. each output. None values can be specified for scalar Tensors or ones that don’t require grad. If a None value would be acceptable for all grad_outputs, then this argument is optional. Default: None.
retain_graph (bool, optional) – If False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to True is not needed and can often be worked around in a much more efficient way. Defaults to the value of create_graph.
create_graph (bool, optional) – If True, the graph of the derivative will be constructed, allowing higher-order derivative products to be computed. Default: False.
allow_unused (bool, optional) – If False, specifying inputs that were not used when computing outputs (and whose grad is therefore always zero) is an error. Defaults to False.
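The following sketch exercises the three boolean flags above, assuming PyTorch's `torch.autograd.grad`; the tensors and values are illustrative. `create_graph=True` makes the first derivative of `x ** 3` differentiable again, `retain_graph=True` allows a second call on the same graph (a third call fails because the second one frees it), and `allow_unused=True` returns `None` for an input that never reached the output.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# create_graph: dy/dx = 3x^2 is itself recorded, so d2y/dx2 = 6x is available.
y = x ** 3
(dy,) = torch.autograd.grad(y, x, create_graph=True)
(d2y,) = torch.autograd.grad(dy, x)
print(dy.item(), d2y.item())  # 12.0 12.0

# retain_graph: keep the graph's saved tensors so grad can be called again.
z = x * x
(g1,) = torch.autograd.grad(z, x, retain_graph=True)
(g2,) = torch.autograd.grad(z, x)  # retain_graph defaults to False: graph freed
freed = False
try:
    torch.autograd.grad(z, x)      # the graph's buffers are already gone
except RuntimeError:
    freed = True
print(g1.item(), g2.item(), freed)  # 4.0 4.0 True

# allow_unused: u never contributes to y2, so its gradient comes back as None.
u = torch.tensor(1.0, requires_grad=True)
y2 = x * 4.0
gx2, gu = torch.autograd.grad(y2, [x, u], allow_unused=True)
print(gx2.item(), gu)  # 4.0 None
```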