backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)
Computes the gradient of the current tensor w.r.t. the graph leaves.
The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying gradient. It should be a tensor of matching type and location that contains the gradient of the differentiated function w.r.t. self.
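As an illustration of the non-scalar case described above, here is a minimal sketch. The tensor names (x, y, upstream) are made up for the example, and it assumes a CPU build of PyTorch. An all-ones upstream gradient is just one common choice; it makes the result equal to the gradient of y.sum().

```python
import torch

# Leaf tensor that tracks gradients (hypothetical example values).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y has three elements, so it is non-scalar.
y = x ** 2

# Without an explicit gradient, backward() on a non-scalar tensor fails.
try:
    y.backward()
    raised = False
except RuntimeError:
    raised = True

# Upstream gradient with the same shape, dtype, and device as y.
# All ones means x.grad receives d(y.sum())/dx = 2 * x.
upstream = torch.ones_like(y)
y.backward(gradient=upstream)

print(raised)   # True
print(x.grad)   # tensor([2., 4., 6.])
```

Passing different weights in upstream would scale each element's contribution, which is how a vector-Jacobian product is expressed through this argument.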
This function accumulates gradients in the leaves, so you might need to zero their .grad attributes or set them to None before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.
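The accumulation behaviour can be seen in a short sketch. The variable names (w, first, accumulated, fresh) are illustrative only, and the two reset styles shown are the ones the paragraph above mentions: zeroing in place or assigning None.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(3 * w))/dw is 3 for each element.
(3 * w).sum().backward()
first = w.grad.clone()

# Second pass without a reset: the new gradient is added to the old one.
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Reset option 1: zero the existing gradient tensor in place.
w.grad.zero_()
# Reset option 2: drop the gradient tensor entirely.
w.grad = None

# After a reset, the next pass starts from a clean slate.
(3 * w).sum().backward()
fresh = w.grad.clone()

print(first)        # tensor([3., 3.])
print(accumulated)  # tensor([6., 6.])
print(fresh)        # tensor([3., 3.])
```

Each forward expression here builds a new graph, so no graph needs to be retained between the passes.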
If you run any forward ops, create gradient, and/or call backward in a user-specified CUDA stream context, see Stream semantics of backward passes.
gradient (Tensor or None) – Gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a Tensor that does not require grad unless create_graph is True. None values can be specified for scalar Tensors or ones that don’t require grad. If a None value would be acceptable then this argument is optional.
retain_graph (bool, optional) – If False, the graph used to compute the grads will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of create_graph.
create_graph (bool, optional) – If True, the graph of the derivative will be constructed, allowing higher-order derivative products to be computed. Defaults to False.
inputs (sequence of Tensor) – Inputs w.r.t. which the gradient will be accumulated into .grad. All other Tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute the tensor. All the provided inputs must be leaf Tensors.
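The retain_graph, create_graph, and inputs parameters listed above can be exercised with a small sketch. All tensor names are hypothetical, and torch.autograd.grad is used for the second derivative only as one convenient way to differentiate the stored gradient. Recent PyTorch versions may emit a warning about a reference cycle when backward is called with create_graph=True, which is why the example clears b.grad at the end.

```python
import torch

# retain_graph: keep the graph so backward can run a second time.
a = torch.tensor(2.0, requires_grad=True)
out = a ** 3
out.backward(retain_graph=True)  # graph is kept after this pass
out.backward()                   # second pass works; graph is freed afterwards
a_grad = a.grad.item()           # 3 * a**2 accumulated twice: 12 + 12

# create_graph: the computed gradient is itself part of a graph.
b = torch.tensor(2.0, requires_grad=True)
(b ** 3).backward(create_graph=True)
first_derivative = b.grad        # 3 * b**2, still differentiable
# Differentiate the first derivative; autograd.grad returns a tuple.
(second_derivative,) = torch.autograd.grad(first_derivative, b)
second = second_derivative.item()  # 6 * b
b.grad = None                    # break the parameter/gradient reference cycle

# inputs: accumulate only into the listed leaf tensors.
p = torch.tensor(1.0, requires_grad=True)
q = torch.tensor(2.0, requires_grad=True)
(p * q).backward(inputs=[p])
p_grad = p.grad.item()           # d(p * q)/dp = q
q_untouched = q.grad is None     # q was not listed, so nothing accumulated

print(a_grad)       # 24.0
print(second)       # 12.0
print(p_grad)       # 2.0
print(q_untouched)  # True
```

When only the gradient values are needed rather than accumulation into .grad, torch.autograd.grad is often the simpler tool for higher-order derivatives.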