Defines a formula for differentiating the operation.
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by as many outputs as forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor, or is a Tensor that does not require grad, you can simply return None as the gradient for that input.
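As a minimal sketch of this contract, the hypothetical MulConstant Function below takes one tensor input and one plain Python number. Its backward() receives one gradient (one forward() output) and returns two values (one per forward() input): a tensor for x and None for the non-tensor constant. It assumes the combined forward(ctx, ...) style in which forward() receives the context directly.

```python
import torch
from torch.autograd import Function


class MulConstant(Function):
    """Hypothetical example Function: multiplies a tensor by a Python scalar."""

    @staticmethod
    def forward(ctx, x, constant):
        # constant is a plain Python number, not a Tensor, so stash it on ctx
        ctx.constant = constant
        return x * constant

    @staticmethod
    def backward(ctx, grad_output):
        # One incoming gradient per forward() output (here just one).
        # One returned value per forward() input: a gradient for x, None for constant.
        return grad_output * ctx.constant, None


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = MulConstant.apply(x, 4.0)
y.sum().backward()
print(x.grad)  # tensor([4., 4., 4.])
```

Returning a tuple whose length matches the number of forward() inputs is what lets autograd pair each gradient with its input. Returning only one value here would leave the constant without a matching entry.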
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans indicating whether each input needs a gradient. E.g., ctx.needs_input_grad[0] is True if the first input to forward() needs a gradient computed w.r.t. the output.
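The sketch below (a hypothetical elementwise Mul Function, not part of the API) shows both uses of the context: tensors stored with ctx.save_for_backward() in forward() are read back through ctx.saved_tensors, and ctx.needs_input_grad is checked so that a gradient is only computed for inputs that need one. Every other input gets None.

```python
import torch
from torch.autograd import Function


class Mul(Function):
    """Hypothetical example Function: elementwise product of two tensors."""

    @staticmethod
    def forward(ctx, a, b):
        # Save the inputs so backward() can use them
        ctx.save_for_backward(a, b)
        return a * b

    @staticmethod
    def backward(ctx, grad_output):
        a, b = ctx.saved_tensors
        grad_a = grad_b = None
        # Skip work for inputs that do not need a gradient
        if ctx.needs_input_grad[0]:
            grad_a = grad_output * b
        if ctx.needs_input_grad[1]:
            grad_b = grad_output * a
        return grad_a, grad_b


a = torch.tensor([2.0, 3.0], requires_grad=True)
b = torch.tensor([5.0, 7.0])  # does not require grad, so needs_input_grad[1] is False
out = Mul.apply(a, b)
out.sum().backward()
print(a.grad)  # tensor([5., 7.])
print(b.grad)  # None
```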