"#### Note\n\n``torch.nn`` only supports mini-batches. The entire ``torch.nn`` package only supports inputs that are a mini-batch of samples, and not a single sample.\n\nFor example, ``nn.Conv2d`` will take in a 4D Tensor of ``nSamples x nChannels x Height x Width``.\n\nIf you have a single sample, just use ``input.unsqueeze(0)`` to add a fake batch dimension.\n\n",
"Before proceeding further, let's recap all the classes you\u2019ve seen so far.\n\n**Recap:**\n - ``torch.Tensor`` - A *multi-dimensional array* with support for autograd operations like ``backward()``. Also *holds the gradient* w.r.t. the tensor.\n - ``nn.Module`` - Neural network module. *Convenient way of encapsulating parameters*, with helpers for moving them to GPU, exporting, loading, etc.\n - ``nn.Parameter`` - A kind of Tensor that is *automatically registered as a parameter when assigned as an attribute to a* ``Module``.\n - ``autograd.Function`` - Implements *forward and backward definitions of an autograd operation*. Every ``Tensor`` operation creates at least a single ``Function`` node that connects to the functions that created a ``Tensor`` and *encodes its history*.\n\n**At this point, we covered:**\n - Defining a neural network\n - Processing inputs and calling backward\n\n**Still left:**\n - Computing the loss\n - Updating the weights of the network\n\n## Loss Function\nA loss function takes the (output, target) pair of inputs and computes a value that estimates how far away the output is from the target.\n\nThere are several different [loss functions](https://pytorch.org/docs/nn.html#loss-functions) under the ``nn`` package.\nA simple one is ``nn.MSELoss``, which computes the mean-squared error between the output and the target.\n\nFor example:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"output = net(input)\ntarget = torch.randn(10) # a dummy target, for example\ntarget = target.view(1, -1) # make it the same shape as output\ncriterion = nn.MSELoss()\n\nloss = criterion(output, target)\nprint(loss)"
]
},
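{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check (a small sketch, not part of the original tutorial), ``nn.MSELoss`` with its default settings simply averages the squared element-wise differences, so the same value can be reproduced by hand:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative sanity check (assumes the default reduction='mean'):\n# compute the mean-squared error manually and compare with nn.MSELoss.\nmanual_loss = ((output - target) ** 2).mean()\nprint(manual_loss)\nprint(loss)  # should print the same value"
]
},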
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, if you follow ``loss`` in the backward direction, using its\n``.grad_fn`` attribute, you will see a graph of computations that looks\nlike this:\n\n```sh\ninput -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n -> flatten -> linear -> relu -> linear -> relu -> linear\n -> MSELoss\n -> loss\n```\nSo, when we call ``loss.backward()``, the whole graph is differentiated\nw.r.t. the neural net parameters, and all Tensors in the graph that have\n``requires_grad=True`` will have their ``.grad`` Tensor accumulated with the\ngradient.\n\nFor illustration, let us follow a few steps backward:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(loss.grad_fn) # MSELoss\nprint(loss.grad_fn.next_functions[0][0]) # Linear\nprint(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU"
]
},
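{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to go further than a few steps, the same ``.next_functions`` attribute can be followed recursively. The cell below is a minimal sketch (not part of the original tutorial) that walks the whole graph from ``loss.grad_fn`` and prints every ``Function`` node it reaches:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative sketch: recursively follow .next_functions from loss.grad_fn\n# and print each Function node in the graph (an entry can be None when the\n# corresponding input does not require gradients).\ndef print_graph(fn, depth=0):\n    if fn is None:\n        return\n    print('  ' * depth + type(fn).__name__)\n    for next_fn, _ in fn.next_functions:\n        print_graph(next_fn, depth + 1)\n\nprint_graph(loss.grad_fn)"
]
},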
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Backprop\nTo backpropagate the error all we have to do is to ``loss.backward()``.\nYou need to clear the existing gradients though, else gradients will be\naccumulated to existing gradients.\n\n\nNow we shall call ``loss.backward()``, and have a look at conv1's bias\ngradients before and after the backward.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"net.zero_grad() # zeroes the gradient buffers of all parameters\n\nprint('conv1.bias.grad before backward')\nprint(net.conv1.bias.grad)\n\nloss.backward()\n\nprint('conv1.bias.grad after backward')\nprint(net.conv1.bias.grad)"
]
},
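{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the accumulation behaviour that ``net.zero_grad()`` guards against, here is a small sketch (not part of the original tutorial) that runs two forward/backward passes without clearing the gradients in between; the second pass adds its gradients on top of the first:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative sketch: without zero_grad() between passes, .grad accumulates.\nnet.zero_grad()\nfor i in range(2):\n    out = net(input)\n    criterion(out, target).backward()\n    print('conv1.bias.grad after pass', i + 1)\n    print(net.conv1.bias.grad)  # the second printout should be twice the first"
]
},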
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we have seen how to use loss functions.\n\n**Read Later:**\n\n The neural network package contains various modules and loss functions\n that form the building blocks of deep neural networks. A full list with\n documentation is [here](https://pytorch.org/docs/nn).\n\n**The only thing left to learn is:**\n\n - Updating the weights of the network\n\n## Update the weights\nThe simplest update rule used in practice is the Stochastic Gradient\nDescent (SGD):\n\n.. code:: python\n\n weight = weight - learning_rate * gradient\n\nWe can implement this using simple Python code:\n\n.. code:: python\n\n learning_rate = 0.01\n for f in net.parameters():\n f.data.sub_(f.grad.data * learning_rate)\n\nHowever, as you use neural networks, you want to use various different\nupdate rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.\nTo enable this, we built a small package: ``torch.optim`` that\nimplements all these methods. Using it is very simple:\n\n.. code:: python\n\n import torch.optim as optim\n\n # create your optimizer\n optimizer = optim.SGD(net.parameters(), lr=0.01)\n\n # in your training loop:\n optimizer.zero_grad() # zero the gradient buffers\n output = net(input)\n loss = criterion(output, target)\n loss.backward()\n optimizer.step() # Does the update\n\n\n"
]
},
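{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``torch.optim`` snippet above can be run directly on the network from this tutorial. The cell below is simply that snippet as an executable cell (a sketch using the ``net``, ``input``, ``target``, and ``criterion`` objects defined earlier), so you can execute a single optimization step:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch.optim as optim\n\n# create your optimizer\noptimizer = optim.SGD(net.parameters(), lr=0.01)\n\n# one training step: clear gradients, forward pass, loss, backward pass, update\noptimizer.zero_grad()   # zero the gradient buffers\noutput = net(input)\nloss = criterion(output, target)\nloss.backward()\noptimizer.step()        # Does the update\n\nprint('loss at this step:', loss.item())"
]
},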
{
"cell_type": "markdown",
"metadata": {},
"source": [
"