\n",
"\n",
"Before proceeding further, let's recap all the classes you've seen so\n",
"far.\n",
"\n",
"**Recap:**\n",
"\n",
"-   `torch.Tensor` - A *multi-dimensional array* with support for\n",
"    autograd operations like `backward()`. Also *holds the gradient*\n",
"    w.r.t. the tensor.\n",
"-   `nn.Module` - Neural network module. *Convenient way of\n",
"    encapsulating parameters*, with helpers for moving them to GPU,\n",
"    exporting, loading, etc.\n",
"-   `nn.Parameter` - A kind of Tensor that is *automatically\n",
"    registered as a parameter when assigned as an attribute to a*\n",
"    `Module`.\n",
"-   `autograd.Function` - Implements *forward and backward\n",
"    definitions of an autograd operation*. Every `Tensor` operation\n",
"    creates at least a single `Function` node that connects to\n",
"    functions that created a `Tensor` and *encodes its history*.\n",
"\n",
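"As a quick check of the `nn.Parameter` behavior above, assigning one as\n",
"an attribute registers it with the module automatically (the class and\n",
"attribute names here are illustrative):\n",
"\n",
"``` {.sourceCode .python}\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"class Tiny(nn.Module):\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        # assigned as an attribute, so it is registered automatically;\n",
"        # a plain tensor attribute would NOT be\n",
"        self.w = nn.Parameter(torch.randn(3))\n",
"\n",
"t = Tiny()\n",
"print([name for name, _ in t.named_parameters()])  # ['w']\n",
"```\n",
"\n",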
"**At this point, we covered:**\n",
"\n",
"-   Defining a neural network\n",
"-   Processing inputs and calling backward\n",
"\n",
"**Still Left:**\n",
"\n",
"-   Computing the loss\n",
"-   Updating the weights of the network\n",
"\n",
"Loss Function\n",
"=============\n",
"\n",
"A loss function takes the (output, target) pair of inputs, and computes\n",
"a value that estimates how far away the output is from the target.\n",
"\n",
"There are several different [loss\n",
"functions](https://pytorch.org/docs/nn.html#loss-functions) under the nn\n",
"package. A simple one is `nn.MSELoss`, which computes the mean-squared\n",
"error between the output and the target.\n",
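"\n",
"Concretely, `nn.MSELoss` averages the squared element-wise differences.\n",
"A small self-contained check (the tensors here are illustrative):\n",
"\n",
"``` {.sourceCode .python}\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"a = torch.tensor([1.0, 2.0, 3.0])\n",
"b = torch.tensor([1.0, 0.0, 5.0])\n",
"\n",
"loss = nn.MSELoss()(a, b)\n",
"# same as ((a - b) ** 2).mean() = (0 + 4 + 4) / 3\n",
"print(loss.item())  # 2.6666...\n",
"```\n",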
"\n",
"For example:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"output = net(input)\n",
"target = torch.randn(10) # a dummy target, for example\n",
"target = target.view(1, -1) # make it the same shape as output\n",
"criterion = nn.MSELoss()\n",
"\n",
"loss = criterion(output, target)\n",
"print(loss)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, if you follow `loss` in the backward direction, using its\n",
"`.grad_fn` attribute, you will see a graph of computations that looks\n",
"like this:\n",
"\n",
"``` {.sourceCode .sh}\n",
"input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n",
" -> flatten -> linear -> relu -> linear -> relu -> linear\n",
" -> MSELoss\n",
" -> loss\n",
"```\n",
"\n",
"So, when we call `loss.backward()`, the whole graph is differentiated\n",
"w.r.t. the neural net parameters, and all Tensors in the graph that have\n",
"`requires_grad=True` will have their `.grad` Tensor accumulated with the\n",
"gradient.\n",
"\n",
"For illustration, let us follow a few steps backward:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(loss.grad_fn) # MSELoss\n",
"print(loss.grad_fn.next_functions[0][0]) # Linear\n",
"print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Backprop\n",
"========\n",
"\n",
"To backpropagate the error, all we have to do is call `loss.backward()`.\n",
"You need to clear the existing gradients first, though; otherwise new\n",
"gradients will be accumulated into the existing ones.\n",
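"\n",
"A minimal, self-contained illustration of this accumulation (the tensor\n",
"here is standalone, not part of the network above):\n",
"\n",
"``` {.sourceCode .python}\n",
"import torch\n",
"\n",
"w = torch.ones(2, requires_grad=True)\n",
"(w * 3).sum().backward()\n",
"print(w.grad)    # tensor([3., 3.])\n",
"\n",
"(w * 3).sum().backward()\n",
"print(w.grad)    # tensor([6., 6.]) -- accumulated, not overwritten\n",
"\n",
"w.grad.zero_()   # clear before the next backward pass\n",
"```\n",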
"\n",
"Now we shall call `loss.backward()`, and have a look at conv1's bias\n",
"gradients before and after the backward.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"net.zero_grad() # zeroes the gradient buffers of all parameters\n",
"\n",
"print('conv1.bias.grad before backward')\n",
"print(net.conv1.bias.grad)\n",
"\n",
"loss.backward()\n",
"\n",
"print('conv1.bias.grad after backward')\n",
"print(net.conv1.bias.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we have seen how to use loss functions.\n",
"\n",
"**Read Later:**\n",
"\n",
"> The neural network package contains various modules and loss functions\n",
"> that form the building blocks of deep neural networks. A full list\n",
"> with documentation is [here](https://pytorch.org/docs/nn).\n",
"\n",
"**The only thing left to learn is:**\n",
"\n",
"> - Updating the weights of the network\n",
"\n",
"Update the weights\n",
"==================\n",
"\n",
"The simplest update rule used in practice is Stochastic Gradient\n",
"Descent (SGD):\n",
"\n",
"``` {.sourceCode .python}\n",
"weight = weight - learning_rate * gradient\n",
"```\n",
"\n",
"We can implement this using simple Python code:\n",
"\n",
"``` {.sourceCode .python}\n",
"learning_rate = 0.01\n",
"with torch.no_grad():\n",
"    for f in net.parameters():\n",
"        f -= learning_rate * f.grad\n",
"```\n",
"\n",
"However, as you use neural networks, you will want to use different\n",
"update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable\n",
"this, we built a small package, `torch.optim`, that implements all these\n",
"methods. Using it is very simple:\n",
"\n",
"``` {.sourceCode .python}\n",
"import torch.optim as optim\n",
"\n",
"# create your optimizer\n",
"optimizer = optim.SGD(net.parameters(), lr=0.01)\n",
"\n",
"# in your training loop:\n",
"optimizer.zero_grad() # zero the gradient buffers\n",
"output = net(input)\n",
"loss = criterion(output, target)\n",
"loss.backward()\n",
"optimizer.step() # Does the update\n",
"```\n",
"\n",
"**Note:**\n",
"\n",
"> Observe how gradient buffers had to be manually set to zero using\n",
"> `optimizer.zero_grad()`. This is because gradients are accumulated,\n",
"> as explained in the Backprop section.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`torch.nn` only supports mini-batches. The entire `torch.nn`\n",
"package only supports inputs that are a mini-batch of samples, and not\n",
"a single sample.\n",
"\n",
"For example, `nn.Conv2d` will take in a 4D Tensor of\n",
"`nSamples x nChannels x Height x Width`.\n",
"\n",
"If you have a single sample, just use `input.unsqueeze(0)` to add\n",
"a fake batch dimension.\n",
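"\n",
"For instance (the shapes here are illustrative):\n",
"\n",
"``` {.sourceCode .python}\n",
"import torch\n",
"\n",
"x = torch.randn(3, 32, 32)  # one sample: nChannels x Height x Width\n",
"x = x.unsqueeze(0)          # add a fake batch dimension\n",
"print(x.shape)              # torch.Size([1, 3, 32, 32])\n",
"```\n",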

"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 0
}