#### Note

``torch.nn`` only supports mini-batches. The entire ``torch.nn``\n package only supports inputs that are a mini-batch of samples, and not\n a single sample.\n\n For example, ``nn.Conv2d`` will take in a 4D Tensor of\n ``nSamples x nChannels x Height x Width``.\n\n If you have a single sample, just use ``input.unsqueeze(0)`` to add\n a fake batch dimension.

\n\nBefore proceeding further, let's recap all the classes you\u2019ve seen so far.\n\n**Recap:**\n - ``torch.Tensor`` - A *multi-dimensional array*.\n - ``autograd.Variable`` - *Wraps a Tensor and records the history of\n operations* applied to it. Has the same API as a ``Tensor``, with\n some additions like ``backward()``. Also *holds the gradient*\n w.r.t. the tensor.\n - ``nn.Module`` - Neural network module. *Convenient way of\n encapsulating parameters*, with helpers for moving them to GPU,\n exporting, loading, etc.\n - ``nn.Parameter`` - A kind of Variable, that is *automatically\n registered as a parameter when assigned as an attribute to a*\n ``Module``.\n - ``autograd.Function`` - Implements *forward and backward definitions\n of an autograd operation*. Every ``Variable`` operation, creates at\n least a single ``Function`` node, that connects to functions that\n created a ``Variable`` and *encodes its history*.\n\n**At this point, we covered:**\n - Defining a neural network\n - Processing inputs and calling backward\n\n**Still Left:**\n - Computing the loss\n - Updating the weights of the network\n\nLoss Function\n-------------\nA loss function takes the (output, target) pair of inputs, and computes a\nvalue that estimates how far away the output is from the target.\n\nThere are several different\n`loss functions