\n \n

\n\n## PyTorch Tensors\n\nFollow along with the video beginning at [03:50](https://www.youtube.com/watch?v=IC0_FRiX-sw&t=230s).\n\nFirst, we\u2019ll import PyTorch.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let\u2019s see a few basic tensor manipulations. First, just a few of the\nways to create tensors:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"z = torch.zeros(5, 3)\nprint(z)\nprint(z.dtype)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above, we create a 5x3 matrix filled with zeros, and query its datatype\nto find out that the zeros are 32-bit floating point numbers, which is\nthe default PyTorch.\n\nWhat if you wanted integers instead? You can always override the\ndefault:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"i = torch.ones((5, 3), dtype=torch.int16)\nprint(i)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see that when we do change the default, the tensor helpfully\nreports this when printed.\n\nIt\u2019s common to initialize learning weights randomly, often with a\nspecific seed for the PRNG for reproducibility of results:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"torch.manual_seed(1729)\nr1 = torch.rand(2, 2)\nprint('A random tensor:')\nprint(r1)\n\nr2 = torch.rand(2, 2)\nprint('\\nA different random tensor:')\nprint(r2) # new values\n\ntorch.manual_seed(1729)\nr3 = torch.rand(2, 2)\nprint('\\nShould match r1:')\nprint(r3) # repeats values of r1 because of re-seed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch tensors perform arithmetic operations intuitively. Tensors of\nsimilar shapes may be added, multiplied, etc. Operations with scalars\nare distributed over the tensor:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"ones = torch.ones(2, 3)\nprint(ones)\n\ntwos = torch.ones(2, 3) * 2 # every element is multiplied by 2\nprint(twos)\n\nthrees = ones + twos # addition allowed because shapes are similar\nprint(threes) # tensors are added element-wise\nprint(threes.shape) # this has the same dimensions as input tensors\n\nr1 = torch.rand(2, 3)\nr2 = torch.rand(3, 2)\n# uncomment this line to get a runtime error\n# r3 = r1 + r2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here\u2019s a small sample of the mathematical operations available:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"r = (torch.rand(2, 2) - 0.5) * 2 # values between -1 and 1\nprint('A random matrix, r:')\nprint(r)\n\n# Common mathematical operations are supported:\nprint('\\nAbsolute value of r:')\nprint(torch.abs(r))\n\n# ...as are trigonometric functions:\nprint('\\nInverse sine of r:')\nprint(torch.asin(r))\n\n# ...and linear algebra operations like determinant and singular value decomposition\nprint('\\nDeterminant of r:')\nprint(torch.det(r))\nprint('\\nSingular value decomposition of r:')\nprint(torch.svd(r))\n\n# ...and statistical and aggregate operations:\nprint('\\nAverage and standard deviation of r:')\nprint(torch.std_mean(r))\nprint('\\nMaximum value of r:')\nprint(torch.max(r))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There\u2019s a good deal more to know about the power of PyTorch tensors,\nincluding how to set them up for parallel computations on GPU - we\u2019ll be\ngoing into more depth in another video.\n\n## PyTorch Models\n\nFollow along with the video beginning at [10:00](https://www.youtube.com/watch?v=IC0_FRiX-sw&t=600s)_.\n\nLet\u2019s talk about how we can express models in PyTorch\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch # for all things PyTorch\nimport torch.nn as nn # for torch.nn.Module, the parent object for PyTorch models\nimport torch.nn.functional as F # for the activation function"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
".. figure:: /_static/img/mnist.png\n :alt: le-net-5 diagram\n\n*Figure: LeNet-5*\n\nAbove is a diagram of LeNet-5, one of the earliest convolutional neural\nnets, and one of the drivers of the explosion in Deep Learning. It was\nbuilt to read small images of handwritten numbers (the MNIST dataset),\nand correctly classify which digit was represented in the image.\n\nHere\u2019s the abridged version of how it works:\n\n- Layer C1 is a convolutional layer, meaning that it scans the input\n image for features it learned during training. It outputs a map of\n where it saw each of its learned features in the image. This\n \u201cactivation map\u201d is downsampled in layer S2.\n- Layer C3 is another convolutional layer, this time scanning C1\u2019s\n activation map for *combinations* of features. It also puts out an\n activation map describing the spatial locations of these feature\n combinations, which is downsampled in layer S4.\n- Finally, the fully-connected layers at the end, F5, F6, and OUTPUT,\n are a *classifier* that takes the final activation map, and\n classifies it into one of ten bins representing the 10 digits.\n\nHow do we express this simple neural network in code?\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"class LeNet(nn.Module):\n\n def __init__(self):\n super(LeNet, self).__init__()\n # 1 input image channel (black & white), 6 output channels, 5x5 square convolution\n # kernel\n self.conv1 = nn.Conv2d(1, 6, 5)\n self.conv2 = nn.Conv2d(6, 16, 5)\n # an affine operation: y = Wx + b\n self.fc1 = nn.Linear(16 * 5 * 5, 120) # 5*5 from image dimension\n self.fc2 = nn.Linear(120, 84)\n self.fc3 = nn.Linear(84, 10)\n\n def forward(self, x):\n # Max pooling over a (2, 2) window\n x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n # If the size is a square you can only specify a single number\n x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n x = x.view(-1, self.num_flat_features(x))\n x = F.relu(self.fc1(x))\n x = F.relu(self.fc2(x))\n x = self.fc3(x)\n return x\n\n def num_flat_features(self, x):\n size = x.size()[1:] # all dimensions except the batch dimension\n num_features = 1\n for s in size:\n num_features *= s\n return num_features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking over this code, you should be able to spot some structural\nsimilarities with the diagram above.\n\nThis demonstrates the structure of a typical PyTorch model: \n\n- It inherits from ``torch.nn.Module`` - modules may be nested - in fact,\n even the ``Conv2d`` and ``Linear`` layer classes inherit from\n ``torch.nn.Module``.\n- A model will have an ``__init__()`` function, where it instantiates\n its layers, and loads any data artifacts it might\n need (e.g., an NLP model might load a vocabulary).\n- A model will have a ``forward()`` function. This is where the actual\n computation happens: An input is passed through the network layers\n and various functions to generate an output.\n- Other than that, you can build out your model class like any other\n Python class, adding whatever properties and methods you need to\n support your model\u2019s computation.\n\nLet\u2019s instantiate this object and run a sample input through it.\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"net = LeNet()\nprint(net) # what does the object tell us about itself?\n\ninput = torch.rand(1, 1, 32, 32) # stand-in for a 32x32 black & white image\nprint('\\nImage batch shape:')\nprint(input.shape)\n\noutput = net(input) # we don't call forward() directly\nprint('\\nRaw output:')\nprint(output)\nprint(output.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are a few important things happening above:\n\nFirst, we instantiate the ``LeNet`` class, and we print the ``net``\nobject. A subclass of ``torch.nn.Module`` will report the layers it has\ncreated and their shapes and parameters. This can provide a handy\noverview of a model if you want to get the gist of its processing.\n\nBelow that, we create a dummy input representing a 32x32 image with 1\ncolor channel. Normally, you would load an image tile and convert it to\na tensor of this shape.\n\nYou may have noticed an extra dimension to our tensor - the *batch\ndimension.* PyTorch models assume they are working on *batches* of data\n- for example, a batch of 16 of our image tiles would have the shape\n``(16, 1, 32, 32)``. Since we\u2019re only using one image, we create a batch\nof 1 with shape ``(1, 1, 32, 32)``.\n\nWe ask the model for an inference by calling it like a function:\n``net(input)``. The output of this call represents the model\u2019s\nconfidence that the input represents a particular digit. (Since this\ninstance of the model hasn\u2019t learned anything yet, we shouldn\u2019t expect\nto see any signal in the output.) Looking at the shape of ``output``, we\ncan see that it also has a batch dimension, the size of which should\nalways match the input batch dimension. If we had passed in an input\nbatch of 16 instances, ``output`` would have a shape of ``(16, 10)``.\n\n## Datasets and Dataloaders\n\nFollow along with the video beginning at [14:00](https://www.youtube.com/watch?v=IC0_FRiX-sw&t=840s)_.\n\nBelow, we\u2019re going to demonstrate using one of the ready-to-download,\nopen-access datasets from TorchVision, how to transform the images for\nconsumption by your model, and how to use the DataLoader to feed batches\nof data to your model.\n\nThe first thing we need to do is transform our incoming images into a\nPyTorch tensor.\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#%matplotlib inline\n\nimport torch\nimport torchvision\nimport torchvision.transforms as transforms\n\ntransform = transforms.Compose(\n [transforms.ToTensor(),\n transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we specify two transformations for our input:\n\n- ``transforms.ToTensor()`` converts images loaded by Pillow into \n PyTorch tensors.\n- ``transforms.Normalize()`` adjusts the values of the tensor so\n that their average is zero and their standard deviation is 1.0. Most\n activation functions have their strongest gradients around x = 0, so\n centering our data there can speed learning.\n The values passed to the transform are the means (first tuple) and the\n standard deviations (second tuple) of the rgb values of the images in\n the dataset. You can calculate these values yourself by running these\n few lines of code:\n ```\n from torch.utils.data import ConcatDataset\n transform = transforms.Compose([transforms.ToTensor()])\n trainset = torchvision.datasets.CIFAR10(root='./data', train=True,\n download=True, transform=transform)\n\n #stack all train images together into a tensor of shape \n #(50000, 3, 32, 32)\n x = torch.stack([sample[0] for sample in ConcatDataset([trainset])])\n\n #get the mean of each channel \n mean = torch.mean(x, dim=(0,2,3)) #tensor([0.4914, 0.4822, 0.4465])\n std = torch.std(x, dim=(0,2,3)) #tensor([0.2470, 0.2435, 0.2616]) \n\n ``` \n\nThere are many more transforms available, including cropping, centering,\nrotation, and reflection.\n\nNext, we\u2019ll create an instance of the CIFAR10 dataset. This is a set of\n32x32 color image tiles representing 10 classes of objects: 6 of animals\n(bird, cat, deer, dog, frog, horse) and 4 of vehicles (airplane,\nautomobile, ship, truck):\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"trainset = torchvision.datasets.CIFAR10(root='./data', train=True,\n download=True, transform=transform)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you run the cell above, it may take a little time for the \n dataset to download.