import torch model = torch.hub.load('pytorch/vision:v0.9.0', 'inception_v3', pretrained=True) model.eval()
All pre-trained models expect input images normalized in the same way,
i.e. mini-batches of 3-channel RGB images of shape
(3 x H x W), where
W are expected to be at least
The images have to be loaded in to a range of
[0, 1] and then normalized using
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225].
Here’s a sample execution.
# Download an example image from the pytorch website import urllib url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg") try: urllib.URLopener().retrieve(url, filename) except: urllib.request.urlretrieve(url, filename)
# sample execution (requires torchvision) from PIL import Image from torchvision import transforms input_image = Image.open(filename) preprocess = transforms.Compose([ transforms.Resize(299), transforms.CenterCrop(299), transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), ]) input_tensor = preprocess(input_image) input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model # move the input and model to GPU for speed if available if torch.cuda.is_available(): input_batch = input_batch.to('cuda') model.to('cuda') with torch.no_grad(): output = model(input_batch) # Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes print(output) # The output has unnormalized scores. To get probabilities, you can run a softmax on it. probabilities = torch.nn.functional.softmax(output, dim=0) print(probabilities)
# Download ImageNet labels !wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt
# Read the categories with open("imagenet_classes.txt", "r") as f: categories = [s.strip() for s in f.readlines()] # Show top categories per image top5_prob, top5_catid = torch.topk(probabilities, 5) for i in range(top5_prob.size(0)): print(categories[top5_catid[i]], top5_prob[i].item())
Inception v3: Based on the exploration of ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and with using less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error on the validation set (3.6% error on the test set) and 17.3% top-1 error on the validation set.
The 1-crop error rates on the imagenet dataset with the pretrained model are listed below.
|Model structure||Top-1 error||Top-5 error|