❗ANNOUNCEMENT: Security Changes❗¶
TorchServe now ships with token authorization enabled and model API control disabled by default. These security features address the risk of unauthorized API calls and prevent potentially malicious code from being introduced to the model server. Refer to the following documentation for more information: Token Authorization, Model API control
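Token authorization changes how clients call the APIs: on startup TorchServe writes generated keys to a key_file.json in the working directory, and requests must present the matching key. A minimal sketch of what this looks like in practice, with placeholder model name and key value; the opt-out flags shown should be avoided in production:

    # Inference calls must carry the key generated in key_file.json
    # ("your-inference-key" and "mymodel" are placeholders).
    curl -H "Authorization: Bearer your-inference-key" \
         http://localhost:8080/predictions/mymodel -T sample.jpg

    # Opting out of the new defaults (not recommended): start with token
    # authorization disabled and the model control API enabled.
    torchserve --start --model-store model_store \
        --disable-token-auth --enable-model-api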
TorchServe¶
TorchServe is a performant, flexible, and easy-to-use tool for serving PyTorch eager-mode and TorchScript models.
Basic Features¶
Serving Quick Start - Basic server usage tutorial (a condensed end-to-end sketch follows this list)
Model Archive Quick Start - Tutorial that shows you how to package a model archive file.
Installation - Installation procedures
Model loading - How to load a model in TorchServe
Serving Models - Explains how to use TorchServe
REST API - Specification of the REST API endpoints for TorchServe
gRPC API - TorchServe supports gRPC APIs for both inference and management calls
Packaging Model Archive - Explains how to package a model archive file using model-archiver
Inference API - How to check the health of a deployed model and get inferences
Management API - How to manage and scale models
Logging - How to configure logging
Metrics - How to configure metrics
Prometheus and Grafana metrics - How to configure the metrics API with Prometheus-formatted metrics in a Grafana dashboard
Captum Explanations - Built-in support for Captum explanations for both text and images
Batch inference with TorchServe - How to create and serve a model with batch inference in TorchServe
Workflows - How to create workflows to compose PyTorch models and Python functions in sequential and parallel pipelines
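As a condensed, end-to-end illustration of the quick-start, packaging, and API entries above, the sketch below packages a model, starts the server, and exercises the inference, management, and metrics endpoints. Model name, file names, and the sample image are placeholders, and token authorization headers are omitted for brevity (see the announcement above):

    # Package a trained model into a .mar archive (file names are placeholders).
    torch-model-archiver --model-name densenet161 --version 1.0 \
        --model-file model.py --serialized-file densenet161.pth \
        --handler image_classifier

    # Start TorchServe and load the archive from the model store.
    mkdir -p model_store && mv densenet161.mar model_store/
    torchserve --start --model-store model_store --models densenet161=densenet161.mar

    # Inference API (default port 8080): health check and a prediction.
    curl http://localhost:8080/ping
    curl http://localhost:8080/predictions/densenet161 -T kitten.jpg

    # Management API (default port 8081): list models and scale workers.
    curl http://localhost:8081/models
    curl -X PUT "http://localhost:8081/models/densenet161?min_worker=2"

    # Metrics API (default port 8082): Prometheus-formatted metrics.
    curl http://localhost:8082/metrics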
Default Handlers¶
Image Classifier - This handler takes an image and returns the name of the object in that image (packaging with the default handlers is sketched after this list)
Text Classifier - This handler takes a text (string) as input and returns the classification text based on the model vocabulary
Object Detector - This handler takes an image and returns a list of detected classes and their bounding boxes
Image Segmenter - This handler takes an image and returns output of shape [CL H W], where CL is the number of classes, H is the height, and W is the width
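Each default handler is selected at packaging time through the --handler flag of torch-model-archiver, so no custom handler code is required. A sketch with placeholder file names; index_to_name.json is the optional mapping from class indices to human-readable labels used by the image_classifier handler:

    # Package a TorchScript image classifier with the built-in handler
    # (resnet18.pt and index_to_name.json are placeholder file names).
    torch-model-archiver --model-name resnet18 --version 1.0 \
        --serialized-file resnet18.pt \
        --extra-files index_to_name.json \
        --handler image_classifier

    # The other default handlers are chosen the same way:
    #   --handler text_classifier | object_detector | image_segmenter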
Examples¶
Deploying LLMs - How to easily deploy LLMs using TorchServe
HuggingFace Language Model - This handler takes an input sentence and can return sequence classifications, token classifications or Q&A answers
Multi Modal Framework - Build and deploy a classifier that combines text, audio and video input data
Model Zoo - List of pre-trained model archives ready to be served for inference with TorchServe (see the registration sketch after this list)
Examples - Many examples of how to package and deploy models with TorchServe
Workflow Examples - Examples of how to compose models in a workflow with TorchServe
ResNet50 HPU compile - An example of how to run the model in compile mode with the HPU device
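Model Zoo archives can also be registered at runtime through the management API instead of being copied into the model store beforehand. A sketch, assuming the model control API is enabled; the archive URL follows the Model Zoo's mar_files pattern but should be checked against the zoo listing:

    # Register a pre-trained archive directly from a URL and spin up a worker.
    curl -X POST "http://localhost:8081/models?url=https://torchserve.pytorch.org/mar_files/resnet-18.mar&initial_workers=1"

    # Run an inference against the newly registered model.
    curl http://localhost:8080/predictions/resnet-18 -T kitten.jpg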
Advanced Features¶
Advanced configuration - Describes advanced TorchServe configurations (a config.properties sketch follows this list)
A/B test models - A/B test your models for regressions before shipping them to production
Custom Service - Describes how to develop custom inference services.
Encrypted model serving - S3 server-side model encryption via KMS
Snapshot serialization - Serialize model artifacts to AWS DynamoDB
Benchmarking and Profiling - Use JMeter or Apache Bench to benchmark your models and TorchServe itself
TorchServe on Kubernetes - Demonstrates a TorchServe deployment in Kubernetes using a Helm chart, supported on both Azure Kubernetes Service and Google Kubernetes Engine
mlflow-torchserve - Deploy MLflow pipeline models into TorchServe
Kubeflow pipelines - Kubeflow pipelines and Google Vertex AI Managed pipelines
NVIDIA MPS - Use NVIDIA MPS to optimize multi-worker deployment on a single GPU
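Most of the advanced behavior above is driven by a config.properties file passed at startup. A minimal sketch; the property values are common defaults shown for illustration, not a complete or authoritative configuration:

    # Write a minimal config.properties (values are illustrative).
    printf '%s\n' \
        'inference_address=http://0.0.0.0:8080' \
        'management_address=http://0.0.0.0:8081' \
        'metrics_address=http://0.0.0.0:8082' \
        'model_store=model_store' \
        'load_models=all' > config.properties

    # Point TorchServe at the file on startup.
    torchserve --start --ts-config config.properties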