TorchServe GenAI use cases and showcase

This document showcases interesting use cases of TorchServe for GenAI deployments.

Enhancing LLM Serving with Torch Compiled RAG on AWS Graviton

In this blog, we show how to deploy a RAG endpoint using TorchServe, increase throughput with torch.compile, and improve the responses generated by the Llama endpoint. We also show how the RAG endpoint can be deployed on CPU using AWS Graviton, while the Llama endpoint remains deployed on a GPU. This microservices-based RAG solution makes efficient use of compute resources, resulting in potential cost savings for customers.
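
The sketch below illustrates the general shape of such a setup, not the blog's exact code: a TorchServe custom handler that embeds the incoming question with a torch.compile'd encoder on CPU, retrieves the closest passage from a tiny in-memory store, and forwards the augmented prompt to a separately deployed Llama endpoint. The encoder name, the sample passages, and LLAMA_URL are placeholders chosen for illustration.

```python
# Minimal sketch of a TorchServe custom handler for a RAG endpoint
# (assumptions: a sentence-transformers encoder, an in-memory passage store,
# and a placeholder URL for a GPU-hosted Llama endpoint).
import json

import requests
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler

LLAMA_URL = "http://llama-gpu-host:8080/predictions/llama"  # placeholder endpoint

PASSAGES = [  # stand-in for a real document/vector store
    "TorchServe serves PyTorch models behind REST and gRPC endpoints.",
    "torch.compile can speed up inference on AWS Graviton CPUs.",
]


class RAGHandler(BaseHandler):
    def initialize(self, context):
        name = "sentence-transformers/all-MiniLM-L6-v2"  # example encoder
        self.tokenizer = AutoTokenizer.from_pretrained(name)
        self.encoder = AutoModel.from_pretrained(name).eval()
        # Compile the embedding model; on Graviton CPUs this is where the
        # throughput gains described in the blog come from.
        self.encoder = torch.compile(self.encoder)
        self.passage_emb = self._embed(PASSAGES)  # pre-compute passage embeddings
        self.initialized = True

    @torch.no_grad()
    def _embed(self, texts):
        tokens = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = self.encoder(**tokens).last_hidden_state
        return F.normalize(hidden.mean(dim=1), dim=-1)  # mean-pooled, unit-norm embeddings

    def preprocess(self, data):
        # Each request body is assumed to be JSON like {"question": "..."}.
        return [json.loads(row.get("body")).get("question", "") for row in data]

    def inference(self, questions):
        # Retrieve the most similar passage for each question via cosine similarity.
        q_emb = self._embed(questions)
        best = (q_emb @ self.passage_emb.T).argmax(dim=-1)
        return [(q, PASSAGES[i]) for q, i in zip(questions, best.tolist())]

    def postprocess(self, pairs):
        # Build an augmented prompt and forward it to the GPU-hosted Llama endpoint.
        answers = []
        for question, context in pairs:
            prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
            resp = requests.post(LLAMA_URL, data=prompt, timeout=60)
            answers.append(resp.text)
        return answers
```

A handler like this would be packaged with torch-model-archiver and registered with TorchServe on the CPU instance, while the Llama model is served from its own TorchServe deployment on a GPU instance; the two communicate only over the inference API, which is what gives the microservices-style separation described above.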
