TorchServe GenAI use cases and showcase¶
This document shows interesting usecases with TorchServe for Gen AI deployments.
Enhancing LLM Serving with Torch Compiled RAG on AWS Graviton¶
In this blog, we show how to deploy a RAG Endpoint using TorchServe, increase throughput using torch.compile
and improve the response generated by the Llama Endpoint. We also show how the RAG endpoint can be deployed on CPU using AWS Graviton, while the Llama endpoint is still deployed on a GPU. This kind of microservices-based RAG solution efficiently utilizes compute resources, resulting in potential cost savings for customers.
Multi-Image Generation Streamlit App: Chaining Llama & Stable Diffusion using TorchServe, torch.compile & OpenVINO¶
This Multi-Image Generation Streamlit app is designed to generate multiple images based on a provided text prompt. Instead of using Stable Diffusion directly, this app chains Llama and Stable Diffusion to enhance the image generation process. This multi-image generation use case exemplifies the powerful synergy of cutting-edge AI technologies: TorchServe, OpenVINO, Torch.compile, Meta-Llama, and Stable Diffusion.