
torchft

This repository implements primitives and end-to-end solutions for per-step fault tolerance, so training can continue when errors occur without interrupting the entire training job.
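As a rough illustration of what "per-step" fault tolerance means, here is a toy, pure-Python simulation. It does not use torchft's API. The names `Replica`, `train_step`, and `recover` are hypothetical, and each "model" is reduced to a single float. Each step averages gradients only over the replicas that are healthy for that step. A failed replica skips steps and later rejoins by copying state from a healthy peer, so the job as a whole never stops.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for one data-parallel replica (model = one float)."""
    rid: int
    weight: float = 0.0
    healthy: bool = True


def train_step(replicas, grads, lr=0.1):
    """Take one step using only the replicas that are healthy this step.

    Returns the size of the participating group (the per-step quorum).
    """
    quorum = [r for r in replicas if r.healthy]
    if not quorum:
        raise RuntimeError("no healthy replicas; cannot take a step")
    # Average gradients over the participating replicas only, so a failed
    # replica does not block or corrupt the step.
    avg = sum(grads[r.rid] for r in quorum) / len(quorum)
    for r in quorum:
        r.weight -= lr * avg
    return len(quorum)


def recover(replica, replicas):
    """Rejoin a failed replica by copying the weights of a healthy peer."""
    source = next(r for r in replicas if r.healthy and r is not replica)
    replica.weight = source.weight
    replica.healthy = True


if __name__ == "__main__":
    replicas = [Replica(i) for i in range(3)]
    grads = {0: 1.0, 1: 2.0, 2: 3.0}
    print(train_step(replicas, grads))  # all three replicas participate
    replicas[2].healthy = False         # simulate a failure on replica 2
    print(train_step(replicas, grads))  # training continues with two replicas
    recover(replicas[2], replicas)      # replica 2 catches up from a peer
    print([round(r.weight, 4) for r in replicas])
```

In this sketch, recovery is a simple state copy between steps. A real system also has to agree on group membership and transfer optimizer state, which is outside the scope of this toy.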

Getting started? See the Install and Usage sections of the README.

License

torchft is BSD 3-Clause licensed. See LICENSE for more details.

Copyright © Meta Platforms, Inc
