Video MViT

The MViT model is based on two papers: "Multiscale Vision Transformers" and "MViTv2: Improved Multiscale Vision Transformers for Classification and Detection".

Model builders

The following model builders can be used to instantiate an MViT v1 or v2 model, with or without pre-trained weights. All the model builders rely internally on the torchvision.models.video.MViT base class. Refer to the source code for more details about this class.

mvit_v1_b(*[, weights, progress])

Constructs a base MViTV1 architecture from "Multiscale Vision Transformers".

mvit_v2_s(*[, weights, progress])

Constructs a small MViTV2 architecture from "Multiscale Vision Transformers" and "MViTv2: Improved Multiscale Vision Transformers for Classification and Detection".
