The S3D model is based on the paper "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification".
The following model builders can be used to instantiate an S3D model, with or
without pre-trained weights. All the model builders internally rely on the
torchvision.models.video.S3D base class. Please refer to the source code for
more details about this class.
Construct a Separable 3D CNN (S3D) model.