The VideoResNet model is based on the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition".
The video module is in Beta stage, and backward compatibility is not guaranteed.
The following model builders can be used to instantiate a VideoResNet model, with or
without pre-trained weights. All the model builders internally rely on the
torchvision.models.video.resnet.VideoResNet base class. Please refer to the source
code for more details about this class.
Construct an 18-layer ResNet3D model.
Construct an 18-layer Mixed Convolution network, as described in the paper cited above.
Construct an 18-layer deep R(2+1)D network, as described in the paper cited above.