read_video
- torchvision.io.read_video(filename: str, start_pts: Union[float, Fraction] = 0, end_pts: Optional[Union[float, Fraction]] = None, pts_unit: str = 'pts', output_format: str = 'THWC') → Tuple[Tensor, Tensor, Dict[str, Any]] [source]
Reads a video from a file, returning both the video frames and the audio frames.
Warning
In the near future, we intend to centralize PyTorch’s video decoding capabilities within the torchcodec project. We encourage you to try it out and share your feedback, as the torchvision video decoders will eventually be deprecated.
- Parameters:
filename (str) – path to the video file. If using the pyav backend, this can be whatever av.open accepts.
start_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional) – The start presentation time of the video.
end_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional) – The end presentation time.
pts_unit (str, optional) – unit in which start_pts and end_pts values will be interpreted, either ‘pts’ or ‘sec’. Defaults to ‘pts’.
output_format (str, optional) – The format of the output video tensors. Can be either “THWC” (default) or “TCHW”.
- Returns:
vframes (Tensor[T, H, W, C] or Tensor[T, C, H, W]) – the T video frames
aframes (Tensor[K, L]) – the audio frames, where K is the number of channels and L is the number of points
info (Dict) – metadata for the video and audio. Can contain the fields video_fps (float) and audio_fps (int)
- Return type:
Tuple[Tensor, Tensor, Dict[str, Any]]
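A minimal usage sketch (the file name "video.mp4" and the printed metadata values are hypothetical; it assumes a local file with both a video and an audio stream):

import torchvision

# Decode the whole file; frames come back channels-last by default.
vframes, aframes, info = torchvision.io.read_video("video.mp4")
print(vframes.shape)  # (T, H, W, C) with the default output_format="THWC"
print(aframes.shape)  # (K, L): K channels, L audio samples
print(info)           # e.g. {'video_fps': 30.0, 'audio_fps': 44100}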
Examples using read_video:
Optical Flow: Predicting movement with the RAFT model
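The sketch below (again with a hypothetical "video.mp4") decodes only the clip between 2.0 s and 5.0 s by interpreting start_pts and end_pts in seconds via pts_unit='sec', and requests channels-first frames with output_format='TCHW':

import torchvision

# Decode a 3-second clip; pts_unit="sec" lets start/end be given in seconds.
vframes, aframes, info = torchvision.io.read_video(
    "video.mp4",
    start_pts=2.0,
    end_pts=5.0,
    pts_unit="sec",
    output_format="TCHW",
)
print(vframes.shape)  # (T, C, H, W); T covers roughly three seconds of frames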