VideoDecoder

class torchcodec.decoders.VideoDecoder(source: Union[str, Path, bytes, Tensor], *, stream_index: Optional[int] = None, dimension_order: Literal['NCHW', 'NHWC'] = 'NCHW', num_ffmpeg_threads: int = 1, device: Optional[Union[str, device]] = 'cpu')

A single-stream video decoder.

This decoder always performs a scan of the video.

Parameters:
  • source (str, pathlib.Path, torch.Tensor, or bytes) –

    The source of the video.

    • If str or pathlib.Path: a path to a local video file.

    • If bytes object or torch.Tensor: the raw encoded video data.

  • stream_index (int, optional) – Specifies which stream in the video to decode frames from. Note that this index is absolute across all media types. If left unspecified, then the best stream is used.

  • dimension_order (str, optional) –

    The dimension order of the decoded frames. This can be either “NCHW” (default) or “NHWC”, where N is the batch size, C is the number of channels, H is the height, and W is the width of the frames.

    Note

    Frames are natively decoded in NHWC format by the underlying FFmpeg implementation. Converting those into NCHW format is a cheap no-copy operation that allows these frames to be transformed using the torchvision transforms (https://pytorch.org/vision/stable/transforms.html).

  • num_ffmpeg_threads (int, optional) – The number of threads to use for decoding. Use 1 for single-threaded decoding, which may be best if you are running multiple instances of VideoDecoder in parallel. Use a higher number for multi-threaded decoding, which is best if you are running a single instance of VideoDecoder. Passing 0 lets FFmpeg decide on the number of threads. Default: 1.

  • device (str or torch.device, optional) – The device to use for decoding. Default: “cpu”.
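To make the dimension_order note concrete, the following is a minimal sketch of the NHWC to NCHW conversion. The helper names nhwc_to_nchw_shape and as_nchw_view are hypothetical and not part of torchcodec; the second helper assumes its argument is a 4-D torch.Tensor in NHWC layout.

```python
# Axis permutation that maps NHWC (batch, height, width, channels)
# to NCHW (batch, channels, height, width).
NHWC_TO_NCHW = (0, 3, 1, 2)


def nhwc_to_nchw_shape(shape):
    """Return the NCHW shape that corresponds to an NHWC shape tuple."""
    if len(shape) != 4:
        raise ValueError("expected a 4-D (N, H, W, C) shape")
    return tuple(shape[axis] for axis in NHWC_TO_NCHW)


def as_nchw_view(frames_nhwc):
    """Return an NCHW view of an NHWC torch.Tensor (hypothetical helper).

    Tensor.permute only rearranges strides, so the result shares its
    storage with the input instead of copying the pixel data.
    """
    return frames_nhwc.permute(*NHWC_TO_NCHW)
```

For instance, a batch of 8 RGB frames of size 270x480 stored as (8, 270, 480, 3) would be viewed as (8, 3, 270, 480).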

Variables:
  • metadata (VideoStreamMetadata) – Metadata of the video stream.

  • stream_index (int) – The stream index that this decoder is retrieving frames from. If a stream index was provided at initialization, this is the same value. If it was left unspecified, this is the best stream.
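As a rough illustration of the constructor, the sketch below builds a decoder with the keyword arguments listed above. The helpers choose_num_ffmpeg_threads and open_decoder are hypothetical, and picking 0 for a lone decoder (so that FFmpeg decides) is just one way to follow the threading guidance, not a recommendation from the library.

```python
def choose_num_ffmpeg_threads(num_parallel_decoders):
    """Pick a num_ffmpeg_threads value (hypothetical helper).

    One thread per decoder when several decoders run in parallel,
    otherwise 0 so that FFmpeg decides on the number of threads.
    """
    if num_parallel_decoders < 1:
        raise ValueError("num_parallel_decoders must be >= 1")
    return 1 if num_parallel_decoders > 1 else 0


def open_decoder(path, num_parallel_decoders=1):
    """Create a VideoDecoder for a local video file (needs torchcodec)."""
    # Imported lazily so the helper above works without torchcodec.
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(
        path,
        dimension_order="NCHW",
        num_ffmpeg_threads=choose_num_ffmpeg_threads(num_parallel_decoders),
        device="cpu",
    )
    # stream_index was left unspecified, so this reports the best stream.
    print("decoding stream", decoder.stream_index)
    print(decoder.metadata)
    return decoder
```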

Examples using VideoDecoder:

Accelerated video decoding on GPUs with CUDA and NVDEC

Decoding a video with VideoDecoder

How to sample video clips

__getitem__(key: Union[Integral, slice]) → Tensor

Return frame or frames as tensors, at the given index or range.

Parameters:

key (int or slice) – The index or range of frame(s) to retrieve.

Returns:

The frame or frames at the given index or range.

Return type:

torch.Tensor
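Indexing can be sketched as follows. The helper first_and_strided is hypothetical; it only relies on integer and slice keys, so it works with a VideoDecoder as well as with any other sequence, and it assumes that slices with a step are accepted.

```python
def first_and_strided(decoder, step):
    """Return the first frame and every `step`-th frame via indexing."""
    first = decoder[0]          # int key: a single frame
    strided = decoder[::step]   # slice key: a range of frames
    return first, strided
```

With a decoder, `first` would be a single frame tensor and `strided` a batched tensor whose first dimension is N.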

get_frame_at(index: int) → Frame

Return a single frame at the given index.

Parameters:

index (int) – The index of the frame to retrieve.

Returns:

The frame at the given index.

Return type:

Frame

get_frame_played_at(seconds: float) → Frame

Return a single frame played at the given timestamp in seconds.

Parameters:

seconds (float) – The timestamp in seconds when the frame is played.

Returns:

The frame that is played at seconds.

Return type:

Frame
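One way to picture what "played at" means is the sketch below, which maps a timestamp to a frame index from a sorted list of frame start times. It is an illustration only, not torchcodec's implementation: index_played_at is a hypothetical helper, it assumes each frame stays on screen until the next one starts, and it does not model the end of the stream.

```python
import bisect


def index_played_at(pts_seconds, seconds):
    """Return the index of the frame on screen at `seconds`.

    pts_seconds: sorted start times of consecutive frames, in seconds.
    Assumes each frame is displayed until the next frame starts.
    """
    if not pts_seconds or seconds < pts_seconds[0]:
        raise ValueError("timestamp is before the first frame")
    # Rightmost frame whose start time is <= seconds.
    return bisect.bisect_right(pts_seconds, seconds) - 1
```

With a decoder, the corresponding call would be `decoder.get_frame_played_at(seconds=0.05)`.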

get_frames_at(indices: list[int]) → FrameBatch

Return frames at the given indices.

Note

Calling this method is more efficient than repeated individual calls to get_frame_at(). This method makes sure not to decode the same frame twice, and also avoids “backwards seek” operations, which are slow.

Parameters:

indices (list of int) – The indices of the frames to retrieve.

Returns:

The frames at the given indices.

Return type:

FrameBatch
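The idea behind the note above (decode each frame once, in ascending order, then restore the requested order) can be sketched in plain Python. This illustrates the general technique under that assumption and is not torchcodec's actual code; plan_batch_decode is a hypothetical helper.

```python
def plan_batch_decode(indices):
    """Split a request into (frames to decode, where each result goes).

    Returns the unique indices in ascending order, so no frame is
    decoded twice and no backwards seek is needed, plus, for every
    requested index, its position in that sorted list.
    """
    unique_sorted = sorted(set(indices))
    position = {frame_index: i for i, frame_index in enumerate(unique_sorted)}
    gather = [position[frame_index] for frame_index in indices]
    return unique_sorted, gather
```

With a decoder, a single call such as `decoder.get_frames_at(indices=[5, 2, 5, 0])` replaces four separate get_frame_at() calls.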

get_frames_in_range(start: int, stop: int, step: int = 1) → FrameBatch

Return multiple frames at the given index range.

Frames are in [start, stop).

Parameters:
  • start (int) – Index of the first frame to retrieve.

  • stop (int) – End of indexing range (exclusive, as per Python conventions).

  • step (int, optional) – Step size between frames. Default: 1.

Returns:

The frames within the specified range.

Return type:

FrameBatch
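Because the range is half-open and follows Python conventions, the selected indices can be previewed with the built-in range. The helpers below are hypothetical; frames_in_index_range simply forwards to the method documented above.

```python
def indices_in_range(start, stop, step=1):
    """List the frame indices covered by [start, stop) with `step`."""
    return list(range(start, stop, step))


def frames_in_index_range(decoder, start, stop, step=1):
    """Fetch the frames for those indices (needs a VideoDecoder)."""
    return decoder.get_frames_in_range(start=start, stop=stop, step=step)
```

For example, start=0, stop=10, step=3 covers indices 0, 3, 6 and 9; index 10 is excluded.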

get_frames_played_at(seconds: list[float]) → FrameBatch

Return frames played at the given timestamps in seconds.

Note

Calling this method is more efficient than repeated individual calls to get_frame_played_at(). This method makes sure not to decode the same frame twice, and also avoids “backwards seek” operations, which are slow.

Parameters:

seconds (list of float) – The timestamps in seconds when the frames are played.

Returns:

The frames that are played at seconds.

Return type:

FrameBatch
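A typical use is sampling frames at regular time intervals. The sketch below builds the timestamp list and passes it in a single call; evenly_spaced_timestamps and sample_frames are hypothetical helpers, and the chosen spacing is only one possible sampling strategy.

```python
def evenly_spaced_timestamps(start_seconds, stop_seconds, count):
    """Return `count` timestamps evenly spaced over [start, stop)."""
    if count < 1:
        raise ValueError("count must be >= 1")
    if stop_seconds <= start_seconds:
        raise ValueError("stop_seconds must be greater than start_seconds")
    step = (stop_seconds - start_seconds) / count
    return [start_seconds + i * step for i in range(count)]


def sample_frames(decoder, start_seconds, stop_seconds, count):
    """Fetch all sampled frames in one batched call (needs a VideoDecoder)."""
    timestamps = evenly_spaced_timestamps(start_seconds, stop_seconds, count)
    return decoder.get_frames_played_at(seconds=timestamps)
```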

get_frames_played_in_range(start_seconds: float, stop_seconds: float) → FrameBatch

Return multiple frames in the given time range.

Frames are in the half-open range [start_seconds, stop_seconds). The pts (presentation timestamp) of each returned frame, in seconds, lies inside that half-open range.

Parameters:
  • start_seconds (float) – Time, in seconds, of the start of the range.

  • stop_seconds (float) – Time, in seconds, of the end of the range. Because the range is half-open, the end is excluded.

Returns:

The frames within the specified range.

Return type:

FrameBatch
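The half-open rule stated above can be expressed as a simple filter over frame timestamps. This is a plain-Python illustration of that rule, with a hypothetical helper name; it does not reproduce how the decoder locates frames internally.

```python
def pts_in_half_open_range(pts_seconds, start_seconds, stop_seconds):
    """Keep the timestamps that satisfy start <= pts < stop."""
    return [pts for pts in pts_seconds if start_seconds <= pts < stop_seconds]
```

With a decoder, the corresponding call would be `decoder.get_frames_played_in_range(start_seconds=0.5, stop_seconds=1.5)`.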
