.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated_examples/sampling.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_examples_sampling.py:

=========================
How to sample video clips
=========================

In this example, we'll learn how to sample video :term:`clips` from a video. A
clip generally denotes a sequence or batch of frames, and is typically passed
as input to video models.

.. GENERATED FROM PYTHON SOURCE LINES 18-21

First, a bit of boilerplate: we'll download a video from the web, and define a
plotting utility. You can ignore that part and jump right below to
:ref:`sampling_tuto_start`.

.. GENERATED FROM PYTHON SOURCE LINES 21-55

.. code-block:: Python

    from typing import Optional
    import torch
    import requests

    # Video source: https://www.pexels.com/video/dog-eating-854132/
    # License: CC0. Author: Coverr.
    url = "https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4"
    response = requests.get(url, headers={"User-Agent": ""})
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download video. {response.status_code = }.")

    raw_video_bytes = response.content


    def plot(frames: torch.Tensor, title: Optional[str] = None):
        try:
            from torchvision.utils import make_grid
            from torchvision.transforms.v2.functional import to_pil_image
            import matplotlib.pyplot as plt
        except ImportError:
            print("Cannot plot, please run `pip install torchvision matplotlib`")
            return

        plt.rcParams["savefig.bbox"] = 'tight'
        fig, ax = plt.subplots()
        ax.imshow(to_pil_image(make_grid(frames)))
        ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
        if title is not None:
            ax.set_title(title)
        plt.tight_layout()

.. GENERATED FROM PYTHON SOURCE LINES 56-65

.. _sampling_tuto_start:

Creating a decoder
------------------

Sampling clips from a video always starts by creating a
:class:`~torchcodec.decoders.VideoDecoder` object. If you're not already
familiar with :class:`~torchcodec.decoders.VideoDecoder`, take a quick look
at: :ref:`sphx_glr_generated_examples_basic_example.py`.

.. GENERATED FROM PYTHON SOURCE LINES 65-70

.. code-block:: Python

    from torchcodec.decoders import VideoDecoder

    # You can also pass a path to a local file!
    decoder = VideoDecoder(raw_video_bytes)

.. GENERATED FROM PYTHON SOURCE LINES 71-78

Sampling basics
---------------

We can now use our decoder to sample clips. Let's first look at a simple
example: all other samplers follow similar APIs and principles. We'll use
:func:`~torchcodec.samplers.clips_at_random_indices` to sample clips that
start at random indices.

.. GENERATED FROM PYTHON SOURCE LINES 78-94

.. code-block:: Python

    from torchcodec.samplers import clips_at_random_indices

    # The sampler's RNG is controlled by PyTorch's RNG. We set a seed for this
    # tutorial to be reproducible across runs, but note that hard-coding a seed
    # for a training run is generally not recommended.
    torch.manual_seed(0)

    clips = clips_at_random_indices(
        decoder,
        num_clips=5,
        num_frames_per_clip=4,
        num_indices_between_frames=3,
    )
    clips

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    FrameBatch:
      data (shape): torch.Size([5, 4, 3, 360, 640])
      pts_seconds: tensor([[11.3600, 11.4800, 11.6000, 11.7200],
            [10.2000, 10.3200, 10.4400, 10.5600],
            [ 9.8000,  9.9200, 10.0400, 10.1600],
            [ 9.6000,  9.7200,  9.8400,  9.9600],
            [ 8.4400,  8.5600,  8.6800,  8.8000]], dtype=torch.float64)
      duration_seconds: tensor([[0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400]], dtype=torch.float64)

.. GENERATED FROM PYTHON SOURCE LINES 95-109

The output of the sampler is a sequence of clips, represented as a
:class:`~torchcodec.FrameBatch` object. This object has the following fields:

- ``data``: a 5D uint8 tensor representing the frame data. Its shape is
  (num_clips, num_frames_per_clip, ...) where ... is either (C, H, W) or
  (H, W, C), depending on the ``dimension_order`` parameter of the
  :class:`~torchcodec.decoders.VideoDecoder`. This is typically what would get
  passed to the model.
- ``pts_seconds``: a 2D float tensor of shape (num_clips, num_frames_per_clip)
  giving the starting timestamps of each frame within each clip, in seconds.
- ``duration_seconds``: a 2D float tensor of shape (num_clips,
  num_frames_per_clip) giving the duration of each frame within each clip, in
  seconds.
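For instance, here's a minimal sketch of how these fields relate to each other
and how a sampled batch might be handed to a model. Note that ``video_model``
is a hypothetical placeholder for your own model, not something TorchCodec
provides:

.. code-block:: Python

    # The metadata tensors share their leading (num_clips,
    # num_frames_per_clip) dimensions with `data`:
    assert clips.pts_seconds.shape == clips.data.shape[:2]

    # Models typically expect float inputs, so a training loop would usually
    # convert the uint8 data before the forward pass:
    batch = clips.data.float() / 255  # uint8 in [0, 255] -> float32 in [0, 1]
    # outputs = video_model(batch)  # `video_model` is hypothetical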
.. GENERATED FROM PYTHON SOURCE LINES 109-112

.. code-block:: Python

    plot(clips[0].data)

.. image-sg:: /generated_examples/images/sphx_glr_sampling_001.png
   :alt: sampling
   :srcset: /generated_examples/images/sphx_glr_sampling_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 113-120

Indexing and manipulating clips
-------------------------------

Clips are :class:`~torchcodec.FrameBatch` objects, and they support native
pytorch indexing semantics (including fancy indexing). This makes it easy to
filter clips based on a given criterion. For example, from the clips above we
can easily keep only those that start *after* a specific timestamp:

.. GENERATED FROM PYTHON SOURCE LINES 120-123

.. code-block:: Python

    clip_starts = clips.pts_seconds[:, 0]
    clip_starts

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    tensor([11.3600, 10.2000,  9.8000,  9.6000,  8.4400], dtype=torch.float64)

.. GENERATED FROM PYTHON SOURCE LINES 124-127

.. code-block:: Python

    clips_starting_after_five_seconds = clips[clip_starts > 5]
    clips_starting_after_five_seconds

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    FrameBatch:
      data (shape): torch.Size([5, 4, 3, 360, 640])
      pts_seconds: tensor([[11.3600, 11.4800, 11.6000, 11.7200],
            [10.2000, 10.3200, 10.4400, 10.5600],
            [ 9.8000,  9.9200, 10.0400, 10.1600],
            [ 9.6000,  9.7200,  9.8400,  9.9600],
            [ 8.4400,  8.5600,  8.6800,  8.8000]], dtype=torch.float64)
      duration_seconds: tensor([[0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400]], dtype=torch.float64)

.. GENERATED FROM PYTHON SOURCE LINES 128-131

.. code-block:: Python

    every_other_clip = clips[::2]
    every_other_clip

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    FrameBatch:
      data (shape): torch.Size([3, 4, 3, 360, 640])
      pts_seconds: tensor([[11.3600, 11.4800, 11.6000, 11.7200],
            [ 9.8000,  9.9200, 10.0400, 10.1600],
            [ 8.4400,  8.5600,  8.6800,  8.8000]], dtype=torch.float64)
      duration_seconds: tensor([[0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400]], dtype=torch.float64)

.. GENERATED FROM PYTHON SOURCE LINES 132-167

.. note::

    A more natural and efficient way to get clips after a given timestamp is
    to rely on the sampling range parameters, which we'll cover later in
    :ref:`sampling_range`.

Index-based and Time-based samplers
-----------------------------------

So far we've used :func:`~torchcodec.samplers.clips_at_random_indices`.
TorchCodec supports additional samplers, which fall under two main categories:

Index-based samplers:

- :func:`~torchcodec.samplers.clips_at_random_indices`
- :func:`~torchcodec.samplers.clips_at_regular_indices`

Time-based samplers:

- :func:`~torchcodec.samplers.clips_at_random_timestamps`
- :func:`~torchcodec.samplers.clips_at_regular_timestamps`

All these samplers follow similar APIs, and the time-based samplers have
parameters analogous to those of the index-based ones. Both sampler types
generally offer comparable performance in terms of speed.

.. note::

    Is it better to use a time-based sampler or an index-based sampler? The
    index-based samplers have arguably slightly simpler APIs, and their
    behavior is possibly simpler to understand and control, because of the
    discrete nature of indices. For videos with constant fps, an index-based
    sampler behaves exactly the same as a time-based sampler. For videos with
    variable fps however (as is often the case), relying on indices may
    under- or over-sample some regions in the video, which may lead to
    undesirable side effects when training a model. Using a time-based
    sampler ensures uniform sampling characteristics along the time
    dimension.
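To make the distinction concrete, here is a small sketch comparing the two
regular samplers on the decoder created above. It assumes that
:func:`~torchcodec.samplers.clips_at_regular_indices` accepts the same
``num_clips`` / ``num_frames_per_clip`` / ``num_indices_between_frames``
parameters as :func:`~torchcodec.samplers.clips_at_random_indices`; the
parameter values are illustrative only:

.. code-block:: Python

    from torchcodec.samplers import (
        clips_at_regular_indices,
        clips_at_regular_timestamps,
    )

    # Index-based: spacing is expressed in frame indices. On this constant
    # 25 fps video, 3 indices between frames is 3 / 25 = 0.12 seconds.
    index_clips = clips_at_regular_indices(
        decoder,
        num_clips=5,
        num_frames_per_clip=4,
        num_indices_between_frames=3,
    )

    # Time-based: spacing is expressed in seconds, independently of the frame
    # rate. On a variable-fps video, only this spacing stays uniform along
    # the time dimension.
    time_clips = clips_at_regular_timestamps(
        decoder,
        seconds_between_clip_starts=1,
        num_frames_per_clip=4,
        seconds_between_frames=0.12,
    )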
.. GENERATED FROM PYTHON SOURCE LINES 170-185

.. _sampling_range:

Advanced parameters: sampling range
-----------------------------------

Sometimes, we may not want to sample clips from an entire video. We may only
be interested in clips that start within a smaller interval. In samplers, the
``sampling_range_start`` and ``sampling_range_end`` parameters control the
sampling range: they define where we allow clips to *start*. There are two
important things to keep in mind:

- ``sampling_range_end`` is an *open* upper-bound: clips may only start within
  [sampling_range_start, sampling_range_end).
- Because these parameters define where a clip can start, clips may contain
  frames *after* ``sampling_range_end``!

.. GENERATED FROM PYTHON SOURCE LINES 185-198

.. code-block:: Python

    from torchcodec.samplers import clips_at_regular_timestamps

    clips = clips_at_regular_timestamps(
        decoder,
        seconds_between_clip_starts=1,
        num_frames_per_clip=4,
        seconds_between_frames=0.5,
        sampling_range_start=2,
        sampling_range_end=5
    )
    clips

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    FrameBatch:
      data (shape): torch.Size([3, 4, 3, 360, 640])
      pts_seconds: tensor([[2.0000, 2.4800, 3.0000, 3.4800],
            [3.0000, 3.4800, 4.0000, 4.4800],
            [4.0000, 4.4800, 5.0000, 5.4800]], dtype=torch.float64)
      duration_seconds: tensor([[0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400],
            [0.0400, 0.0400, 0.0400, 0.0400]], dtype=torch.float64)
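Note how the last clip above starts at 4 seconds, within [2, 5), but still
contains frames past the 5-second mark, illustrating the second point above.
The sampling range is also the more natural way to do the filtering from the
indexing section earlier: rather than sampling over the whole video and
indexing a posteriori, we can restrict where clips are allowed to start. A
minimal sketch, re-using the decoder from above:

.. code-block:: Python

    from torchcodec.samplers import clips_at_random_timestamps

    # Clips are guaranteed to *start* at or after the 5-second mark, with no
    # need for a `clips[clip_starts > 5]` post-filtering step.
    torch.manual_seed(0)
    late_clips = clips_at_random_timestamps(
        decoder,
        num_clips=5,
        num_frames_per_clip=4,
        seconds_between_frames=0.5,
        sampling_range_start=5,
    )
    assert (late_clips.pts_seconds[:, 0] >= 5).all()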
.. GENERATED FROM PYTHON SOURCE LINES 199-207

Advanced parameters: policy
---------------------------

Depending on the length or duration of the video and on the sampling
parameters, the sampler may try to sample frames *beyond* the end of the
video. The ``policy`` parameter defines how such invalid frames should be
replaced with valid frames.

.. GENERATED FROM PYTHON SOURCE LINES 207-212

.. code-block:: Python

    from torchcodec.samplers import clips_at_random_timestamps

    end_of_video = decoder.metadata.end_stream_seconds
    print(f"{end_of_video = }")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    end_of_video = 13.8

.. GENERATED FROM PYTHON SOURCE LINES 213-225

.. code-block:: Python

    torch.manual_seed(0)
    clips = clips_at_random_timestamps(
        decoder,
        num_clips=1,
        num_frames_per_clip=5,
        seconds_between_frames=0.4,
        sampling_range_start=end_of_video - 1,
        sampling_range_end=end_of_video,
        policy="repeat_last",
    )
    clips.pts_seconds

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    tensor([[13.2800, 13.6800, 13.6800, 13.6800, 13.6800]], dtype=torch.float64)

.. GENERATED FROM PYTHON SOURCE LINES 226-233

We see above that the end of the video is at 13.8s. The sampler tries to
sample frames at timestamps [13.28, 13.68, 14.08, ...] but 14.08 is an
invalid timestamp, beyond the end of the video. With the "repeat_last"
policy, which is the default, the sampler simply repeats the last frame at
13.68 seconds to construct the clip.

An alternative policy is "wrap": the sampler then wraps around the clip and
repeats the first few valid frames as necessary:

.. GENERATED FROM PYTHON SOURCE LINES 233-246

.. code-block:: Python

    torch.manual_seed(0)
    clips = clips_at_random_timestamps(
        decoder,
        num_clips=1,
        num_frames_per_clip=5,
        seconds_between_frames=0.4,
        sampling_range_start=end_of_video - 1,
        sampling_range_end=end_of_video,
        policy="wrap",
    )
    clips.pts_seconds

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    tensor([[13.2800, 13.6800, 13.2800, 13.6800, 13.2800]], dtype=torch.float64)

.. GENERATED FROM PYTHON SOURCE LINES 247-252

By default, the value of ``sampling_range_end`` is automatically set such
that the sampler *doesn't* try to sample frames beyond the end of the video:
the default value ensures that clips start early enough before the end. This
means that by default, the ``policy`` parameter rarely comes into play, and
most users probably don't need to worry too much about it.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.620 seconds)


.. _sphx_glr_download_generated_examples_sampling.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: sampling.ipynb <sampling.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: sampling.py <sampling.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: sampling.zip <sampling.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_