.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated_examples/approximate_mode.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_examples_approximate_mode.py:


===================================================================
Exact vs Approximate seek mode: Performance and accuracy comparison
===================================================================

In this example, we will describe the ``seek_mode`` parameter of the
:class:`~torchcodec.decoders.VideoDecoder` class.
This parameter offers a trade-off between the speed of
:class:`~torchcodec.decoders.VideoDecoder` creation and the seeking accuracy of
the retrieved frames (i.e. in approximate mode, requesting the ``i``'th frame
may not necessarily return frame ``i``).

.. GENERATED FROM PYTHON SOURCE LINES 21-25

First, a bit of boilerplate: we'll download a short video from the web, and
use the ffmpeg CLI to repeat it 100 times. We'll end up with two videos: a
short video of approximately 14s and a long one of about 23 mins.
You can ignore that part and jump right below to :ref:`perf_creation`.

.. GENERATED FROM PYTHON SOURCE LINES 25-63

.. code-block:: Python


    import torch
    import requests
    import tempfile
    from pathlib import Path
    import shutil
    import subprocess
    from time import perf_counter_ns


    # Video source: https://www.pexels.com/video/dog-eating-854132/
    # License: CC0. Author: Coverr.
    url = "https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4"
    response = requests.get(url, headers={"User-Agent": ""})
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download video. {response.status_code = }.")

    temp_dir = tempfile.mkdtemp()
    short_video_path = Path(temp_dir) / "short_video.mp4"
    with open(short_video_path, 'wb') as f:
        for chunk in response.iter_content():
            f.write(chunk)

    long_video_path = Path(temp_dir) / "long_video.mp4"
    ffmpeg_command = [
        "ffmpeg",
        "-stream_loop", "99",  # repeat video 100 times
        "-i", f"{short_video_path}",
        "-c", "copy",
        f"{long_video_path}"
    ]
    subprocess.run(ffmpeg_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    from torchcodec.decoders import VideoDecoder
    print(f"Short video duration: {VideoDecoder(short_video_path).metadata.duration_seconds} seconds")
    print(f"Long video duration: {VideoDecoder(long_video_path).metadata.duration_seconds / 60} minutes")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Short video duration: 13.8 seconds
    Long video duration: 23.0 minutes




.. GENERATED FROM PYTHON SOURCE LINES 64-72

.. _perf_creation:

Performance: ``VideoDecoder`` creation
--------------------------------------

In terms of performance, the ``seek_mode`` parameter ultimately affects the
**creation** of a :class:`~torchcodec.decoders.VideoDecoder` object. The
longer the video, the higher the performance gain.

.. GENERATED FROM PYTHON SOURCE LINES 72-102

.. code-block:: Python



    def bench(f, average_over=50, warmup=2, **f_kwargs):

        for _ in range(warmup):
            f(**f_kwargs)

        times = []
        for _ in range(average_over):
            start = perf_counter_ns()
            f(**f_kwargs)
            end = perf_counter_ns()
            times.append(end - start)

        times = torch.tensor(times) * 1e-6  # ns to ms
        std = times.std().item()
        med = times.median().item()
        print(f"{med = :.2f}ms +- {std:.2f}")


    print("Creating a VideoDecoder object with seek_mode='exact' on a short video:")
    bench(VideoDecoder, source=short_video_path, seek_mode="exact")
    print("Creating a VideoDecoder object with seek_mode='approximate' on a short video:")
    bench(VideoDecoder, source=short_video_path, seek_mode="approximate")
    print()
    print("Creating a VideoDecoder object with seek_mode='exact' on a long video:")
    bench(VideoDecoder, source=long_video_path, seek_mode="exact")
    print("Creating a VideoDecoder object with seek_mode='approximate' on a long video:")
    bench(VideoDecoder, source=long_video_path, seek_mode="approximate")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Creating a VideoDecoder object with seek_mode='exact' on a short video:
    med = 8.04ms +- 0.03
    Creating a VideoDecoder object with seek_mode='approximate' on a short video:
    med = 7.09ms +- 0.10

    Creating a VideoDecoder object with seek_mode='exact' on a long video:
    med = 114.68ms +- 0.73
    Creating a VideoDecoder object with seek_mode='approximate' on a long video:
    med = 10.52ms +- 0.03




.. GENERATED FROM PYTHON SOURCE LINES 103-113

Performance: frame decoding and clip sampling
---------------------------------------------

Strictly speaking, the ``seek_mode`` parameter only affects the performance of
the :class:`~torchcodec.decoders.VideoDecoder` creation. It does not have a
direct effect on the performance of frame decoding or sampling. **However**,
because frame decoding and sampling patterns typically involve the creation of
the :class:`~torchcodec.decoders.VideoDecoder` (one per video), ``seek_mode``
may very well end up affecting the performance of decoding and samplers. For
example:

.. GENERATED FROM PYTHON SOURCE LINES 113-133

.. code-block:: Python


    from torchcodec import samplers


    def sample_clips(seek_mode):
        return samplers.clips_at_random_indices(
            decoder=VideoDecoder(
                source=long_video_path,
                seek_mode=seek_mode
            ),
            num_clips=5,
            num_frames_per_clip=2,
        )


    print("Sampling clips with seek_mode='exact':")
    bench(sample_clips, seek_mode="exact")
    print("Sampling clips with seek_mode='approximate':")
    bench(sample_clips, seek_mode="approximate")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Sampling clips with seek_mode='exact':
    med = 299.06ms +- 32.15
    Sampling clips with seek_mode='approximate':
    med = 183.01ms +- 44.95




.. GENERATED FROM PYTHON SOURCE LINES 134-145

Accuracy: Metadata and frame retrieval
--------------------------------------

We've seen that using ``seek_mode="approximate"`` can significantly speed up
the :class:`~torchcodec.decoders.VideoDecoder` creation. The price to pay for
that is that seeking won't always be as accurate as with
``seek_mode="exact"``. It can also affect the exactness of the metadata.

However, in many cases, you'll find that there will be no accuracy
difference between the two modes, which means that ``seek_mode="approximate"``
is a net win:

.. GENERATED FROM PYTHON SOURCE LINES 145-161

.. code-block:: Python


    print("Metadata of short video with seek_mode='exact':")
    print(VideoDecoder(short_video_path, seek_mode="exact").metadata)
    print("Metadata of short video with seek_mode='approximate':")
    print(VideoDecoder(short_video_path, seek_mode="approximate").metadata)

    exact_decoder = VideoDecoder(short_video_path, seek_mode="exact")
    approx_decoder = VideoDecoder(short_video_path, seek_mode="approximate")
    for i in range(len(exact_decoder)):
        torch.testing.assert_close(
            exact_decoder.get_frame_at(i).data,
            approx_decoder.get_frame_at(i).data,
            atol=0, rtol=0,
        )
    print("Frame seeking is the same for this video!")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Metadata of short video with seek_mode='exact':
    VideoStreamMetadata:
      num_frames: 345
      duration_seconds: 13.8
      average_fps: 25.0
      duration_seconds_from_header: 13.8
      bit_rate: 505790.0
      num_frames_from_header: 345
      num_frames_from_content: 345
      begin_stream_seconds_from_content: 0.0
      end_stream_seconds_from_content: 13.8
      codec: h264
      width: 640
      height: 360
      average_fps_from_header: 25.0
      stream_index: 0

    Metadata of short video with seek_mode='approximate':
    VideoStreamMetadata:
      num_frames: 345
      duration_seconds: 13.8
      average_fps: 25.0
      duration_seconds_from_header: 13.8
      bit_rate: 505790.0
      num_frames_from_header: 345
      num_frames_from_content: None
      begin_stream_seconds_from_content: None
      end_stream_seconds_from_content: None
      codec: h264
      width: 640
      height: 360
      average_fps_from_header: 25.0
      stream_index: 0

    Frame seeking is the same for this video!




.. GENERATED FROM PYTHON SOURCE LINES 162-185

What is this doing under the hood?
----------------------------------

With ``seek_mode="exact"``, the :class:`~torchcodec.decoders.VideoDecoder`
performs a :term:`scan` when it is instantiated. The scan doesn't involve
decoding, but processes the entire file to infer more accurate metadata (like
duration), and also builds an internal index of frames and key-frames. This
internal index is potentially more accurate than the one in the file's
headers, which leads to more accurate seeking behavior.
Without the scan, TorchCodec relies only on the metadata contained in the
file, which may not always be as accurate.

Which mode should I use?
------------------------

The general rule of thumb is as follows:

- If you really care about exactness of frame seeking, use "exact".
- If you can sacrifice exactness of seeking for speed, which is usually the
  case when doing clip sampling, use "approximate".
- If your videos don't have variable framerate and their metadata is correct,
  then "approximate" mode is a net win: it will be just as accurate as the
  "exact" mode while still being significantly faster.

.. GENERATED FROM PYTHON SOURCE LINES 187-188

.. code-block:: Python

    shutil.rmtree(temp_dir)








.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 35.689 seconds)


.. _sphx_glr_download_generated_examples_approximate_mode.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: approximate_mode.ipynb <approximate_mode.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: approximate_mode.py <approximate_mode.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: approximate_mode.zip <approximate_mode.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_