.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/nvenc_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_tutorials_nvenc_tutorial.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_nvenc_tutorial.py:


Accelerated video encoding with NVENC
=====================================

.. _nvenc_tutorial:

**Author**: `Moto Hira `__

This tutorial shows how to use NVIDIA’s hardware video encoder (NVENC)
with TorchAudio, and how it improves the performance of video encoding.

.. GENERATED FROM PYTHON SOURCE LINES 14-37

.. note::

   This tutorial requires FFmpeg libraries compiled with HW
   acceleration enabled.

   Please refer to
   :ref:`Enabling GPU video decoder/encoder `
   for how to build FFmpeg with HW acceleration.

.. note::

   Most modern GPUs have both a HW decoder and a HW encoder, but some
   high-end GPUs like A100 and H100 do not have a HW encoder.
   Please refer to the following for the availability and
   format coverage.
   https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new

   Attempting to use the HW encoder on these GPUs fails with an error
   message like ``Generic error in an external library``.
   You can enable debug logging with
   :py:func:`torchaudio.utils.ffmpeg_utils.set_log_level` to see more
   detailed error messages issued along the way.

.. GENERATED FROM PYTHON SOURCE LINES 37-51

.. code-block:: default

    import torch
    import torchaudio

    print(torch.__version__)
    print(torchaudio.__version__)

    import io
    import time

    import matplotlib.pyplot as plt
    from IPython.display import Video
    from torchaudio.io import StreamReader, StreamWriter


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2.4.0.dev20240508
    2.2.0.dev20240509
.. GENERATED FROM PYTHON SOURCE LINES 52-58

Check the prerequisites
-----------------------

First, we check that TorchAudio correctly detects FFmpeg libraries
that support HW decoder/encoder.

.. GENERATED FROM PYTHON SOURCE LINES 59-62

.. code-block:: default

    from torchaudio.utils import ffmpeg_utils

.. GENERATED FROM PYTHON SOURCE LINES 64-69

.. code-block:: default

    print("FFmpeg Library versions:")
    for k, ver in ffmpeg_utils.get_versions().items():
        print(f"  {k}:\t{'.'.join(str(v) for v in ver)}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    FFmpeg Library versions:
      libavcodec:   60.3.100
      libavdevice:  60.1.100
      libavfilter:  9.3.100
      libavformat:  60.3.100
      libavutil:    58.2.100

.. GENERATED FROM PYTHON SOURCE LINES 71-76

.. code-block:: default

    print("Available NVENC Encoders:")
    for k in ffmpeg_utils.get_video_encoders().keys():
        if "nvenc" in k:
            print(f" - {k}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Available NVENC Encoders:
     - av1_nvenc
     - h264_nvenc
     - hevc_nvenc

.. GENERATED FROM PYTHON SOURCE LINES 78-83

.. code-block:: default

    print("Available GPU:")
    print(torch.cuda.get_device_properties(0))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Available GPU:
    _CudaDeviceProperties(name='NVIDIA A10G', major=8, minor=6, total_memory=22502MB, multi_processor_count=80)

.. GENERATED FROM PYTHON SOURCE LINES 84-87

We use the following helper function to generate test frame data.
For details of synthetic video generation, please refer to
:ref:`StreamReader Advanced Usage `.

.. GENERATED FROM PYTHON SOURCE LINES 87-98

.. code-block:: default

    def get_data(height, width, format="yuv444p", frame_rate=30000 / 1001, duration=4):
        # Generate a synthetic test pattern with FFmpeg's ``testsrc2`` source
        src = f"testsrc2=rate={frame_rate}:size={width}x{height}:duration={duration}"
        s = StreamReader(src=src, format="lavfi")
        s.add_basic_video_stream(-1, format=format)
        # Decode the whole source, then fetch all frames as a single chunk
        s.process_all_packets()
        (video,) = s.pop_chunks()
        return video
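The prerequisite checks above can be folded into a small guard that falls back to a software encoder when NVENC is not usable. The following is a minimal sketch under stated assumptions: ``pick_video_encoder`` is a hypothetical helper (not part of TorchAudio), and it takes the encoder names and the CUDA flag as plain arguments so that it runs without a GPU. Note that an encoder being listed does not guarantee the GPU has an encoder unit (see the note on A100 and H100 above), so this check is necessary but not sufficient.

```python
from typing import Iterable


def pick_video_encoder(
    available: Iterable[str],
    cuda_available: bool,
    preferred: str = "h264_nvenc",
    fallback: str = "libx264",
) -> str:
    """Return ``preferred`` if it is listed and a CUDA device exists, else ``fallback``.

    Hypothetical helper for illustration; it is not part of TorchAudio.
    """
    names = set(available)
    if cuda_available and preferred in names:
        return preferred
    if fallback in names:
        return fallback
    raise RuntimeError(f"Neither {preferred!r} nor {fallback!r} is available.")


# In the tutorial's environment the inputs would come from the calls used above:
#   available = ffmpeg_utils.get_video_encoders().keys()
#   cuda_available = torch.cuda.is_available()
# Literal values are passed here so that the sketch runs anywhere.
print(pick_video_encoder(["av1_nvenc", "h264_nvenc", "hevc_nvenc", "libx264"], True))
print(pick_video_encoder(["libx264"], False))
```

The returned name can then be passed as the ``encoder`` argument when defining the output video stream.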
.. GENERATED FROM PYTHON SOURCE LINES 99-106

Encoding videos with NVENC
--------------------------

To use the HW video encoder, specify it when defining the output
video stream by passing the ``encoder`` option to
:py:meth:`~torchaudio.io.StreamWriter.add_video_stream`.

.. GENERATED FROM PYTHON SOURCE LINES 109-119

.. code-block:: default

    pict_config = {
        "height": 360,
        "width": 640,
        "frame_rate": 30000 / 1001,
        "format": "yuv444p",
    }

    frame_data = get_data(**pict_config)

.. GENERATED FROM PYTHON SOURCE LINES 121-127

.. code-block:: default

    w = StreamWriter(io.BytesIO(), format="mp4")
    w.add_video_stream(**pict_config, encoder="h264_nvenc", encoder_format="yuv444p")
    with w.open():
        w.write_video_chunk(0, frame_data)

.. GENERATED FROM PYTHON SOURCE LINES 128-132

Similar to the HW decoder, by default, the encoder expects the frame
data to be in CPU memory. To send data from CUDA memory, you need to
specify the ``hw_accel`` option.

.. GENERATED FROM PYTHON SOURCE LINES 132-141

.. code-block:: default

    buffer = io.BytesIO()
    w = StreamWriter(buffer, format="mp4")
    w.add_video_stream(**pict_config, encoder="h264_nvenc", encoder_format="yuv444p", hw_accel="cuda:0")
    with w.open():
        w.write_video_chunk(0, frame_data.to(torch.device("cuda:0")))
    # Read back the encoded bytes so they can be displayed below
    buffer.seek(0)
    video_cuda = buffer.read()

.. GENERATED FROM PYTHON SOURCE LINES 143-146

.. code-block:: default

    Video(video_cuda, embed=True, mimetype="video/mp4")

.. raw:: html


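The relation between ``hw_accel`` and the location of the frame data can be checked before writing. The sketch below is an illustration only: ``expected_frame_device_type`` and ``check_frames`` are hypothetical helpers, not TorchAudio APIs, and they compare only the device type (``cpu`` or ``cuda``), on the assumption that ``hw_accel="cuda"`` and ``hw_accel="cuda:0"`` both call for CUDA tensors.

```python
from typing import Optional


def expected_frame_device_type(hw_accel: Optional[str]) -> str:
    """Device type the frame tensor should live on for a given ``hw_accel`` value."""
    # Without ``hw_accel`` the writer expects CPU tensors.
    return "cpu" if hw_accel is None else hw_accel.split(":")[0]


def check_frames(frame_device: str, hw_accel: Optional[str]) -> None:
    """Raise ``ValueError`` if the frame device type does not match ``hw_accel``.

    Hypothetical helper for illustration; it is not part of TorchAudio.
    """
    actual = frame_device.split(":")[0]
    expected = expected_frame_device_type(hw_accel)
    if actual != expected:
        raise ValueError(
            f"hw_accel={hw_accel!r} expects frames on {expected!r}, got {frame_device!r}"
        )


# With real tensors one would pass ``str(frame_data.device)`` as the first argument.
check_frames("cpu", None)          # CPU frames, no hw_accel: accepted
check_frames("cuda:0", "cuda:0")   # CUDA frames with hw_accel: accepted
try:
    check_frames("cpu", "cuda:0")  # CPU frames with hw_accel: rejected
except ValueError as err:
    print(err)
```

Failing early with a clear message can be easier to debug than an error raised from inside the encoder.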
.. GENERATED FROM PYTHON SOURCE LINES 147-157

Benchmark NVENC with StreamWriter
---------------------------------

Now we compare the performance of the software encoder and the
hardware encoder.

Similar to the benchmark in NVDEC, we process videos of different
resolutions and measure the time it takes to encode them.

We also measure the size of the resulting video file.

.. GENERATED FROM PYTHON SOURCE LINES 159-162

The following function encodes the given frames and measures the time
it takes to encode them and the size of the resulting video data.

.. GENERATED FROM PYTHON SOURCE LINES 162-183

.. code-block:: default

    def test_encode(data, encoder, width, height, hw_accel=None, **config):
        assert data.is_cuda

        buffer = io.BytesIO()
        s = StreamWriter(buffer, format="mp4")
        s.add_video_stream(encoder=encoder, width=width, height=height, hw_accel=hw_accel, **config)
        with s.open():
            t0 = time.monotonic()
            # Without ``hw_accel`` the writer expects CPU frames, so the
            # device-to-host copy is part of the measured time.
            if hw_accel is None:
                data = data.to("cpu")
            s.write_video_chunk(0, data)
        # Measured after the writer is closed, so flushing is included
        elapsed = time.monotonic() - t0
        size = buffer.tell()
        fps = len(data) / elapsed
        print(f" - Processed {len(data)} frames in {elapsed:.2f} seconds. ({fps:.2f} fps)")
        print(f" - Encoded data size: {size} bytes")
        return elapsed, size

.. GENERATED FROM PYTHON SOURCE LINES 184-189

We conduct the tests for the following configurations:

- Software encoder with the number of threads 1, 4, 8
- Hardware encoder with and without the ``hw_accel`` option.

.. GENERATED FROM PYTHON SOURCE LINES 189-244
.. code-block:: default

    def run_tests(height, width, duration=4):
        # Generate the test data
        print(f"Testing resolution: {width}x{height}")
        pict_config = {
            "height": height,
            "width": width,
            "frame_rate": 30000 / 1001,
            "format": "yuv444p",
        }

        data = get_data(**pict_config, duration=duration)
        data = data.to(torch.device("cuda:0"))

        times = []
        sizes = []

        # Test software encoding
        encoder_config = {
            "encoder": "libx264",
            "encoder_format": "yuv444p",
        }
        for i, num_threads in enumerate([1, 4, 8]):
            print(f"* Software Encoder (num_threads={num_threads})")
            time_, size = test_encode(
                data,
                encoder_option={"threads": str(num_threads)},
                **pict_config,
                **encoder_config,
            )
            times.append(time_)
            # Record the size once per encoder type
            if i == 0:
                sizes.append(size)

        # Test hardware encoding
        encoder_config = {
            "encoder": "h264_nvenc",
            "encoder_format": "yuv444p",
            "encoder_option": {"gpu": "0"},
        }
        for i, hw_accel in enumerate([None, "cuda"]):
            print(f"* Hardware Encoder {'(CUDA frames)' if hw_accel else ''}")
            time_, size = test_encode(
                data,
                **pict_config,
                **encoder_config,
                hw_accel=hw_accel,
            )
            times.append(time_)
            if i == 0:
                sizes.append(size)
        return times, sizes

.. GENERATED FROM PYTHON SOURCE LINES 245-251

We now vary the resolution of the videos to see how these
measurements change.

360P
----

.. GENERATED FROM PYTHON SOURCE LINES 251-254

.. code-block:: default

    time_360, size_360 = run_tests(360, 640)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Testing resolution: 640x360
    * Software Encoder (num_threads=1)
     - Processed 120 frames in 0.63 seconds. (190.48 fps)
     - Encoded data size: 381331 bytes
    * Software Encoder (num_threads=4)
     - Processed 120 frames in 0.23 seconds. (513.01 fps)
     - Encoded data size: 381307 bytes
    * Software Encoder (num_threads=8)
     - Processed 120 frames in 0.18 seconds. (675.01 fps)
     - Encoded data size: 390689 bytes
    * Hardware Encoder
     - Processed 120 frames in 0.05 seconds. (2264.01 fps)
     - Encoded data size: 1262979 bytes
    * Hardware Encoder (CUDA frames)
     - Processed 120 frames in 0.05 seconds. (2583.05 fps)
     - Encoded data size: 1262979 bytes

.. GENERATED FROM PYTHON SOURCE LINES 255-258

720P
----

.. GENERATED FROM PYTHON SOURCE LINES 258-261

.. code-block:: default

    time_720, size_720 = run_tests(720, 1280)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Testing resolution: 1280x720
    * Software Encoder (num_threads=1)
     - Processed 120 frames in 2.23 seconds. (53.80 fps)
     - Encoded data size: 1335451 bytes
    * Software Encoder (num_threads=4)
     - Processed 120 frames in 0.81 seconds. (147.88 fps)
     - Encoded data size: 1336418 bytes
    * Software Encoder (num_threads=8)
     - Processed 120 frames in 0.66 seconds. (181.47 fps)
     - Encoded data size: 1344063 bytes
    * Hardware Encoder
     - Processed 120 frames in 0.25 seconds. (473.41 fps)
     - Encoded data size: 1358969 bytes
    * Hardware Encoder (CUDA frames)
     - Processed 120 frames in 0.15 seconds. (801.76 fps)
     - Encoded data size: 1358969 bytes

.. GENERATED FROM PYTHON SOURCE LINES 262-265

1080P
-----

.. GENERATED FROM PYTHON SOURCE LINES 265-268

.. code-block:: default

    time_1080, size_1080 = run_tests(1080, 1920)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Testing resolution: 1920x1080
    * Software Encoder (num_threads=1)
     - Processed 120 frames in 4.64 seconds. (25.88 fps)
     - Encoded data size: 2678241 bytes
    * Software Encoder (num_threads=4)
     - Processed 120 frames in 1.67 seconds. (71.99 fps)
     - Encoded data size: 2682028 bytes
    * Software Encoder (num_threads=8)
     - Processed 120 frames in 1.49 seconds. (80.41 fps)
     - Encoded data size: 2685086 bytes
    * Hardware Encoder
     - Processed 120 frames in 0.56 seconds. (215.63 fps)
     - Encoded data size: 1705900 bytes
    * Hardware Encoder (CUDA frames)
     - Processed 120 frames in 0.32 seconds. (370.97 fps)
     - Encoded data size: 1705900 bytes

.. GENERATED FROM PYTHON SOURCE LINES 269-271

Now we plot the result.

.. GENERATED FROM PYTHON SOURCE LINES 271-311
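Alongside the plot, the printed measurements can be condensed into speed-up factors and average bitrates. The sketch below copies the 1080p numbers from the output above; ``speedup`` and ``bitrate_mbps`` are hypothetical helper names introduced for illustration. The timings are rounded to two decimals in the log, and the bitrate is computed over the whole MP4 file (container overhead included), so the results are approximate.

```python
FRAME_RATE = 30000 / 1001  # frame rate used throughout the benchmark
NUM_FRAMES = 120           # frames processed per run, as reported in the log

# (elapsed seconds, encoded bytes) for 1080p, copied from the output above
results_1080p = {
    "software (threads=1)": (4.64, 2678241),
    "software (threads=8)": (1.49, 2685086),
    "hardware (CPU frames)": (0.56, 1705900),
    "hardware (CUDA frames)": (0.32, 1705900),
}


def speedup(baseline_seconds: float, seconds: float) -> float:
    """How many times faster a run is than the baseline run."""
    return baseline_seconds / seconds


def bitrate_mbps(size_bytes: int, num_frames: int = NUM_FRAMES, frame_rate: float = FRAME_RATE) -> float:
    """Average bitrate of the encoded file in megabits per second."""
    duration = num_frames / frame_rate  # playback duration in seconds
    return size_bytes * 8 / duration / 1e6


baseline = results_1080p["software (threads=1)"][0]
for name, (seconds, size) in results_1080p.items():
    print(
        f"{name:24s} speed-up x{speedup(baseline, seconds):5.2f}  "
        f"bitrate {bitrate_mbps(size):.2f} Mbit/s"
    )
```

The same measurements are visualized by the plotting code that follows.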
.. code-block:: default

    def plot():
        fig, axes = plt.subplots(2, 1, sharex=True, figsize=[9.6, 7.2])

        # One line per encoder configuration: encoding time vs. resolution
        for items in zip(time_360, time_720, time_1080, "ov^X+"):
            axes[0].plot(items[:-1], marker=items[-1])
        axes[0].grid(axis="both")
        axes[0].set_xticks([0, 1, 2], ["360p", "720p", "1080p"], visible=True)
        axes[0].tick_params(labeltop=False)
        axes[0].legend(
            [
                "Software Encoding (threads=1)",
                "Software Encoding (threads=4)",
                "Software Encoding (threads=8)",
                "Hardware Encoding (CPU Tensor)",
                "Hardware Encoding (CUDA Tensor)",
            ]
        )
        axes[0].set_title("Time to encode videos with different resolutions")
        axes[0].set_ylabel("Time [s]")

        # One line per encoder type: encoded size vs. resolution
        for items in zip(size_360, size_720, size_1080, "v^"):
            axes[1].plot(items[:-1], marker=items[-1])
        axes[1].grid(axis="both")
        axes[1].set_xticks([0, 1, 2], ["360p", "720p", "1080p"])
        axes[1].set_ylabel("The encoded size [bytes]")
        axes[1].set_title("The size of encoded videos")
        axes[1].legend(
            [
                "Software Encoding",
                "Hardware Encoding",
            ]
        )

        plt.tight_layout()


    plot()


.. image-sg:: /tutorials/images/sphx_glr_nvenc_tutorial_001.png
   :alt: Time to encode videos with different resolutions, The size of encoded videos
   :srcset: /tutorials/images/sphx_glr_nvenc_tutorial_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 312-337

Result
------

We observe a couple of things:

- The time to encode video grows as the resolution becomes larger.
- In the case of software encoding, increasing the number of threads
  helps reduce the encoding time.
- The gain from extra threads diminishes around 8.
- Hardware encoding is faster than software encoding in general.
- Using ``hw_accel`` does not improve the speed of encoding itself
  as much. Note that when ``hw_accel`` is not set, ``test_encode``
  copies the frames from CUDA to CPU memory inside the timed region,
  so the measured difference also includes that transfer.
- The size of the resulting video grows as the resolution becomes
  larger.
- The hardware encoder produces a smaller video file at the largest
  resolution tested (1080p), while at 360p and 720p its output is
  larger than that of the software encoder.

The last point is somewhat strange to the author (who is not an
expert in video production).
It is often said that hardware encoders produce larger videos
compared to software encoders.
Some say that software encoders allow fine-grained control over
the encoding configuration, so the resulting video is closer to
optimal. Meanwhile, hardware encoders are optimized for performance,
and thus do not provide as much control over quality and binary size.

.. GENERATED FROM PYTHON SOURCE LINES 339-351

Quality Spotcheck
-----------------

So, how is the quality of the videos produced with hardware encoders?
A quick spot check of the videos shows that artifacts become more
noticeable at higher resolution, which might explain the smaller
binary size (that is, the encoder is not allocating enough bits to
produce quality output).

The following images are raw frames of videos encoded with hardware
encoders.

.. GENERATED FROM PYTHON SOURCE LINES 353-359

360P
----

.. raw:: html

    NVENC sample 360P

.. GENERATED FROM PYTHON SOURCE LINES 361-367

720P
----

.. raw:: html

    NVENC sample 720P

.. GENERATED FROM PYTHON SOURCE LINES 369-375

1080P
-----

.. raw:: html

    NVENC sample 1080P

.. GENERATED FROM PYTHON SOURCE LINES 377-384

We can see that there are more noticeable artifacts at higher
resolution.

Perhaps one might be able to reduce these using the
``encoder_option`` argument.
We did not try, but if you try that and find a better quality
setting, feel free to let us know. ;)

.. GENERATED FROM PYTHON SOURCE LINES 387-388

Tag: :obj:`torchaudio.io`


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  21.580 seconds)


.. _sphx_glr_download_tutorials_nvenc_tutorial.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: nvenc_tutorial.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: nvenc_tutorial.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_