
Accelerated video encoding with NVENC

Author: Moto Hira

This tutorial shows how to use NVIDIA’s hardware video encoder (NVENC) with TorchAudio, and how it improves the performance of video encoding.

Note

This tutorial requires FFmpeg libraries compiled with HW acceleration enabled.

Please refer to Enabling GPU video decoder/encoder for how to build FFmpeg with HW acceleration.

Note

Most modern GPUs have both HW decoders and encoders, but some high-end GPUs, such as the A100 and H100, do not have a HW encoder. Please refer to the following page for availability and format coverage: https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new

Attempting to use the HW encoder on these GPUs fails with an error message like Generic error in an external library. You can enable debug logging with torchaudio.utils.ffmpeg_utils.set_log_level() to see more detailed error messages issued along the way.
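
For example, here is a minimal sketch of turning up the log verbosity before a failing call; the numeric levels follow FFmpeg's AV_LOG_* constants (48 for debug, 32 for info) and are an assumption about how verbose you want the output to be, not part of the original tutorial.

from torchaudio.utils import ffmpeg_utils

# Raise FFmpeg log verbosity to AV_LOG_DEBUG (48) before the failing call
ffmpeg_utils.set_log_level(48)

# ... run the failing encode here to see the detailed FFmpeg messages ...

# Switch back to a quieter level (AV_LOG_INFO = 32) afterwards
ffmpeg_utils.set_log_level(32)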

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

import io
import time

import matplotlib.pyplot as plt
from IPython.display import Video
from torchaudio.io import StreamReader, StreamWriter
2.4.0
2.4.0

Check the prerequisites

First, we check that TorchAudio correctly detects the FFmpeg libraries that support the HW decoder/encoder.

from torchaudio.utils import ffmpeg_utils
print("FFmpeg Library versions:")
for k, ver in ffmpeg_utils.get_versions().items():
    print(f"  {k}:\t{'.'.join(str(v) for v in ver)}")
FFmpeg Library versions:
  libavcodec:   60.3.100
  libavdevice:  60.1.100
  libavfilter:  9.3.100
  libavformat:  60.3.100
  libavutil:    58.2.100
print("Available NVENC Encoders:")
for k in ffmpeg_utils.get_video_encoders().keys():
    if "nvenc" in k:
        print(f" - {k}")
Available NVENC Encoders:
 - av1_nvenc
 - h264_nvenc
 - hevc_nvenc
print("Avaialbe GPU:")
print(torch.cuda.get_device_properties(0))
Avaialbe GPU:
_CudaDeviceProperties(name='NVIDIA A10G', major=8, minor=6, total_memory=22502MB, multi_processor_count=80)

We use the following helper function to generate test frame data. For details on synthetic video generation, please refer to StreamReader Advanced Usage.

def get_data(height, width, format="yuv444p", frame_rate=30000 / 1001, duration=4):
    # Generate synthetic frames with FFmpeg's testsrc2 source via the lavfi device
    src = f"testsrc2=rate={frame_rate}:size={width}x{height}:duration={duration}"
    s = StreamReader(src=src, format="lavfi")
    # Decode all frames into a single chunk in the requested pixel format
    s.add_basic_video_stream(-1, format=format)
    s.process_all_packets()
    (video,) = s.pop_chunks()
    return video
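
As a quick, illustrative sanity check (not part of the original tutorial), the helper should return a uint8 tensor laid out as (frames, channels, height, width), with roughly frame_rate * duration frames.

# Inspect the generated test data (illustrative; the exact frame count may vary slightly)
frames = get_data(360, 640)
print(frames.shape, frames.dtype)  # expect roughly torch.Size([120, 3, 360, 640]) torch.uint8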

Encoding videos with NVENC

To use a HW video encoder, you need to specify the HW encoder when defining the output video stream, by providing the encoder option to add_video_stream().

pict_config = {
    "height": 360,
    "width": 640,
    "frame_rate": 30000 / 1001,
    "format": "yuv444p",
}

frame_data = get_data(**pict_config)
w = StreamWriter(io.BytesIO(), format="mp4")
w.add_video_stream(**pict_config, encoder="h264_nvenc", encoder_format="yuv444p")
with w.open():
    w.write_video_chunk(0, frame_data)

Similar to the HW decoder, by default the encoder expects the frame data to be in CPU memory. To send data from CUDA memory, you need to specify the hw_accel option.

buffer = io.BytesIO()
w = StreamWriter(buffer, format="mp4")
w.add_video_stream(**pict_config, encoder="h264_nvenc", encoder_format="yuv444p", hw_accel="cuda:0")
with w.open():
    w.write_video_chunk(0, frame_data.to(torch.device("cuda:0")))
buffer.seek(0)
video_cuda = buffer.read()
Video(video_cuda, embed=True, mimetype="video/mp4")
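
If you want to keep the encoded stream rather than only display it inline, the bytes can simply be written to disk (a minimal sketch; the file name is illustrative).

# Persist the NVENC-encoded MP4 bytes to a file (illustrative path)
with open("nvenc_output.mp4", "wb") as f:
    f.write(video_cuda)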