StreamReader

class torchaudio.io.StreamReader(src: str, format: Optional[str] = None, option: Optional[Dict[str, str]] = None, buffer_size: int = 4096)[source]

Fetch and decode audio/video streams chunk by chunk.

For the detailed usage of this class, please refer to the tutorial.

Parameters:

src (str, file-like object or Tensor) –
The media source. If string-type, it must be a resource indicator that FFmpeg can handle. This includes a file path, URL, device identifier or filter expression. The supported values depend on the FFmpeg found in the system.

If a file-like object, it must support a read method with the signature read(size: int) -> bytes. Additionally, if the file-like object has a seek method, it is used when parsing media metadata, which improves the reliability of codec detection. The signature of the seek method must be seek(offset: int, whence: int) -> int.

If a Tensor, it is interpreted as a byte buffer. It must be one-dimensional, of type torch.uint8.

Please refer to the following for the expected signature and behavior of the read and seek methods.
- https://docs.python.org/3/library/io.html#io.BufferedIOBase.read
- https://docs.python.org/3/library/io.html#io.IOBase.seek
format (str or None, optional) –
Override the input format, or specify the source sound device. Default: None (no format override and no device input).

This argument serves two different use cases.
1. Override the source format. This is useful when the input data do not contain a header.
2. Specify the input source device. This allows loading media streams from hardware devices, such as a microphone, camera or screen, or from a virtual device.
Note

This option roughly corresponds to the -f option of the ffmpeg command. Please refer to the ffmpeg documentation for the possible values.

https://ffmpeg.org/ffmpeg-formats.html#Demuxers

Use ffmpeg -demuxers to list the values available in the current environment.

For device access, the available values vary based on hardware (AV device) and software configuration (ffmpeg build).

https://ffmpeg.org/ffmpeg-devices.html#Input-Devices

Use ffmpeg -devices to list the values available in the current environment.
option (dict of str to str, optional) –
Custom options passed when initializing the format context (opening the source).

You can use this argument to change the input source before it is passed to the decoder.

Default: None.
buffer_size (int) –
The internal buffer size in bytes. Used only when src is a file-like object.

Default: 4096.
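
The read and seek signatures described for file-like sources can be illustrated with a small wrapper. This is a minimal sketch: the class name CountingSource and its bytes_read counter are hypothetical, and the zero-filled payload is a placeholder rather than decodable media.

```python
import io


class CountingSource:
    """Hypothetical file-like wrapper exposing the read/seek signatures described above."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)
        self.bytes_read = 0  # total number of bytes handed out by read()

    def read(self, size: int) -> bytes:
        # Return at most `size` bytes; an empty bytes object signals end of data.
        chunk = self._buffer.read(size)
        self.bytes_read += len(chunk)
        return chunk

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Return the new absolute position, as io.IOBase.seek does.
        return self._buffer.seek(offset, whence)


source = CountingSource(b"\x00" * 10000)  # placeholder bytes, not real media
first = source.read(4096)
position = source.seek(0, io.SEEK_SET)
```

An object like this could be passed as src, for example StreamReader(src=source, buffer_size=8192), given real media bytes. That call requires torchaudio with FFmpeg available, so it is not run in the sketch.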

Tutorials using StreamReader:

StreamWriter Advanced Usage

Online ASR with Emformer RNN-T

StreamReader Advanced Usages

StreamReader Basic Usages

Device ASR with Emformer RNN-T

Properties

default_audio_stream

property StreamReader.default_audio_stream

The index of the default audio stream, or None if there is no audio stream.

Type: Optional[int]

default_video_stream

property StreamReader.default_video_stream

The index of the default video stream, or None if there is no video stream.

Type: Optional[int]

num_out_streams

property StreamReader.num_out_streams

Number of output streams configured by client code.

Type: int

num_src_streams

property StreamReader.num_src_streams

Number of streams found in the provided media source.

Type: int

Methods

add_audio_stream

StreamReader.add_audio_stream(frames_per_chunk: int, buffer_chunk_size: int = 3, stream_index: Optional[int] = None, decoder: Optional[str] = None, decoder_option: Optional[Dict[str, str]] = None, filter_desc: Optional[str] = None)[source]

Add an output audio stream.

Parameters:

frames_per_chunk (int) – Number of frames returned as one chunk. If the source stream is exhausted before enough frames are buffered, then the chunk is returned as-is.
buffer_chunk_size (int, optional) –
Internal buffer size. When the number of chunks buffered exceeds this number, old frames are dropped.

Default: 3.
stream_index (int or None, optional) – The source audio stream index. If omitted, default_audio_stream is used.
decoder (str or None, optional) –
The name of the decoder to be used. When provided, the specified decoder is used instead of the default one.

To list the available decoders, use the ffmpeg -decoders command.

Default: None.
decoder_option (dict or None, optional) –
Options passed to the decoder. Mapping from str to str.

To list the options of a decoder, use the ffmpeg -h decoder=<DECODER> command.

Default: None.
filter_desc (str or None, optional) – Filter description. The list of available filters can be found at https://ffmpeg.org/ffmpeg-filters.html Note that complex filters are not supported.
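
A filter_desc string chains FFmpeg filters with commas. The sketch below builds such a string from the FFmpeg aresample and aformat filters and passes it to add_audio_stream. The helper names build_audio_filter and add_resampled_stream, and the recording stub used to run the example without media, are illustrative assumptions and not part of torchaudio.

```python
def build_audio_filter(sample_rate: int, sample_fmt: str = "fltp") -> str:
    """Join FFmpeg audio filters into a simple (non-complex) filter chain."""
    return ",".join(
        [
            f"aresample={sample_rate}",  # resample to the target rate
            f"aformat=sample_fmts={sample_fmt}",  # convert the sample format
        ]
    )


def add_resampled_stream(reader, sample_rate: int, frames_per_chunk: int = 4096) -> None:
    """Attach an output audio stream that applies the filter chain above.

    `reader` is expected to be a torchaudio.io.StreamReader instance.
    """
    reader.add_audio_stream(
        frames_per_chunk=frames_per_chunk,
        filter_desc=build_audio_filter(sample_rate),
    )


class _RecordingReader:
    """Hypothetical stand-in that records the call instead of opening media."""

    def __init__(self) -> None:
        self.calls = []

    def add_audio_stream(self, **kwargs) -> None:
        self.calls.append(kwargs)


demo = _RecordingReader()
add_resampled_stream(demo, sample_rate=8000)
filter_used = demo.calls[0]["filter_desc"]
```

With a real StreamReader, the same add_resampled_stream(reader, 8000) call would request 8 kHz float output, provided the FFmpeg build in use includes both filters.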

add_basic_audio_stream

StreamReader.add_basic_audio_stream(frames_per_chunk: int, buffer_chunk_size: int = 3, stream_index: Optional[int] = None, decoder: Optional[str] = None, decoder_option: Optional[Dict[str, str]] = None, format: Optional[str] = 'fltp', sample_rate: Optional[int] = None)[source]

Add an output audio stream.

Parameters:

frames_per_chunk (int) – Number of frames returned as one chunk. If the source stream is exhausted before enough frames are buffered, then the chunk is returned as-is.
buffer_chunk_size (int, optional) –
Internal buffer size. When the number of chunks buffered exceeds this number, old frames are dropped.

Default: 3.
stream_index (int or None, optional) – The source audio stream index. If omitted, default_audio_stream is used.
decoder (str or None, optional) –
The name of the decoder to be used. When provided, the specified decoder is used instead of the default one.

To list the available decoders, use the ffmpeg -decoders command.

Default: None.
decoder_option (dict or None, optional) –
Options passed to the decoder. Mapping from str to str.

To list the options of a decoder, use the ffmpeg -h decoder=<DECODER> command.

Default: None.
format (str, optional) –
Output sample format (precision).

If None, the output chunk has dtype corresponding to the precision of the source audio.

Otherwise, the samples are converted and the output dtype is as follows.
- "u8p": The output is torch.uint8 type.
- "s16p": The output is torch.int16 type.
- "s32p": The output is torch.int32 type.
- "s64p": The output is torch.int64 type.
- "fltp": The output is torch.float32 type.
- "dblp": The output is torch.float64 type.
Default: "fltp".
sample_rate (int or None, optional) – If provided, resample the audio.
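
The interaction of frames_per_chunk, buffer_chunk_size and sample_rate can be made concrete with a little arithmetic. The sketch below assumes that one audio frame corresponds to one sample per channel and that the buffer holds at most buffer_chunk_size chunks before old frames are dropped; the function names are hypothetical.

```python
def chunk_seconds(frames_per_chunk: int, sample_rate: int) -> float:
    """Duration of audio covered by one full chunk, in seconds."""
    return frames_per_chunk / sample_rate


def buffered_seconds(frames_per_chunk: int, buffer_chunk_size: int, sample_rate: int) -> float:
    """Duration of audio the buffer can hold before old frames are dropped."""
    return buffer_chunk_size * chunk_seconds(frames_per_chunk, sample_rate)


# 4000 frames per chunk at 16 kHz, with the default buffer_chunk_size of 3.
per_chunk = chunk_seconds(frames_per_chunk=4000, sample_rate=16000)
capacity = buffered_seconds(frames_per_chunk=4000, buffer_chunk_size=3, sample_rate=16000)
```

Under these assumptions a consumer that falls more than 0.75 seconds behind the decoder would start losing audio, which is one reason to raise buffer_chunk_size for slow consumers.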

add_basic_video_stream

StreamReader.add_basic_video_stream(frames_per_chunk: int, buffer_chunk_size: int = 3, stream_index: Optional[int] = None, decoder: Optional[str] = None, decoder_option: Optional[Dict[str, str]] = None, hw_accel: Optional[str] = None, format: Optional[str] = 'rgb24', frame_rate: Optional[int] = None, width: Optional[int] = None, height: Optional[int] = None)[source]

Add an output video stream.

Parameters:

frames_per_chunk (int) – Number of frames returned as one chunk. If the source stream is exhausted before enough frames are buffered, then the chunk is returned as-is.
buffer_chunk_size (int, optional) –
Internal buffer size. When the number of chunks buffered exceeds this number, old frames are dropped.

Default: 3.
stream_index (int or None, optional) – The source video stream index. If omitted, default_video_stream is used.
decoder (str or None, optional) –
The name of the decoder to be used. When provided, the specified decoder is used instead of the default one.

To list the available decoders, use the ffmpeg -decoders command.

Default: None.
decoder_option (dict or None, optional) –
Options passed to the decoder. Mapping from str to str.

To list the options of a decoder, use the ffmpeg -h decoder=<DECODER> command.

Default: None.
hw_accel (str or None, optional) –
Enable hardware acceleration.

When video is decoded on CUDA hardware, for example with decoder="h264_cuvid", passing a CUDA device indicator to hw_accel (e.g. hw_accel="cuda:0") makes StreamReader place the resulting frames directly on the specified CUDA device as CUDA tensors.

If None, the frame will be moved to CPU memory. Default: None.
format (str, optional) –
Change the format of image channels. Valid values are:
- "rgb24": 8 bits * 3 channels (R, G, B)
- "bgr24": 8 bits * 3 channels (B, G, R)
- "yuv420p": 8 bits * 3 channels (Y, U, V)
- "gray": 8 bits * 1 channel
Default: "rgb24".
frame_rate (int or None, optional) – If provided, change the frame rate.
width (int or None, optional) – If provided, change the image width. Unit: Pixel.
height (int or None, optional) – If provided, change the image height. Unit: Pixel.
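
The format, width and height options together determine how much memory a video chunk needs. The following sketch estimates that size, assuming each chunk is a dense 8-bit tensor with one byte per channel per pixel and taking the channel counts listed above at face value; CHANNELS and chunk_nbytes are hypothetical names.

```python
# Channel counts as listed for the `format` argument above.
CHANNELS = {"rgb24": 3, "bgr24": 3, "yuv420p": 3, "gray": 1}


def chunk_nbytes(frames_per_chunk: int, width: int, height: int, format: str = "rgb24") -> int:
    """Estimated size in bytes of one full video chunk at 8 bits per channel."""
    return frames_per_chunk * CHANNELS[format] * height * width


rgb_size = chunk_nbytes(frames_per_chunk=10, width=640, height=360)
gray_size = chunk_nbytes(frames_per_chunk=10, width=640, height=360, format="gray")
```

Multiplying the result by buffer_chunk_size gives a rough upper bound on the memory held by one output stream's buffer under the same assumptions.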

add_video_stream

StreamReader.add_video_stream(frames_per_chunk: int, buffer_chunk_size: int = 3, stream_index: Optional[int] = None, decoder: Optional[str] = None, decoder_option: Optional[Dict[str, str]] = None, hw_accel: Optional[str] = None, filter_desc: Optional[str] = None)[source]

Add an output video stream.

Parameters:

frames_per_chunk (int) – Number of frames returned as one chunk. If the source stream is exhausted before enough frames are buffered, then the chunk is returned as-is.
buffer_chunk_size (int, optional) –
Internal buffer size. When the number of chunks buffered exceeds this number, old frames are dropped.

Default: 3.
stream_index (int or None, optional) – The source video stream index. If omitted, default_video_stream is used.
decoder (str or None, optional) –
The name of the decoder to be used. When provided, the specified decoder is used instead of the default one.

To list the available decoders, use the ffmpeg -decoders command.

Default: None.
decoder_option (dict or None, optional) –
Options passed to the decoder. Mapping from str to str.

To list the options of a decoder, use the ffmpeg -h decoder=<DECODER> command.

Default: None.
hw_accel (str or None, optional) –
Enable hardware acceleration.

When video is decoded on CUDA hardware, for example with decoder="h264_cuvid", passing a CUDA device indicator to hw_accel (e.g. hw_accel="cuda:0") makes StreamReader place the resulting frames directly on the specified CUDA device as CUDA tensors.

If None, the frame will be moved to CPU memory. Default: None.
filter_desc (str or None, optional) – Filter description. The list of available filters can be found at https://ffmpeg.org/ffmpeg-filters.html Note that complex filters are not supported.

get_metadata

StreamReader.get_metadata() → Dict[str, str][source]

Get the metadata of the source media.

Returns: dict

get_out_stream_info

StreamReader.get_out_stream_info(i: int) → StreamReaderOutputStream[source]

Get the metadata of an output stream.

Parameters: i (int) – Stream index.
Returns: StreamReaderOutputStream

get_src_stream_info

StreamReader.get_src_stream_info(i: int) → StreamReaderSourceStream[source]

Get the metadata of a source stream.

Parameters: i (int) – Stream index.
Returns: StreamReaderSourceStream

is_buffer_ready

StreamReader.is_buffer_ready() → bool[source]

Returns True if all the output streams have at least one chunk filled.

pop_chunks

StreamReader.pop_chunks() → Tuple[Optional[Tensor]][source]

Pop one chunk from all the output stream buffers.

Returns: Buffer contents. If a buffer does not contain any frame, then None is returned instead.
Return type: Tuple[Optional[Tensor]]

process_all_packets

StreamReader.process_all_packets()[source]

Process packets until it reaches EOF.

process_packet

StreamReader.process_packet(timeout: Optional[float] = None, backoff: float = 10.0) → int[source]

Read the source media and process one packet.

If a packet is read successfully, then the data in the packet will be decoded and passed to corresponding output stream processors.

If the packet belongs to a source stream that is not connected to an output stream, then the data are discarded.

When the source reaches EOF, then it triggers all the output stream processors to enter drain mode. All the output stream processors flush the pending frames.

Parameters:

timeout (float or None, optional) –
Timeout in milliseconds.

This argument changes the retry behavior when processing a packet fails because the underlying media resource is temporarily unavailable.

When using a media device such as a microphone, there are cases where the underlying buffer is not ready. Calling this function in such a case causes the system to report EAGAIN (resource temporarily unavailable).
- >=0: Keep retrying until the given time passes.
- <0: Keep retrying forever.
- None: No retrying; an exception is raised immediately.
Default: None.

Note

The retry behavior applies only when the failure is caused by the resource being unavailable. It is not invoked for any other kind of failure.
backoff (float, optional) –
Time to wait before retrying, in milliseconds.

This option is effective only when timeout is effective (that is, not None).

When timeout is effective, this backoff controls how long the function should wait before retrying. Default: 10.0.
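
The three timeout cases and the role of backoff can be sketched in plain Python. This is an illustration of the documented semantics only, not torchaudio's implementation; TemporarilyUnavailable, call_with_retry and flaky_read are hypothetical names.

```python
import time


class TemporarilyUnavailable(Exception):
    """Hypothetical stand-in for an EAGAIN-style failure."""


def call_with_retry(fn, timeout=None, backoff=10.0, clock=time.monotonic, sleep=time.sleep):
    """Call fn, retrying on TemporarilyUnavailable; timeout and backoff are in milliseconds."""
    if timeout is None:
        return fn()  # no retry: any failure propagates immediately
    # A negative timeout means there is no deadline, so retrying continues forever.
    deadline = None if timeout < 0 else clock() + timeout / 1000.0
    while True:
        try:
            return fn()
        except TemporarilyUnavailable:
            if deadline is not None and clock() >= deadline:
                raise  # the allotted time has passed
            sleep(backoff / 1000.0)  # wait before the next attempt


attempts = []


def flaky_read():
    """Fail twice, then succeed, to mimic a device buffer that becomes ready."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TemporarilyUnavailable()
    return 0


result = call_with_retry(flaky_read, timeout=-1, backoff=0.0)
```

Other exception types pass straight through call_with_retry, matching the note that retrying applies only to the unavailable-resource case.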

Returns:

0: A packet was processed properly. The caller can keep calling this function to buffer more frames.

1: The streamer reached EOF. All the output stream processors flushed the pending frames. The caller should stop calling this method.

Return type:

int

remove_stream

StreamReader.remove_stream(i: int)[source]

Remove an output stream.

Parameters: i (int) – Index of the output stream to be removed.

seek

StreamReader.seek(timestamp: float)[source]

Seek the stream to the given timestamp, in seconds.

Parameters: timestamp (float) – Target time in seconds.

stream

StreamReader.stream(timeout: Optional[float] = None, backoff: float = 10.0) → Iterator[Tuple[Optional[Tensor], ...]][source]

Return an iterator that generates output tensors.

Parameters:

timeout (float or None, optional) – See process_packet(). (Default: None)
backoff (float, optional) – See process_packet(). (Default: 10.0)

Returns:

Iterator that yields a tuple of chunks that correspond to the output streams defined by client code. If an output stream is exhausted, then the chunk Tensor is substituted with None. The iterator stops if all the output streams are exhausted.

Return type:

Iterator[Tuple[Optional[torch.Tensor], ...]]

Support Structures

StreamReaderSourceStream

class torchaudio.io.StreamReaderSourceStream[source]

The metadata of a source stream, returned by get_src_stream_info().

This class is used when representing streams of media type other than audio or video.

When the source stream is of audio or video type, StreamReaderSourceAudioStream and StreamReaderSourceVideoStream, which report additional media-specific attributes, are used respectively.

media_type: str: The type of the stream. One of "audio", "video", "data", "subtitle", "attachment" or an empty string.

Note

Only audio and video streams are supported for output.

Note

Still images, such as PNG and JPEG formats, are reported as video.

codec: str: Short name of the codec, such as "pcm_s16le" and "h264".

codec_long_name: str

Detailed name of the codec, such as "PCM signed 16-bit little-endian" and "H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10".

format: Optional[str]

Media format, such as "s16" and "yuv420p".

Commonly found audio values are:

"u8", "u8p": 8-bit unsigned integer.
"s16", "s16p": 16-bit signed integer.
"s32", "s32p": 32-bit signed integer.
"flt", "fltp": 32-bit floating-point.

Note

A trailing p indicates that the format is planar: the samples of each channel are grouped together instead of being interleaved in memory.
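
The difference between planar and interleaved layouts can be shown with plain lists. This is a conceptual sketch with hypothetical helper names; it does not reflect how FFmpeg or torchaudio store samples internally.

```python
def to_planar(interleaved, num_channels):
    """Group samples by channel: [L0, R0, L1, R1] becomes [[L0, L1], [R0, R1]]."""
    return [interleaved[c::num_channels] for c in range(num_channels)]


def to_interleaved(planes):
    """Inverse of to_planar: alternate samples from each channel, frame by frame."""
    return [sample for frame in zip(*planes) for sample in frame]


packed = ["L0", "R0", "L1", "R1", "L2", "R2"]  # interleaved stereo, as in "s16"
planes = to_planar(packed, num_channels=2)  # planar stereo, as in "s16p"
```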

bit_rate: Optional[int]: Bit rate of the stream in bits-per-second. This is an estimated value based on the initial few frames of the stream. For container formats and variable bit rate, it can be 0.

num_frames: Optional[int]: The number of frames in the stream.

bits_per_sample: Optional[int]: This is the number of valid bits in each output sample. For compressed format, it can be 0.

metadata: Dict[str, str]: Metadata attached to the source stream.

StreamReaderSourceAudioStream

class torchaudio.io.StreamReaderSourceAudioStream[source]

The metadata of an audio source stream, returned by get_src_stream_info().

This class is used when representing an audio stream.

In addition to the attributes reported by StreamReaderSourceStream, the following attributes are reported.

sample_rate: float: Sample rate of the audio.

num_channels: int: Number of channels.

StreamReaderSourceVideoStream

class torchaudio.io.StreamReaderSourceVideoStream[source]

The metadata of a video source stream, returned by get_src_stream_info().

This class is used when representing a video stream.

In addition to the attributes reported by StreamReaderSourceStream, the following attributes are reported.

width: int: Width of the video frame in pixels.

height: int: Height of the video frame in pixels.

frame_rate: float: Frame rate.
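
The source stream classes can be inspected together by iterating over num_src_streams and calling get_src_stream_info. The sketch below formats one line per stream using only attributes documented above. The function name describe_source_streams is hypothetical, and _DemoReader is a stand-in built from SimpleNamespace objects so the example runs without media.

```python
from types import SimpleNamespace


def describe_source_streams(reader):
    """Return one human-readable line per source stream found in `reader`."""
    lines = []
    for i in range(reader.num_src_streams):
        info = reader.get_src_stream_info(i)
        if info.media_type == "audio":
            detail = f"{info.sample_rate} Hz, {info.num_channels} ch"
        elif info.media_type == "video":
            detail = f"{info.width}x{info.height} @ {info.frame_rate} fps"
        else:
            detail = "no media-specific attributes"
        lines.append(f"{i}: {info.media_type} ({info.codec}), {detail}")
    return lines


class _DemoReader:
    """Hypothetical stand-in exposing the two members used above."""

    _streams = [
        SimpleNamespace(media_type="audio", codec="pcm_s16le", sample_rate=16000.0, num_channels=2),
        SimpleNamespace(media_type="video", codec="h264", width=640, height=360, frame_rate=25.0),
    ]
    num_src_streams = len(_streams)

    def get_src_stream_info(self, i):
        return self._streams[i]


summary = describe_source_streams(_DemoReader())
```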

StreamReaderOutputStream

class torchaudio.io.StreamReaderOutputStream[source]

Output stream configured on StreamReader, returned by get_out_stream_info().

source_index: int: Index of the source stream that this output stream is connected to.

filter_description: str: Description of filter graph applied to the source stream.
