torchaudio.prototype.functional.simulate_rir_ism

torchaudio.prototype.functional.simulate_rir_ism(room: Tensor, source: Tensor, mic_array: Tensor, max_order: int, absorption: Union[float, Tensor], output_length: Optional[int] = None, delay_filter_length: int = 81, center_frequency: Optional[Tensor] = None, sound_speed: float = 343.0, sample_rate: float = 16000.0) → Tensor

Compute the room impulse response (RIR) of a rectangular (shoebox) room using the image source method [Allen and Berkley, 1979]. The implementation is based on pyroomacoustics [Scheibler et al., 2018].
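The image source method replaces each wall reflection with a mirrored copy of the source. As an illustration of the technique (not torchaudio's internal code), the plain-Python sketch below enumerates image-source positions, assuming a shoebox room with one corner at the origin and the source inside it; the helper name image_sources is hypothetical. An image with index n along an axis lies n room lengths away and is mirrored when n is odd, and its reflection order is the sum of |n| over the three axes, which is the quantity max_order bounds.

```python
import itertools


def image_sources(room, source, max_order):
    """Enumerate image-source positions of a shoebox room (hypothetical helper).

    room: (Lx, Ly, Lz) room size, with one corner at the origin (assumption).
    source: (sx, sy, sz) source position inside the room.
    Returns a list of (position, order) tuples, where order is the total
    number of wall reflections that the image represents.
    """

    def axis_image(n, length, s):
        # Even index: translated copy of the source; odd index: mirrored copy.
        return n * length + s if n % 2 == 0 else (n + 1) * length - s

    images = []
    indices = range(-max_order, max_order + 1)
    for n in itertools.product(indices, repeat=3):
        order = sum(abs(k) for k in n)
        if order > max_order:
            continue  # this image needs more reflections than allowed
        pos = tuple(axis_image(k, L, s) for k, L, s in zip(n, room, source))
        images.append((pos, order))
    return images


imgs = image_sources((4.0, 3.0, 2.5), (1.0, 1.0, 1.0), max_order=1)
print(len(imgs))
```

With max_order=1 the sketch yields 7 positions: the source itself plus one mirror image per wall (for example (-1.0, 1.0, 1.0), mirrored across the wall at x = 0).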

Parameters:

room (torch.Tensor) – Room dimensions. Tensor with dimensions (3,), giving the size of the room along each of its three axes.
source (torch.Tensor) – Sound source coordinates. Tensor with dimensions (3,).
mic_array (torch.Tensor) – Microphone coordinates. Tensor with dimensions (channel, 3).
max_order (int) – The maximum reflection order, i.e. the maximum number of wall reflections an image source may represent.
absorption (float or torch.Tensor) – The absorption [Wikipedia contributors, n.d.] coefficients of the wall materials for sound energy. If absorption is a float, the same coefficient is used for all walls and all frequencies. If absorption is a 1D Tensor, its shape must be (6,), where the values are the absorption coefficients of the "west", "east", "south", "north", "floor", and "ceiling" walls, respectively. If absorption is a 2D Tensor, its shape must be (7, 6), where 7 is the number of octave bands and 6 is the number of walls.
output_length (int or None, optional) – The output length of the simulated RIR signal. If None, the length is defined as

$\frac{\text{max\_d} \cdot \text{sample\_rate}}{\text{sound\_speed}} + \text{delay\_filter\_length}$

where max_d is the maximum distance between image sources and microphones.
delay_filter_length (int, optional) – The length of the sinc filter used to apply fractional (sub-sample) delays. (Default: 81)
center_frequency (torch.Tensor, optional) – The center frequencies of the octave bands for frequency-dependent wall absorption. Only used when absorption is a 2D Tensor. (Default: None)
sound_speed (float, optional) – The speed of sound in meters per second. (Default: 343.0)
sample_rate (float, optional) – The sample rate of the generated room impulse response signal. (Default: 16000.0)
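delay_filter_length controls how a propagation delay that is not a whole number of samples is rendered. A minimal plain-Python sketch of the usual approach is a Hann-windowed sinc filter centered on the fractional part of the delay. The Hann window, the floor-based split of the delay into integer and fractional parts, and the helper names frac_delay_filter and delay_taps are assumptions for illustration, not torchaudio's API.

```python
import math


def sinc(x):
    # Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1.
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def frac_delay_filter(frac, length=81):
    """Hann-windowed sinc taps delaying a signal by frac samples (0 <= frac < 1).

    Tap k (k = -half..half) lands on output sample floor(delay) + k.
    """
    if length % 2 != 1 or length < 3:
        raise ValueError("filter length must be an odd number >= 3")
    half = length // 2
    taps = []
    for k in range(-half, half + 1):
        t = k - frac
        # Hann window spanning [-half, half], peaking at t = 0.
        w = 0.5 * (1.0 + math.cos(math.pi * t / half)) if abs(t) <= half else 0.0
        taps.append(sinc(t) * w)
    return taps


def delay_taps(distance, sample_rate=16000.0, sound_speed=343.0, length=81):
    """Split a propagation delay into an integer offset and fractional-delay taps."""
    delay = distance * sample_rate / sound_speed  # delay in (fractional) samples
    whole = math.floor(delay)
    return whole, frac_delay_filter(delay - whole, length)
```

For a whole-sample delay (frac = 0) the filter reduces to a unit impulse at its center tap; a longer filter approximates the ideal sinc more closely at the cost of more computation per image source.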

Returns:

The simulated room impulse response waveform. Tensor with dimensions (channel, rir_length), where rir_length equals output_length when output_length is given.
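To show how the (channel, rir_length) output is filled, the sketch below places each image source's contribution at its propagation delay and scales it by 1/distance. It is a simplification under stated assumptions: nested lists stand in for tensors, delays are rounded to the nearest sample instead of being spread over the sinc filter, the default length rounds the delay term of the output_length formula up, and default_rir_length and simple_rir are hypothetical helpers.

```python
import math


def default_rir_length(max_d, sample_rate=16000.0, sound_speed=343.0, delay_filter_length=81):
    # output_length formula: max_d * sample_rate / sound_speed + delay_filter_length.
    # Rounding the delay term up to a whole sample is an assumption of this sketch.
    return math.ceil(max_d * sample_rate / sound_speed) + delay_filter_length


def simple_rir(images, mics, sample_rate=16000.0, sound_speed=343.0,
               delay_filter_length=81, output_length=None):
    """Build a (channel, rir_length) nested list from (position, gain) image sources.

    Each image contributes gain / distance at the nearest integer sample delay.
    Assumes no microphone coincides with an image source (distance > 0).
    """
    dists = [[math.dist(pos, mic) for pos, _ in images] for mic in mics]
    max_d = max(max(row) for row in dists)
    if output_length is None:
        length = default_rir_length(max_d, sample_rate, sound_speed, delay_filter_length)
    else:
        length = output_length
    rir = [[0.0] * length for _ in mics]
    for ch, row in enumerate(dists):
        for (_, gain), d in zip(images, row):
            n = round(d * sample_rate / sound_speed)  # delay in samples
            if n < length:
                rir[ch][n] += gain / d  # spherical spreading: amplitude ~ 1/d
    return rir
```

For example, default_rir_length(5.0) with the default sample_rate, sound_speed, and delay_filter_length gives ceil(233.2...) + 81 = 315 samples.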

Return type:

(torch.Tensor)

Note

If absorption is a 2D Tensor and center_frequency is set to None, the center frequencies of the octave bands are fixed to [125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]. Each row of absorption must then hold the absorption coefficients for the corresponding frequency.
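The accepted absorption shapes and their pairing with the default octave bands can be sketched as follows. This plain-Python version assumes that a 1D or 2D absorption value is given as nested lists instead of tensors, converts energy absorption a to a pressure reflection coefficient sqrt(1 - a) (a common convention; whether torchaudio applies exactly this is an assumption), and uses hypothetical names (absorption_table, reflection_bands, WALLS, DEFAULT_CENTER_FREQUENCIES).

```python
import math

DEFAULT_CENTER_FREQUENCIES = [125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
WALLS = ["west", "east", "south", "north", "floor", "ceiling"]


def absorption_table(absorption):
    """Normalize absorption to a list of bands, each holding 6 per-wall coefficients."""
    if isinstance(absorption, (int, float)):
        return [[float(absorption)] * len(WALLS)]  # one band, same value on every wall
    if all(isinstance(a, (int, float)) for a in absorption):
        if len(absorption) != len(WALLS):
            raise ValueError("1D absorption must have shape (6,)")
        return [[float(a) for a in absorption]]  # one band, per-wall values
    table = [[float(a) for a in row] for row in absorption]
    if len(table) != len(DEFAULT_CENTER_FREQUENCIES) or any(len(r) != len(WALLS) for r in table):
        raise ValueError("2D absorption must have shape (7, 6)")
    return table


def reflection_bands(absorption, center_frequency=None):
    """Pair each band's center frequency with per-wall reflection coefficients sqrt(1 - a)."""
    table = absorption_table(absorption)
    if len(table) == 1:
        freqs = [None]  # frequency-independent walls: no center frequency needed
    else:
        freqs = DEFAULT_CENTER_FREQUENCIES if center_frequency is None else list(center_frequency)
        if len(freqs) != len(table):
            raise ValueError("need one center frequency per absorption row")
    return [
        (f, {w: math.sqrt(1.0 - a) for w, a in zip(WALLS, row)})
        for f, row in zip(freqs, table)
    ]
```

Keeping the band axis first mirrors the documented (7, 6) layout, so row i of absorption always pairs with the i-th center frequency.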
