MelSpectrogram

class paddle.audio.features.MelSpectrogram(sr: int = 22050, n_fft: int = 2048, hop_length: Optional[int] = 512, win_length: Optional[int] = None, window: str = 'hann', power: float = 2.0, center: bool = True, pad_mode: str = 'reflect', n_mels: int = 64, f_min: float = 50.0, f_max: Optional[float] = None, htk: bool = False, norm: Union[str, float] = 'slaney', dtype: str = 'float32')

Compute the Mel spectrogram of the given signals, typically audio waveforms. It is computed by multiplying the spectrogram by a Mel filter bank matrix.
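The multiplication described above can be illustrated with a small NumPy sketch. It is not Paddle's implementation: it assumes the HTK mel formula, unnormalized triangular filters, a Hann window, and reflect padding, and the helper names (`mel_filter_bank`, `power_spectrogram`) are hypothetical. Its only purpose is to show how a filter bank of shape (n_mels, n_fft // 2 + 1) maps a spectrogram to mel bins.

```python
import numpy as np

def hz_to_mel_htk(f):
    # HTK-style mel scale (assumed here for simplicity).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz_htk(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_filter_bank(sr, n_fft, n_mels, f_min, f_max):
    """Unnormalized triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points, evenly spaced on the mel scale, define the triangles.
    mel_points = np.linspace(hz_to_mel_htk(f_min), hz_to_mel_htk(f_max), n_mels + 2)
    hz_points = mel_to_hz_htk(mel_points)
    fbank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def power_spectrogram(x, n_fft, hop_length, power=2.0):
    """|STFT| ** power of a 1-D signal, shape (n_fft // 2 + 1, num_frames)."""
    padded = np.pad(x, n_fft // 2, mode="reflect")  # mimics center=True
    window = np.hanning(n_fft + 1)[:-1]             # periodic Hann window
    num_frames = 1 + (padded.size - n_fft) // hop_length
    frames = np.stack(
        [padded[i * hop_length : i * hop_length + n_fft] * window
         for i in range(num_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0)) ** power

sr, n_fft, hop_length, n_mels = 16000, 512, 128, 40
x = 0.1 * np.sin(2 * np.pi * 440.0 * np.arange(sr // 2) / sr)  # 0.5 s, 440 Hz tone

spec = power_spectrogram(x, n_fft, hop_length)          # (257, 63)
fbank = mel_filter_bank(sr, n_fft, n_mels, 50.0, sr / 2.0)  # (40, 257)
mel_spec = fbank @ spec                                 # filter bank times spectrogram
print(mel_spec.shape)  # (40, 63)
```

With the class defaults (htk=False, norm='slaney'), the filter shapes and scaling would differ from this sketch, but the matrix product has the same structure.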

Parameters
  • sr (int, optional) – Sample rate. Defaults to 22050.

  • n_fft (int, optional) – The number of frequency components of the discrete Fourier transform. Defaults to 2048.

  • hop_length (Optional[int], optional) – The hop length of the short-time FFT. If None, it is set to win_length // 4. Defaults to 512.

  • win_length (Optional[int], optional) – The window length of the short-time FFT. If None, it is set to n_fft. Defaults to None.

  • window (str, optional) – The window function applied to the signal before the Fourier transform. Supported window functions: ‘hamming’, ‘hann’, ‘kaiser’, ‘gaussian’, ‘exponential’, ‘triang’, ‘bohman’, ‘blackman’, ‘cosine’, ‘tukey’, ‘taylor’. Defaults to ‘hann’.

  • power (float, optional) – Exponent for the magnitude spectrogram. Defaults to 2.0.

  • center (bool, optional) – Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.

  • pad_mode (str, optional) – The padding mode used when center is True. Defaults to ‘reflect’.

  • n_mels (int, optional) – Number of mel bins. Defaults to 64.

  • f_min (float, optional) – Minimum frequency in Hz. Defaults to 50.0.

  • f_max (Optional[float], optional) – Maximum frequency in Hz. If None, sr / 2 is used. Defaults to None.

  • htk (bool, optional) – Whether to use the HTK formula when computing the filter bank (fbank) matrix. Defaults to False.

  • norm (Union[str, float], optional) – Type of normalization used when computing the fbank matrix. Slaney-style normalization is used by default. Pass a float such as norm=1.0 or norm=2.0 to use p-norm normalization instead. Defaults to ‘slaney’.

  • dtype (str, optional) – Data type of input and window. Defaults to ‘float32’.
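The htk flag selects between two common definitions of the mel scale. As a rough illustration, the sketch below writes out both using their conventional formulas: the HTK formula is purely logarithmic, while the Slaney-style scale is linear below 1 kHz and logarithmic above. The function name `hz_to_mel_sketch` is hypothetical and the code is not taken from Paddle's source.

```python
import math

def hz_to_mel_sketch(freq_hz, htk=False):
    """Convert a frequency in Hz to mels (illustrative sketch only)."""
    if htk:
        # HTK formula: logarithmic over the whole range.
        return 2595.0 * math.log10(1.0 + freq_hz / 700.0)
    # Slaney-style: linear region below 1 kHz ...
    f_sp = 200.0 / 3.0               # Hz per mel in the linear region
    min_log_hz = 1000.0              # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp  # 15 mels at 1 kHz
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    # ... and logarithmic region above it.
    logstep = math.log(6.4) / 27.0
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep

for f in (500.0, 1000.0, 6400.0):
    print(f, hz_to_mel_sketch(f), hz_to_mel_sketch(f, htk=True))
```

The two scales use different units, so mel values from one are not comparable with the other; what matters for the filter bank is how evenly spaced mel points map back to Hz.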

Returns

Layer. An instance of MelSpectrogram.

Examples

>>> import paddle
>>> from paddle.audio.features import MelSpectrogram

>>> sample_rate = 16000
>>> wav_duration = 0.5
>>> num_channels = 1
>>> num_frames = int(sample_rate * wav_duration)  # number of samples; must be an integer
>>> wav_data = paddle.linspace(-1.0, 1.0, num_frames) * 0.1
>>> waveform = wav_data.tile([num_channels, 1])

>>> feature_extractor = MelSpectrogram(sr=sample_rate, n_fft=512, window='hann', power=1.0)
>>> feats = feature_extractor(waveform)

forward(x: paddle.Tensor) -> paddle.Tensor

Compute the Mel spectrogram of the input waveforms.

Parameters

x (Tensor) – Tensor of waveforms with shape (N, T), where N is the batch size and T is the number of samples.

Returns

Mel spectrograms with shape (N, n_mels, num_frames).

Return type

Tensor
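To estimate num_frames in the output shape before running the layer, the following sketch applies the usual short-time FFT framing rule. It assumes that center=True pads n_fft // 2 samples on each side and that frames are taken every hop_length samples; the helper name `expected_output_shape` is hypothetical and not part of the Paddle API.

```python
def expected_output_shape(batch, num_samples, n_fft=2048, hop_length=512,
                          n_mels=64, center=True):
    """Estimate (N, n_mels, num_frames) under standard STFT framing (assumed)."""
    # With center=True the signal is padded by n_fft // 2 on both sides.
    padded = num_samples + 2 * (n_fft // 2) if center else num_samples
    num_frames = 1 + (padded - n_fft) // hop_length
    return (batch, n_mels, num_frames)

# Values from the example above: 8000 samples, n_fft=512, default hop_length=512.
print(expected_output_shape(1, 8000, n_fft=512))                # (1, 64, 16)
print(expected_output_shape(1, 8000, n_fft=512, center=False))  # (1, 64, 15)
```

Comparing this estimate with feats.shape is a quick sanity check that hop_length and center are set as intended.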