class paddle.audio.features.LogMelSpectrogram(sr: int = 22050, n_fft: int = 512, hop_length: Optional[int] = None, win_length: Optional[int] = None, window: str = 'hann', power: float = 2.0, center: bool = True, pad_mode: str = 'reflect', n_mels: int = 64, f_min: float = 50.0, f_max: Optional[float] = None, htk: bool = False, norm: Union[str, float] = 'slaney', ref_value: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = None, dtype: str = 'float32')

Compute log-mel-spectrogram feature of given signals, typically audio waveforms.

  • sr (int, optional) – Sample rate. Defaults to 22050.

  • n_fft (int, optional) – The number of frequency components of the discrete Fourier transform. Defaults to 512.

  • hop_length (Optional[int], optional) – The hop length of the short time FFT. If None, it is set to win_length//4. Defaults to None.

  • win_length (Optional[int], optional) – The window length of the short time FFT. If None, it is set to n_fft. Defaults to None.

  • window (str, optional) – The window function applied to the signal before the Fourier transform. Supported window functions: ‘hamming’, ‘hann’, ‘kaiser’, ‘gaussian’, ‘exponential’, ‘triang’, ‘bohman’, ‘blackman’, ‘cosine’, ‘tukey’, ‘taylor’. Defaults to ‘hann’.

  • power (float, optional) – Exponent for the magnitude spectrogram. Defaults to 2.0.

  • center (bool, optional) – Whether to pad x so that sample \(t \times hop\_length\) lies at the center of the t-th frame. Defaults to True.

  • pad_mode (str, optional) – Choose padding pattern when center is True. Defaults to ‘reflect’.

  • n_mels (int, optional) – Number of mel bins. Defaults to 64.

  • f_min (float, optional) – Minimum frequency in Hz. Defaults to 50.0.

  • f_max (Optional[float], optional) – Maximum frequency in Hz. Defaults to None.

  • htk (bool, optional) – Use HTK formula in computing fbank matrix. Defaults to False.

  • norm (Union[str, float], optional) – Type of normalization in computing fbank matrix. Slaney-style is used by default. You can specify norm=1.0/2.0 to use customized p-norm normalization. Defaults to ‘slaney’.

  • ref_value (float, optional) – The reference value. If smaller than 1.0, the db level of the signal will be pulled up accordingly. Otherwise, the db level is pushed down. Defaults to 1.0.

  • amin (float, optional) – The minimum value of input magnitude. Defaults to 1e-10.

  • top_db (Optional[float], optional) – The maximum db value of spectrogram. Defaults to None.

  • dtype (str, optional) – Data type of input and window. Defaults to ‘float32’.
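The roles of ref_value, amin and top_db are easiest to see in a small decibel conversion. The sketch below is written in plain Python and assumes the common librosa-style convention (10 * log10 of the floored value, minus the reference level, with an optional clip below the peak); the helper name power_to_db_sketch is hypothetical, and the code is an illustration rather than Paddle's actual implementation.

```python
import math

def power_to_db_sketch(values, ref_value=1.0, amin=1e-10, top_db=None):
    """Convert power values to decibels (assumed librosa-style convention)."""
    # Reference level: values equal to ref_value map to 0 dB.
    ref_db = 10.0 * math.log10(max(amin, ref_value))
    # Floor each value at amin so that log10 never receives zero.
    db = [10.0 * math.log10(max(amin, v)) - ref_db for v in values]
    if top_db is not None:
        # Keep only the top `top_db` decibels below the loudest value.
        floor = max(db) - top_db
        db = [max(d, floor) for d in db]
    return db

power = [1.0, 0.1, 0.0]
print(power_to_db_sketch(power))                 # silence is floored via amin
print(power_to_db_sketch(power, ref_value=0.1))  # every level is shifted up
print(power_to_db_sketch(power, top_db=80.0))    # silence is clipped below the peak
```

Under this convention, a ref_value below 1.0 raises every dB level by the same amount, which matches the description of ref_value above.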


Layer. An instance of LogMelSpectrogram.


>>> import paddle
>>> from paddle.audio.features import LogMelSpectrogram

>>> sample_rate = 16000
>>> wav_duration = 0.5
>>> num_channels = 1
>>> num_frames = int(sample_rate * wav_duration)
>>> wav_data = paddle.linspace(-1.0, 1.0, num_frames) * 0.1
>>> waveform = wav_data.tile([num_channels, 1])

>>> feature_extractor = LogMelSpectrogram(sr=sample_rate, n_fft=512, window='hann', power=1.0)
>>> feats = feature_extractor(waveform)
forward(x: paddle.Tensor) -> paddle.Tensor



x (Tensor) – Tensor of waveforms with shape (N, T)


Log mel spectrograms with shape (N, n_mels, num_frames).
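The size of the last output dimension can be estimated from the input length. The sketch below assumes the usual STFT framing convention (with center=True the signal is padded by n_fft // 2 samples on each side) together with the documented defaults for win_length and hop_length; expected_num_frames is a hypothetical helper and has not been checked against Paddle's implementation.

```python
def expected_num_frames(num_samples, n_fft=512, hop_length=None,
                        win_length=None, center=True):
    """Estimate the number of STFT frames for a signal of num_samples samples."""
    # Documented defaults: win_length falls back to n_fft,
    # hop_length falls back to win_length // 4.
    win_length = n_fft if win_length is None else win_length
    hop_length = win_length // 4 if hop_length is None else hop_length
    if center:
        # Assumed convention: pad n_fft // 2 samples on both sides.
        num_samples += 2 * (n_fft // 2)
    return 1 + (num_samples - n_fft) // hop_length

# A 0.5 s waveform at 16 kHz has 8000 samples.
print(expected_num_frames(8000))
print(expected_num_frames(8000, center=False))
```

With n_mels=64 and a single-channel input of 8000 samples, this estimate suggests an output of shape (1, 64, 63) under the stated assumptions.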

Return type