
NetEncoder["AudioSpectrogram"]

represents an encoder that converts an audio file or object into its spectrogram.

NetEncoder[{"AudioSpectrogram","param"->val,…}]

represents an encoder with specific parameters for preprocessing and feature computation.

Details

  • The "AudioSpectrogram" encoder computes the spectrogram of a signal and discards some redundant information contained in the short-time Fourier transform. It also discards the phase information, which means that an exact reconstruction of the original signal is not possible.
  • NetEncoder[…][input] applies the encoder to an input to produce a "Real32" NumericArray.
  • NetEncoder[…][{input1,input2,…}] applies the encoder to a list of inputs to produce a list of NumericArray objects.
  • The input to the encoder can be an Audio object or a File[…] expression.
  • The output of the encoder is a rank-2 tensor of dimensions {n,Floor[(ws/2.)+1]}, where n is the number of partitions after the preprocessing is applied and ws is the length of the partitions used for the computation.
  • An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
Parameters

  • The following general parameters are supported:
      "Augmentation"      None         augmentation to be applied
      "Normalization"     None         whether to apply normalization
      "SampleRate"        16000        target sample rate
      "TargetLength"      All          target output length
  • Additional partitioning parameters:
      "WindowSize"        Automatic    length of the partitions
      "Offset"            Automatic    offset of the partitions
      "WindowFunction"    Automatic    window to be applied to the partitions
  • The following settings and suboptions can be specified for each encoder parameter.
  • "Normalization" can take the following settings:
      None           no normalization
      "Max"          absolute maximum value normalized to 1
      {"Max",val}    absolute maximum value normalized to val
      {"RMS",val}    RMS of input audio signal normalized to val
  • "TargetLength" can take the following settings:
      All    same as input signal
      dur    the duration dur specified as a time quantity
      n      the first n partitions
  • If the specified "TargetLength" does not match the length of the input signal, padding or trimming is applied accordingly.
  • "Augmentation" can be specified as a list of rules with the following keys:
      "Convolution"    None    convolves the input with an impulse response
      "Noise"          None    adds noise to the input
      "TimeShift"      None    shifts the input by a specified amount
      "Volume"         None    multiplies the input by a constant
  • Any augmentation parameter that accepts a numeric value can also be specified as a list of two numbers or a univariate distribution. In the first case, the value will be randomized according to a uniform distribution between the given bounds. In the second, the user-provided distribution will be used.
  • Possible values for "Convolution" include:
      None            no augmentation
      signal          File or Audio object to be convolved with input
      {mix,signal}    signal to be convolved with input and mix parameter
  • Possible values for "Noise" include:
      None           no augmentation
      amp            white noise with amplitude amp
      noise          File or Audio object containing the noise signal to be added
      {amp,noise}    the noise signal noise, added with the specified amplitude amp
  • Use "TimeShift"->t to shift the input by t seconds, padding or trimming if necessary. Use Scaled[s] to shift the input by s×dur seconds, where dur is the duration of the input signal. Use {t1,t2} or Scaled[{s1,s2}] to randomize the shift between the specified values.
  • Use "Volume"->val to specify a constant multiplier.
  • With the parameter "WindowSize"->Automatic, a partition length of 25 milliseconds is used. Use "WindowSize"->dur to select a partition length of duration dur. Use "WindowSize"->n to select a partition length of n samples.
  • With the parameter "Offset"->Automatic, a partition offset of 8.33 milliseconds is used. Use "Offset"->dur to select a partition offset of duration dur. Use "Offset"->n to select a partition offset of n samples.
  • The parameter "WindowFunction" applies a window to each partition. Possible settings are:
      None    no windowing applied to the input audio
      func    the window is computed using the function func
      list    the sampled window list is explicitly specified
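
The settings above can be combined in a single specification. The following is a minimal sketch rather than a recommended configuration: every value is an arbitrary illustration, HannWindow is assumed as one possible choice for func, and the variable name enc is hypothetical.

```wolfram
(* Illustrative parameter values only; none of them are defaults from this page *)
enc = NetEncoder[{"AudioSpectrogram",
   "SampleRate" -> 16000,
   "Normalization" -> {"RMS", 0.1},              (* RMS of the input normalized to 0.1 *)
   "TargetLength" -> Quantity[2, "Seconds"],     (* pad or trim to a 2-second duration *)
   "WindowSize" -> Quantity[32, "Milliseconds"], (* partition length given as a duration *)
   "Offset" -> 128,                              (* partition offset given in samples *)
   "WindowFunction" -> HannWindow,               (* assumed window function *)
   "Augmentation" -> {
     "Volume" -> {0.5, 1.5},    (* uniformly randomized multiplier *)
     "Noise" -> 0.01,           (* white noise with amplitude 0.01 *)
     "TimeShift" -> {0, 0.1}    (* shift randomized between 0 and 0.1 seconds *)
   }
}]
```

Giving "Volume" and "TimeShift" as two-element lists uses the randomized form described above, where the value is drawn uniformly between the two bounds.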
Examples

    Basic Examples  (1)

    Create a spectrogram NetEncoder:

    Create an Audio object:

    Apply the encoder to the Audio object:

    Plot the result:
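
A minimal sketch of these four steps, assuming a synthetic one-second 440 Hz tone as the input. The variable names enc, audio and spec are illustrative and not part of the encoder's interface.

```wolfram
(* Create a spectrogram encoder with default parameters *)
enc = NetEncoder["AudioSpectrogram"]

(* Assumed input: one second of a 440 Hz sine tone *)
audio = AudioGenerator[{"Sin", 440}, 1]

(* Apply the encoder; the result is a "Real32" NumericArray *)
spec = enc[audio];
Dimensions[spec]

(* Plot the result with partitions along the horizontal axis *)
MatrixPlot[Transpose[Normal[spec]]]
```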

    Scope  (3)

    NetEncoder["AudioSpectrogram"] can encode either File or Audio objects. Create a spectrogram encoder:

    Apply the encoder to a File object:

    Apply the encoder to an in-core Audio object:

    Apply the encoder to an out-of-core Audio object:
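
The three input kinds can be sketched as follows, assuming a tone exported to a temporary WAV file. The file name tone.wav and the variable names are hypothetical.

```wolfram
enc = NetEncoder["AudioSpectrogram"];

(* Assumed input: a short tone, also written to a temporary file *)
audio = AudioGenerator[{"Sin", 440}, 1];
path = Export[FileNameJoin[{$TemporaryDirectory, "tone.wav"}], audio];

(* File object *)
Dimensions[enc[File[path]]]

(* In-core Audio object: the samples are held in memory *)
Dimensions[enc[audio]]

(* Out-of-core Audio object: the samples stay on disk and are linked from the file *)
Dimensions[enc[Audio[File[path]]]]
```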

    Create a list of Audio objects:

    NetEncoder["AudioSpectrogram"] maps across a batch of inputs:
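
A sketch of batch encoding, assuming three synthetic tones of different frequencies as the list of inputs:

```wolfram
enc = NetEncoder["AudioSpectrogram"];

(* Assumed inputs: half-second tones at three frequencies *)
audios = Table[AudioGenerator[{"Sin", f}, 0.5], {f, {220, 440, 880}}];

(* Applying the encoder to a list gives one NumericArray per input *)
specs = enc[audios];
Dimensions /@ specs
```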

    Create a spectrogram NetEncoder:

    Attach the encoder to the input of a net:

    Apply the net to an Audio object:
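
A sketch of attaching the encoder to a net. The small recurrent architecture below is an arbitrary assumption chosen only because it accepts a variable-length sequence of vectors. The net is randomly initialized, so its output carries no meaning.

```wolfram
enc = NetEncoder["AudioSpectrogram"];

(* Hypothetical two-class net with the encoder on its "Input" port *)
net = NetInitialize @ NetChain[
   {GatedRecurrentLayer[16], SequenceLastLayer[], LinearLayer[2], SoftmaxLayer[]},
   "Input" -> enc];

(* The net now accepts an Audio object directly *)
net[AudioGenerator[{"Sin", 440}, 1]]
```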

    Parameters  (6)

    "Normalization"  (1)

    Create an Audio object:

    Use an encoder with "Normalization"->None to avoid any normalization:

    Since the normalization is applied to the signal before the spectrogram is computed, there are no guarantees on the bounds of the result:

    Use an encoder with "Normalization"->Automatic to normalize the maximum absolute value of the waveform samples to 1:

    Find the minimum and maximum values of the result:
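
A sketch comparing two of the settings listed under Details, None and "Max", on an assumed quiet input (a tone attenuated to a quarter of its amplitude). The bounds of the two results can then be compared directly.

```wolfram
(* Assumed input: a 440 Hz tone scaled down to 0.25 of its amplitude *)
audio = AudioAmplify[AudioGenerator[{"Sin", 440}, 1], 0.25];

encNone = NetEncoder[{"AudioSpectrogram", "Normalization" -> None}];
encMax = NetEncoder[{"AudioSpectrogram", "Normalization" -> "Max"}];

(* Minimum and maximum spectrogram values without and with waveform normalization *)
MinMax[Flatten[Normal[encNone[audio]]]]
MinMax[Flatten[Normal[encMax[audio]]]]
```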

    "SampleRate"  (2)

    Create an Audio object:

    Using an encoder with "SampleRate"->8000 resamples the signal to 8000 Hz before performing the short-time Fourier transform:

    The "SampleRate" parameter affects the computation of the default window size:

    An encoder with a lower sample rate than the original audio will result in a shorter window length:

    An encoder with a higher sample rate than the original audio will result in a longer window length:
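
A sketch of the effect on the output dimensions, assuming an input generated at 16000 Hz and encoders set to half and twice that rate:

```wolfram
(* Assumed input: a one-second tone sampled at 16000 Hz *)
audio = AudioGenerator[{"Sin", 440}, 1, SampleRate -> 16000];

enc8k = NetEncoder[{"AudioSpectrogram", "SampleRate" -> 8000}];
enc32k = NetEncoder[{"AudioSpectrogram", "SampleRate" -> 32000}];

(* The second dimension follows the default window length in samples *)
Dimensions[enc8k[audio]]
Dimensions[enc32k[audio]]
```

If the default window is 25 milliseconds as stated under Details, it spans 200 samples at 8000 Hz and 800 samples at 32000 Hz, so the formula Floor[ws/2+1] predicts 101 and 401 frequency bins respectively.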

    "TargetLength"  (1)

    Create an Audio object:

    Using an encoder with "TargetLength"->All returns the spectrogram for all the data:

    Using an encoder with "TargetLength"->10 zero-pads the output to be of length 10:

    Using an encoder with "TargetLength"->2 takes only the first two partitions:
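
A sketch of the three settings, assuming a very short input (0.05 seconds) so that the signal yields fewer than 10 partitions under the default partitioning and padding actually takes place:

```wolfram
(* Assumed input: a 0.05-second tone, short enough to need padding for length 10 *)
audio = AudioGenerator[{"Sin", 440}, 0.05];

(* Output dimensions for each "TargetLength" setting *)
Table[
 tl -> Dimensions[
   NetEncoder[{"AudioSpectrogram", "TargetLength" -> tl}][audio]],
 {tl, {All, 10, 2}}]
```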

    "WindowSize"  (1)

    Create an Audio object:

    The partition length is automatically computed to be 25 ms:

    Using an encoder with "WindowSize"->600 returns the spectrogram using partitions of 600 samples:
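
A sketch comparing the default partition length with an explicit one, assuming a one-second tone as the input:

```wolfram
audio = AudioGenerator[{"Sin", 440}, 1];

(* Default partition length: 25 ms at the encoder's sample rate *)
Dimensions[NetEncoder["AudioSpectrogram"][audio]]

(* Explicit partition length of 600 samples *)
Dimensions[NetEncoder[{"AudioSpectrogram", "WindowSize" -> 600}][audio]]
```

With the default "SampleRate" of 16000, a 25 ms partition is 400 samples, so Floor[ws/2+1] predicts a second dimension of 201 for the default encoder and 301 for the 600-sample one.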

    "Offset"  (1)

    Create an Audio object:

    The partition offset is automatically computed to be 1/3 of the partition length:

    Using an encoder with "Offset"->10 returns the spectrogram computed using partitions with an offset of 10 samples:
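
A sketch comparing the default offset with an explicit offset of 10 samples, assuming a one-second tone as the input. A smaller offset gives more partitions and leaves the number of frequency bins unchanged.

```wolfram
audio = AudioGenerator[{"Sin", 440}, 1];

(* Default offset: 1/3 of the partition length *)
Dimensions[NetEncoder["AudioSpectrogram"][audio]]

(* Explicit offset of 10 samples *)
Dimensions[NetEncoder[{"AudioSpectrogram", "Offset" -> 10}][audio]]
```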

    Properties & Relations  (2)

    Create an Audio object:

    Create a spectrogram NetEncoder:

    The length of the result can be computed as Ceiling[length/offset], where length is the length of the signal after resampling and offset is the "Offset" parameter of the encoder:
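
A sketch of this check, assuming an input already sampled at the encoder's default rate of 16000 Hz (so that no resampling changes its length) and an arbitrary offset of 100 samples:

```wolfram
(* Assumed input: a one-second tone at 16000 Hz, matching the encoder's sample rate *)
audio = AudioGenerator[{"Sin", 440}, 1, SampleRate -> 16000];

offset = 100;  (* illustrative value *)
enc = NetEncoder[{"AudioSpectrogram", "Offset" -> offset}];

(* Number of samples in the single channel of the signal *)
length = Length[First[AudioData[audio]]];

(* Measured number of partitions next to the value predicted by the formula *)
{First[Dimensions[enc[audio]]], Ceiling[length/offset]}
```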

    Create an Audio object:

    Create a spectrogram NetEncoder:

    The second dimension of the result can be computed as Floor[windowSize/2+1], where windowSize is the "WindowSize" parameter of the encoder:
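
A sketch of this check, assuming an arbitrary window of 512 samples and a one-second tone as the input:

```wolfram
audio = AudioGenerator[{"Sin", 440}, 1];

windowSize = 512;  (* illustrative value *)
enc = NetEncoder[{"AudioSpectrogram", "WindowSize" -> windowSize}];

(* Measured number of frequency bins next to the value predicted by the formula *)
{Last[Dimensions[enc[audio]]], Floor[windowSize/2 + 1]}
```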

    See Also

    NetEncoder   Audio   SpectrogramArray   AudioResample   ConformAudio   NetChain   NetGraph   NetTrain

    Net Encoders: Audio   AudioSTFT   AudioMelSpectrogram   AudioMFCC

    Tech Notes

    History

    Introduced in 2018 (11.3) | Updated in 2019 (12.0)
