Log-frequency spectrogram array

Question 1

I need a spectrogram with a logarithmically scaled frequency axis. I'm currently using the scipy.signal.stft function to get a magnitude array, but the output frequencies are linearly spaced.

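import librosa
import scipy.signal

# Load the audio, resampled to 64 kHz
sample, samplerate = librosa.load('sound.wav', sr=64000)
# Short-time Fourier transform; f holds linearly spaced frequencies
f, t, Zxx = scipy.signal.stft(sample, fs=samplerate, window='hamming', nperseg=512, noverlap=256)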

I basically need f to be log-spaced from 1 Hz to 32 kHz (since my sound has a sample rate of 64 kHz).

I can only get the top spectrogram. I need the actual array of values behind the bottom spectrogram. I can display it through various visualisation functions (librosa's specshow, matplotlib's yscale, etc.), but I can't find a way to retrieve an actual 2-D array of magnitudes whose frequency axis alone is logarithmically spaced.

[Image: two spectrograms of the same signal, linear frequency axis on top and logarithmic frequency axis below]

Any help or clue about which method to use would be greatly appreciated!

Comment

Of course your question will be answered here, but I also suggest posting any DSP-related questions on dsp.stackexchange.com.

Answer 1

I just stumbled across a good solution for your problem. The nnAudio library is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend, though it can also be used as a standalone solution.

To install it, just use:

pip install git+https://github.com/KinWaiCheuk/nnAudio.git#subdirectory=Installation

To transform your audio into a spectrogram with log-spaced frequency bins use:

import librosa
import numpy as np
import torch
from nnAudio import features
from scipy.io import wavfile

sr, song = wavfile.read('./Bach.wav')  # load the audio
x = song.mean(1)  # convert stereo to mono (assumes a 2-D stereo array)
x = torch.tensor(x).float()  # cast the array to a PyTorch tensor
# Initialize the STFT layer with log-spaced frequency bins;
# magnitudes are requested explicitly so the output is real-valued
spec_layer = features.STFT(n_fft=2048, hop_length=512, window='hann',
                           freq_scale='log', pad_mode='reflect', sr=sr,
                           output_format='Magnitude')
spec = spec_layer(x)  # forward pass: waveform -> spectrogram
log_spec = spec[0].detach().cpu().numpy()  # PyTorch tensor -> NumPy array (frequency bins x frames)
db_log_spec = librosa.amplitude_to_db(log_spec)  # amplitude -> dB

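The resulting db_log_spec is already the requested 2-D array (frequency bins × time frames) with log-spaced frequency bins. To display it, plot it with librosa's specshow using the y_axis='linear' flag, so that the log-spaced rows are drawn evenly instead of being warped a second time. Note that the tick labels then assume linearly spaced bins, so they will not show the true bin frequencies: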

import librosa.display
import matplotlib.pyplot as plt

plt.figure()
librosa.display.specshow(db_log_spec, y_axis='linear', x_axis='time', sr=sr)
plt.colorbar()

The library also contains an inverse function and a ton of additional features: https://kinwaicheuk.github.io/nnAudio/intro.html

Although this produces a good-looking log-frequency spectrogram, I am having trouble converting the STFT back into the time domain. The included iSTFT does not do the trick for me. Maybe someone else can pick it up from here?

Comment

Thanks a lot for this interesting solution. But I'm still not sure about the amplitude-to-dB step: doesn't it only convert the magnitudes to a log scale, not the frequencies themselves?

Comment

You are confusing two different dimensions here. In a spectrogram, the dB/amplitude dimension is color-coded, and changing its scale is quite easy (see the math here: librosa.org/doc/latest/generated/librosa.amplitude_to_db.html). The other dimension is frequency: the y-axis of the spectrogram is determined by the spacing of the frequency bins. The two dimensions are independent of each other.
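To make the distinction concrete, here is a minimal sketch using only NumPy and SciPy. It assumes a synthetic 1 kHz test tone and an arbitrary choice of 128 log-spaced bins, neither of which comes from the question. The dB conversion changes only the values, while the interpolation onto log-spaced frequencies changes only the frequency axis. Linear interpolation is a rough stand-in here, not what nnAudio or librosa do internally.

```python
import numpy as np
from scipy import signal

# Synthetic test signal (an assumption for illustration): 1 s of a 1 kHz tone at 64 kHz.
sr = 64000
time = np.arange(sr) / sr
y = np.sin(2 * np.pi * 1000.0 * time)

# Linear-frequency STFT with the same parameters as in the question.
f, frames, Zxx = signal.stft(y, fs=sr, window='hamming', nperseg=512, noverlap=256)
mag = np.abs(Zxx)  # shape: (n_linear_bins, n_frames)

# 1) Value axis: amplitude -> dB. Shape and frequency axis stay the same.
mag_db = 20.0 * np.log10(np.maximum(mag, 1e-10))

# 2) Frequency axis: resample each frame onto log-spaced frequencies.
#    f[0] is 0 Hz, which has no logarithm, so start at the first non-zero bin.
n_log_bins = 128  # arbitrary choice for this sketch
log_f = np.geomspace(f[1], f[-1], num=n_log_bins)
log_mag = np.stack(
    [np.interp(log_f, f, mag[:, i]) for i in range(mag.shape[1])],
    axis=1,
)  # shape: (n_log_bins, n_frames)
```

Here mag_db has the same shape as mag, whereas log_mag has 128 rows whose frequencies are given by log_f. Because the lowest usable frequency is the first non-zero STFT bin, the log axis cannot start at 1 Hz with this window length.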

Answer 2

Actually, for the record, I found out that what I needed was a constant-Q transform, which produces a spectrogram with logarithmically spaced frequency bins. It also lets you choose the starting frequency, which is very useful in my case. For this I used librosa.cqt.
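A minimal sketch of that approach, for illustration only. The helper names cqt_bin_frequencies and cqt_magnitudes are hypothetical, and the values fmin=100 Hz, 96 bins and 12 bins per octave are assumptions rather than the settings actually used. The first helper computes the geometric spacing of constant-Q bins with NumPy alone; the second wraps librosa.cqt and needs librosa installed.

```python
import numpy as np

def cqt_bin_frequencies(fmin, n_bins, bins_per_octave=12):
    """Centre frequencies of constant-Q bins: fmin * 2**(k / bins_per_octave)."""
    k = np.arange(n_bins)
    return fmin * 2.0 ** (k / bins_per_octave)

def cqt_magnitudes(y, sr, fmin=100.0, n_bins=96, bins_per_octave=12, hop_length=512):
    """Magnitude constant-Q transform as a 2-D array (n_bins, n_frames)."""
    import librosa  # imported here so the frequency helper works without librosa
    C = librosa.cqt(y, sr=sr, hop_length=hop_length, fmin=fmin,
                    n_bins=n_bins, bins_per_octave=bins_per_octave)
    return np.abs(C)

# Illustrative values (assumptions): 8 octaves above 100 Hz, 12 bins per octave.
freqs = cqt_bin_frequencies(fmin=100.0, n_bins=96)
```

With a signal loaded as in the question, cqt_magnitudes(sample, samplerate) would return the 2-D array, and row k would correspond to freqs[k]. The highest bin has to stay below the Nyquist frequency (32 kHz for a 64 kHz sample rate), which limits how many octaves fit above the chosen fmin.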

Douzery · 111 bronze badge · Answer 1 · 2022-08-26 13:03:32Z

Pouple · 968 bronze badges · Answer 2 · 2022-08-30 11:00:00Z
