I need to get a log-frequency scaled spectrogram. I'm currently using the scipy.signal.stft
function to get a magnitude array, but the output frequencies are linearly spaced.
import librosa
import scipy.signal
sample, samplerate = librosa.load('sound.wav', sr=64000)
f, t, Zxx = scipy.signal.stft(sample, fs=samplerate, window='hamming', nperseg=512, noverlap=256)
I basically need f to be log-spaced from 1Hz to 32kHz (since my sound has a samplerate of 64kHz).
I can only get the top spectrogram. I need the actual array of values of the bottom spectrogram. I can obtain it through various visualisation functions (librosa specshow, matplotlib yscale, etc.), but I can't find a way to retrieve an actual 2-D array of magnitudes in which only the frequency axis is logarithmically spaced.
Any help or clue on what method to use will be greatly appreciated!
Of course your question will be answered here, but I also suggest posting any DSP-related questions on dsp.stackexchange.com. – Jdip, Aug 19, 2022 at 11:05
2 Answers
I just stumbled across a good solution for your problem. The nnAudio library is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend, though it can also be used as a standalone solution.
For installation, just use:
pip install git+https://github.com/KinWaiCheuk/nnAudio.git#subdirectory=Installation
To transform your audio into a spectrogram with log-spaced frequency bins use:
import numpy as np
import librosa
import torch
from nnAudio import features
from scipy.io import wavfile
sr, song = wavfile.read('./Bach.wav') # Loading your audio
x = song.mean(1) # Converting stereo to mono (assumes a 2-channel file)
x = torch.tensor(x).float() # Casting the array into a PyTorch tensor
spec_layer = features.STFT(n_fft=2048, hop_length=512,
                           window='hann', freq_scale='log', pad_mode='reflect', sr=sr) # Initializing the model
spec = spec_layer(x) # Feed-forward your waveform to get the spectrogram
log_spec = np.array(spec)[0] # Cast the PyTorch tensor back to a NumPy array
db_log_spec = librosa.amplitude_to_db(log_spec) # Convert the amplitude spectrogram to a dB representation
Plotting the resulting log-frequency spectrogram with librosa's specshow and the y_axis='linear' flag gives the representation you asked for, backed by an actual 2-D array (db_log_spec) :)
import matplotlib.pyplot as plt
import librosa.display
plt.figure()
librosa.display.specshow(db_log_spec, y_axis='linear', x_axis='time', sr=sr)
plt.colorbar()
The library also contains an inverse function and a ton of additional features: https://kinwaicheuk.github.io/nnAudio/intro.html
Although this produces a good-looking log-frequency spectrogram, I am having trouble converting the STFT back into the time domain. The included iSTFT does not do the trick for me. Maybe someone else can pick it up from here?
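If pulling in PyTorch is not an option, one lighter-weight alternative (not part of nnAudio, and not what the code above does) is to resample the linear scipy.signal.stft output onto a log-spaced frequency grid by interpolation. The sketch below is only an illustration under assumptions: it uses a synthetic 1 kHz tone instead of a real file, and n_log_bins = 128 is an arbitrary choice. Note that with nperseg=512 at 64 kHz the linear bins are 125 Hz apart, so the log grid starts at the first non-zero bin, and the lowest log bins are interpolated rather than genuinely resolved.

```python
import numpy as np
from scipy import signal

fs = 64000                                   # sample rate from the question
t = np.arange(2 * fs) / fs                   # two seconds of synthetic audio
x = np.sin(2 * np.pi * 1000.0 * t)           # 1 kHz tone standing in for 'sound.wav'

# Linear-frequency STFT, same parameters as in the question
f, frames, Zxx = signal.stft(x, fs=fs, window='hamming', nperseg=512, noverlap=256)
mag = np.abs(Zxx)                            # shape: (257, n_frames)

# Log-spaced target grid; it must start above 0 Hz, so use the first non-zero bin
n_log_bins = 128                             # assumed value, pick what suits your data
f_log = np.geomspace(f[1], f[-1], n_log_bins)

# Interpolate every time frame from the linear grid onto the log grid
mag_log = np.stack(
    [np.interp(f_log, f, mag[:, k]) for k in range(mag.shape[1])],
    axis=1,
)                                            # shape: (n_log_bins, n_frames)
```

Here mag_log is the 2-D magnitude array and f_log holds the matching log-spaced bin frequencies. This only re-bins existing information; it does not add low-frequency resolution the way a longer window or a constant-Q transform would.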
Thank you very much for this interesting solution. But I'm still not sure about the amplitude-to-dB step, because it only converts the magnitude to a log scale, not the actual frequencies themselves? – Pouple, Aug 30, 2022 at 11:00
You are confusing two different dimensions here. If you look at spectrograms, the dB/amplitude dimension is color-coded, and changing its "scale" is quite easy (see the math here: librosa.org/doc/latest/generated/librosa.amplitude_to_db.html). The other dimension is the frequency dimension: the y-axis of the spectrogram is defined by the frequency bins. Those dimensions are not dependent on each other. – Douzery, Aug 31, 2022 at 20:16
Actually, for the record, I found out that what I needed was to perform a constant-Q transform, which is essentially a log-frequency spectrogram. You also get to choose the starting frequency, which is very useful in my case. For this I used librosa.cqt.