
I converted some audio files to spectrograms and saved them to files using the following code:

import os
from matplotlib import pyplot as plt
import librosa
import librosa.display
import IPython.display as ipd

audio_fpath = "./audios/"
spectrograms_path = "./spectrograms/"
audio_clips = os.listdir(audio_fpath)

def generate_spectrogram(x, sr, save_name):
    X = librosa.stft(x)
    Xdb = librosa.amplitude_to_db(abs(X))
    fig = plt.figure(figsize=(20, 20), dpi=1000, frameon=False)
    ax = fig.add_axes([0, 0, 1, 1], frameon=False)
    ax.axis('off')
    librosa.display.specshow(Xdb, sr=sr, cmap='gray', x_axis='time', y_axis='hz')
    plt.savefig(save_name, quality=100, bbox_inches=0, pad_inches=0)
    librosa.cache.clear()

for i in audio_clips:
    audio_fpath = "./audios/"
    spectrograms_path = "./spectrograms/"
    audio_length = librosa.get_duration(filename=audio_fpath + i)
    j = 60
    while j < audio_length:
        x, sr = librosa.load(audio_fpath + i, offset=j-60, duration=60)
        save_name = spectrograms_path + i + str(j) + ".jpg"
        generate_spectrogram(x, sr, save_name)
        j += 60
        if j >= audio_length:
            j = audio_length
            x, sr = librosa.load(audio_fpath + i, offset=j-60, duration=60)
            save_name = spectrograms_path + i + str(j) + ".jpg"
            generate_spectrogram(x, sr, save_name)

I wanted to keep as much detail and quality from the audio as possible, so that I could turn the spectrograms back into audio without too much loss (they are 80 MB each).

Is it possible to turn them back to audio files? How can I do it?

[Image: example spectrograms]

I tried using librosa.feature.inverse.mel_to_audio, but it didn't work, and I don't think it applies here anyway (the code above produces ordinary STFT spectrograms, not mel spectrograms).

I now have 1300 spectrogram files and want to train a Generative Adversarial Network on them, so that I can generate new audio, but I don't want to do it if I won't be able to listen to the results later.

asked Apr 10, 2020 at 1:04
  • Not really - you've thrown away a lot of information (all of the phase, and some of the magnitude). Commented Apr 10, 2020 at 6:33
  • @PaulR STFT typically contains a lot of redundant information that can be used to estimate the phase. It's hardly perfect, but if you combine the Griffin-Lim algorithm with e.g. advances in generative deep neural networks, it can get pretty good. Commented Apr 10, 2020 at 7:05
  • @LukaszTracewski: very interesting - the OP is only saving the log magnitude spectrum though (not sure if this is quantized?) - do you think this will still work? Commented Apr 10, 2020 at 8:29
  • @PaulR It's a valid point that a full inverse transformation is not possible (due to the thresholding applied in amplitude_to_db and to saving in a lossy format, JPEG). That being said, unless the OP is dealing with some extreme cases, it should not be a big issue. The OP wants to "train a Generative Adversarial Network with them, so that I can generate new audios" and that's not exact math anyway. Combine that with e.g. tensorflow/magenta and the OP is off to a good start. Commented Apr 10, 2020 at 11:58
  • Thanks - very interesting. Commented Apr 10, 2020 at 20:24

2 Answers


Yes, it is possible to recover most of the signal and estimate the phase with e.g. the Griffin-Lim algorithm (GLA). A "fast" implementation for Python can be found in librosa. Here's how you can use it:

import numpy as np
import librosa

# Load a short example clip, take its STFT and keep only the magnitude
# (i.e. deliberately throw away the phase, as in your pipeline).
y, sr = librosa.load(librosa.util.example_audio_file(), duration=10)
S = np.abs(librosa.stft(y))

# Reconstruct a time-domain signal from the magnitude alone.
y_inv = librosa.griffinlim(S)

And this is how the original and the reconstruction look:

[Figure: the original and the reconstruction]

By default the algorithm randomly initialises the phases and then iterates forward and inverse STFT operations to refine the phase estimate.
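
For intuition only, here is a stripped-down sketch of that iteration (librosa.griffinlim itself adds the "fast" momentum update and other refinements); the n_fft and hop_length defaults below are assumptions that have to match the magnitude matrix you pass in:

import numpy as np
import librosa

def naive_griffin_lim(S, n_iter=32, n_fft=2048, hop_length=512):
    # Start from a random phase for every time-frequency bin.
    angles = np.exp(2j * np.pi * np.random.rand(*S.shape))
    for _ in range(n_iter):
        # Inverse STFT with the current phase estimate...
        y = librosa.istft(S * angles, hop_length=hop_length)
        # ...then a forward STFT to obtain a new, more consistent phase,
        # while the magnitude is reset to the measured one each time.
        rebuilt = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
        angles = np.exp(1j * np.angle(rebuilt))
    return librosa.istft(S * angles, hop_length=hop_length)

naive_griffin_lim(S) would then give roughly the same kind of result as librosa.griffinlim(S) above, just slower and without the acceleration tricks.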

Looking at your code, to reconstruct the signal, you'd just need to do:

import numpy as np
X_inv = librosa.griffinlim(np.abs(X))

It's just an example of course. As pointed out by @PaulR, in your case you'd need to load the data from the JPEG (which is lossy!) and then undo the amplitude_to_db step (e.g. with librosa.db_to_amplitude) first.
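
A minimal sketch of what that round trip could look like, not part of the original answer: it assumes the images are grayscale, that you know (or guess) the dB range they span and the STFT parameters, and that the picture is resized back to the STFT shape. The file name, n_fft, sr, frame count and dB range below are placeholders.

import numpy as np
import librosa
import soundfile as sf
from PIL import Image

# Placeholder values -- they must match whatever was used to create the image.
n_fft = 2048
sr = 22050
n_frames = 2584               # frame count of the original STFT (a guess here)
db_min, db_max = -80.0, 0.0   # assumed dB values of the darkest/brightest pixels

# Load the grayscale JPEG, resize it to the STFT shape and flip it vertically,
# because specshow draws low frequencies at the bottom of the image.
img = Image.open("./spectrograms/example.jpg").convert("L")
img = img.resize((n_frames, 1 + n_fft // 2))
pixels = np.flipud(np.asarray(img, dtype=np.float32))

# Map pixel intensities [0, 255] back to dB, then to linear magnitude.
S_db = db_min + pixels / 255.0 * (db_max - db_min)
S = librosa.db_to_amplitude(S_db)

# Estimate the phase with Griffin-Lim and write the result to disk.
y = librosa.griffinlim(S, n_iter=64, n_fft=n_fft)
sf.write("reconstructed.wav", y, sr)

Because of the JPEG compression, the figure padding and the resizing, expect audible artefacts; the closer you stay to the raw dB matrix, the better the reconstruction.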

The algorithm, especially the phase estimation, can be further improved thanks to advances in artificial neural networks. Here is one paper that discusses some enhancements.

answered Apr 10, 2020 at 7:01

7 Comments

  • @RamonGriffo Good luck! Setting quality to 100 does not typically give you lossless compression, see e.g. this answer for details: stackoverflow.com/questions/7982409/… If you can afford the space, use a lossless format; I often go for HDF5, optionally with high compression (a small sketch follows after these comments). If that answers your question, please accept the answer - thanks!
  • Did you find out how to load/transform the jpg image as a spectrogram? I don't think this answer covers exactly that part.
  • @materialvision That's because there's no unambiguous way to do that. How can you tell how the colour scale of the image translates into amplitude? With grayscale images you at least know the relative differences, so recovering a signal is not a big issue.
  • @LukaszTracewski Thanks, and a hint on how to do it with a greyscale image would also be great. I can run griffinlim on a mel object, but not directly on an image of a mel, so I am looking for a way to reverse the process: first generate spectrogram images, train the model (with various existing image-based GANs), generate new images, and then transform those images back into sound. That last part is the problem.
  • Does this work for an image of a spectrogram? If I have an image, can I pass it as the input and get the audio back from it? Could you please share a code snippet for that?
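
Regarding the lossless-format suggestion in the first comment above, a tiny illustration of what that could look like; the file names and dataset name are arbitrary, and Xdb is the same dB matrix computed in the question's code:

import numpy as np
import h5py
import librosa

y, sr = librosa.load("./audios/example.wav")            # placeholder input file
Xdb = librosa.amplitude_to_db(np.abs(librosa.stft(y)))

# Store the dB matrix losslessly instead of rendering it to a JPEG.
np.save("./spectrograms/example.npy", Xdb)              # simplest option
with h5py.File("./spectrograms/spectrograms.h5", "a") as f:
    f.create_dataset("example", data=Xdb, compression="gzip")

Inverting a float32 dB matrix stored this way only loses what amplitude_to_db's thresholding already discarded, rather than what JPEG quantisation throws away on top of it.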

I did this ex-novo in 2016 to recover audio from spectrograms for which no audio was available. I didn't know about the GLA (thanks!) but the algorithm sounds similar, complete with random phases.

As for importing the spectrograms: with my tool you indicate the corners of the graphic, its pixels-per-second and frequency range, and the start and end points of the colour scale and its dB range, and a script then does the colour-to-dB mapping of the graph.

Code: https://gitlab.com/martinwguy/delia-derbyshire/-/tree/master/anal
Examples of its output: https://wikidelia.net/wiki/Spectrograms#Inverse_spectrograms
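
A rough illustration of the kind of per-pixel calibration that workflow implies (all numbers below are invented; see the linked repository for the actual implementation):

# Invented calibration values: where the graph sits in the image and what its axes mean.
left_px, top_px, bottom_px = 37, 12, 524   # pixel coordinates of the graph corners
px_per_second = 100.0                      # horizontal resolution of the plot
f_min, f_max = 0.0, 5000.0                 # frequency range of the vertical axis, in Hz
db_min, db_max = -80.0, 0.0                # dB values of the darkest/brightest pixels

def pixel_to_coords(col, row, value):
    """Map one pixel of the spectrogram graphic to (time in s, frequency in Hz, level in dB)."""
    t = (col - left_px) / px_per_second
    f = f_max - (row - top_px) / (bottom_px - top_px) * (f_max - f_min)
    db = db_min + value / 255.0 * (db_max - db_min)
    return t, f, db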

answered Jun 24, 2024 at 8:47

