Can I convert spectrograms generated with librosa back to audio?

 ̄綄美尐妖づ 提交于 2020-05-15 06:38:27

问题


I converted some audio files to spectrograms and saved them to files using the following code:

import os
from matplotlib import pyplot as plt
import librosa
import librosa.display
import IPython.display as ipd

audio_fpath = "./audios/"
spectrograms_path = "./spectrograms/"
audio_clips = os.listdir(audio_fpath)

def generate_spectrogram(x, sr, save_name):
    X = librosa.stft(x)
    Xdb = librosa.amplitude_to_db(abs(X))
    fig = plt.figure(figsize=(20, 20), dpi=1000, frameon=False)
    ax = fig.add_axes([0, 0, 1, 1], frameon=False)
    ax.axis('off')
    librosa.display.specshow(Xdb, sr=sr, cmap='gray', x_axis='time', y_axis='hz')
    plt.savefig(save_name, quality=100, bbox_inches=0, pad_inches=0)
    librosa.cache.clear()

for i in audio_clips:
    audio_fpath = "./audios/"
    spectrograms_path = "./spectrograms/"
    audio_length = librosa.get_duration(filename=audio_fpath + i)
    j=60
    while j < audio_length:
        x, sr = librosa.load(audio_fpath + i, offset=j-60, duration=60)
        save_name = spectrograms_path + i + str(j) + ".jpg"
        generate_spectrogram(x, sr, save_name)
        j += 60
        if j >= audio_length:
            j = audio_length
            x, sr = librosa.load(audio_fpath + i, offset=j-60, duration=60)
            save_name = spectrograms_path + i + str(j) + ".jpg"
            generate_spectrogram(x, sr, save_name)

I wanted to keep the most detail and quality from the audios, so that i could turn them back to audio without too much loss (They are 80MB each).

Is it possible to turn them back to audio files? How can I do it?

I tried using librosa.feature.inverse.mel_to_audio, but it didn't work, and I don't think it applies.

I now have 1300 spectrogram files and want to train a Generative Adversarial Network with them, so that I can generate new audios, but I don't want to do it if i wont be able to listen to the results later.


回答1:


Yes, it is possible to recover most of the signal and estimate the phase with e.g. Griffin-Lim Algorithm (GLA). Its "fast" implementation for Python can be found in librosa. Here's how you can use it:

import numpy as np
import librosa

y, sr = librosa.load(librosa.util.example_audio_file(), duration=10)
S = np.abs(librosa.stft(y))
y_inv = librosa.griffinlim(S)

And that's how the original and reconstruction look like:

The algorithm by default randomly initialises the phases and then iterates forward and inverse STFT operations to estimate the phases.

Looking at your code, to reconstruct the signal, you'd just need to do:

import numpy as np

X_inv = librosa.griffinlim(np.abs(X))

It's just an example of course. As pointed out by @PaulR, in your case you'd need to load the data from jpeg (which is lossy!) and then apply inverse transform to amplitude_to_db first.

The algorithm, especially the phase estimation, can be further improved thanks to advances in artificial neural networks. Here is one paper that discusses some enhancements.



来源:https://stackoverflow.com/questions/61132574/can-i-convert-spectrograms-generated-with-librosa-back-to-audio

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!