How can I improve the look of scipy's spectrograms?

问题

I need to generate spectrograms for audio files with Python and I'm following the solution given here. However, the spectrograms I'm getting don't look very "populated," and not at all like other spectrograms I get from other software.

This is the code I used for the particular image I'm showing here:

import matplotlib.pyplot as plt
from matplotlib import cm
from scipy import signal
from scipy.io import wavfile

sample_rate, samples = wavfile.read('audio-mono.wav')
frequencies, times, spectrogram = signal.spectrogram(samples[:700000], sample_rate)

cMap = cm.get_cmap('gray', 3000) # Maybe I'm not understanding this very well
fig = plt.figure(figsize=(4,2), dpi=400, frameon=False)

plt.pcolormesh(times, frequencies, spectrogram, cmap=cMap)
plt.savefig('spectrogram.png')

The following images are spectrograms from Audacity and Aegisub, respectively, both for the same file for which the third image's spectrogram was created (with scipy).

To create this spectrogram, trying to see if it was a figure-size/resolution issue, I tried a couple of things things, one by one, and the end result is this image (with both of them applied). First, when extracting the .wav file from the .mp4 file, I set the sampling rate to 10 KHz to avoid having such a big y-axis in the plot and see if this helps. This is why you see a max of 5,000. I though I could live with some frequencies neglected given that I care, most of all, about speech frequencies.

Then, to get a better zoom, I created a spectrogram with only the first 700,000 elements of the samples array (see code), which, in the case of this file, represent about 70 seconds. This didn't help either. I even tried to create the spectrogram with the same slice of the samples array, but by taking only every tenth value, then every twentieth, and so on, but this only made the spectrogram have horizontal lines instead of dots. This is not applied here in the figure I'm showing you, because I realized it's far from helping. I also tinkered with the figure size and the resolution, but it didn't really help either.

As you can see in the first figure, the y-axis goes from 0 to 5 KHz, and many frequencies have some intensity at that level. Also, the only moment in that 70-second span with complete silence is around the 35-second mark. The accuracy of this becomes obvious when listening to the file.

In the second figure, there is no y-axis mark, but I can see it has a bigger range than the 5 KHz, which I think accounts for the difference with the first figure. I'm pretty sure that, unfortunately, I can't change this view range. However, this spectrogram also shows the moment of complete silence accurately, and it is at least properly "populated" in the rest of it.

By seeing the third figure (the one I generated with scipy), one could easily think there are several parts of complete silence in those first 70 seconds, which is far from true. I'd like it to look more like those above it, because I know they're much more accurate, but I don't really know how I can do it, and this one won't work at all.

I'm pretty sure there is something I can do, but I think I still don't know scipy enough to know what it is.

Thanks in advance.

EDIT 1

PLOTTED THE SPECTROGRAM WITHOUT SPECIFYING A COLORMAP

You can see the plot looks a bit more populated, but still not even close to the other ones.

EDIT 2

Considering the idea given in the first comment of this question, I used a manipulated version of the gray colormap to have black as the first entry (as normal) but with the second entry being the color that's normally halfway, and then 2,999 colors from there up to white. Please excuse me if I'm using wrong terminology here or if this is not correctly phrased. I'm still trying to understand how to work with colormaps.

The code used to create and plot the spectrogram is the same. The only difference is the colormap used, which I manipulated as follows:

import numpy as np
from matplotlib.colors import ListedColormap

cMap = cm.get_cmap('gray', 3000)

new_colors = cMap(np.linspace(0.5, 1, 3000))
black = [0, 0, 0, 1]
new_colors[0, :] = black
new_cmp = ListedColormap(new_colors)

Using new_cmp as the colormap for the pcolormesh() function, I get the following spectrogram.

This is much, much better than the original, and looks much more like the ones from Audacity and Aegisub. However, I'd like to know if there is a better approach I can take to make my spectrograms look better, if there could be something else that's causing this to not look so much as the sample ones, and if there is a better way to do what I did with the colormap. As I said, I'm still struggling with them.

EDIT 3

I'm now sharing the audio I used to create these spectrograms here.

来源：https://stackoverflow.com/questions/62909585/how-can-i-improve-the-look-of-scipys-spectrograms

标签

python

matplotlib

audio

scipy

spectrogram