python实现歌曲相似度比较
2019/9/20
最近学信号与系统,想着弄个小项目来提高学习兴趣。特此记录一下。
这是大概想到的准备工作,一边推进,一边学吧!!!
2019/9/21
频域信号处理
FFT变换所得的复数的含义:
- 下标为0的实数表示时域信号的直流部分
- 下标为i的复数为a+bj表示时域信号中周期为N/i个取样值的正弦波和余弦波的成分,其中a表示余弦波形的成分,b表示正弦波形的成分
- 复数的模的两倍为对应频率的余弦波的振幅
- 复数的辐角表示对应频率的余弦波的相位
import numpy as np
from scipy.fftpack import fft, ifft
import matplotlib.pyplot as plt
from matplotlib.pylab import mpl
x = np.arange(0, 2*np.pi, 2*np.pi/128)
y = 0.3*np.cos(x) + 0.5*np.cos(2*x+np.pi/4) + 0.8*np.cos(3*x-np.pi/3) + np.sin(4*x) + np.cos(x)
yf = fft(y)/len(y)
print(np.array_str(yf[:5], suppress_small=True))
for ii in range(0, 5):
print(np.abs(yf[ii]), np.rad2deg(np.angle(yf[ii])))
运行上述程序可以观察得到以上结论
合成时域信号
需要着重解释的是多个余弦信号合成任意时域信号的过程:
FFT转换得到的N个复数组成的数组A,表示第个子信号,其中的子信号表示直流信号,且表示直流信号的振幅。
利用前个自信号合成过程用数学表达式表示:
代码如下所示
import numpy as np
from scipy.fftpack import fft, ifft
import matplotlib.pyplot as plt
from matplotlib.pylab import mpl
mpl.rcParams['font.sans-serif'] = ['SimHei'] #显示中文
mpl.rcParams['axes.unicode_minus'] = False #显示负号
# x = np.arange(0, 2*np.pi, 2*np.pi/128)
# y = 0.3*np.cos(x) + 0.5*np.cos(2*x+np.pi/4) + 0.8*np.cos(3*x-np.pi/3) + np.sin(4*x) + np.cos(x)
# yf = fft(y)/len(y)
# print(np.array_str(yf[:5], suppress_small=True))
# for ii in range(0, 5):
# print(np.abs(yf[ii]), np.rad2deg(np.angle(yf[ii])))
def triangle_wave(size):
x = np.arange(0, 1, 1.0/size)
y = np.where(x < 0.5, x, 0)
y = np.where(x >= 0.5, 1-x, y)
return x, y
###
def fft_comnbine(bins, n, loops):
length = len(bins)*loops
data=np.zeros(length)
index=loops*np.arange(0, 2*np.pi, (2*np.pi)/length)
for k, p in enumerate(bins[:n]):
if k != 0:
p *= 2
###合成时域信号的过程
data += np.real(p)*np.cos(k*index)
data -= np.imag(p)*np.sin(k*index)
return index, data
fft_size = 256
###对三角波进行FFT
x, y = triangle_wave(fft_size)
fy = fft(y)/fft_size
loops = 4
y = np.tile(y, (1, loops))
print(y.shape)
y.shape = (fft_size*loops, )#画图python的特殊癖好
###
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
eps = 1e-5
# axes[0].plot(np.clip(20*np.log10(np.abs(fy[:20])+eps), -120, 120), "o")
axes[0].plot(np.abs(fy[:20]), "o")
axes[0].set_xlabel(u"频率窗口(frequency bin)")
axes[0].set_ylabel(u"幅值(dB)")
axes[1].plot(y, label=u"原始三角波", linewidth = 2)
for ii in [0, 1, 3, 5, 7, 9]:
index, data = fft_comnbine(fy, ii+1, loops)
axes[1].plot(data, label="N=%s" % ii, alpha=0.6)
print(index[:20])
axes[1].legend(loc="best")
plt.show()
理论部分后面学了信号与系统在深究吧
哈——现在的我已经学完信号与系统了,回复几个一开始学习遇到的问题。
-
复数的模的两倍为什么对应频率的余弦波的振幅?
ans:若为实信号,那么
由傅里叶变换:
得到推导过程中运用了欧拉公式使得余弦波的振幅乘上1/2,而相位不变。 -
为什么周期为N的离散信号,它的傅里叶变换的周期也是N?
ans:这个就是离散信号的傅里叶变换的周期性。可以在奥本海默的信号与系统的傅里叶性质表看到。
顺便提及奈奎斯特频率:
在采样定理中,采样频率必须大于,这个j就称作奈奎斯特频率,目的是防止采样信号的频率防止重叠,其中为原始信号的频域的最大值。
利用pydub和ffmpeg处理音频
写在前面:
RuntimeWarning: Couldn’t find ffmpeg or avconv - defaulting to ffmpeg, but may not work 解决办法——ffmpeg的bin 目录添加到path变量里,注意是path变量而不仅仅是简单的加到系统变量中!!!然后重启。
一、将mp3转换为wav格式,并将歌曲划分为几个部分
说在前面:
1.将歌曲划分为几部分主要是为了将特征的时间顺序体现出来
2. wav:非压缩文件格式。
3.mp3:压缩文件格式。
代码如下:
tail, track = os.path.split(mp3_path)
song_name = track.split('.')
wav_path = os.path.join(tail, 'w_session', song_name[0]+'.wav')
sound = AudioSegment.from_file(mp3_path, format='mp3')
sound.export(wav_path, format='wav')
获取wav文件信息:
w = wave.open(wav_path)
params = w.getparams()
print(params)
#声道数、量化位数(byte)、采样频率、采样点数
nchannels, sampwidth, framerate, nframes = params[:4]
t = np.arange(0, nframes)*(1/framerate)#文件时间
strData = w.readframes(nframes)#读取音频,字符串格式
waveData = np.fromstring(strData,dtype=np.int16)#将字符串转化为int
#waveData = waveData*1.0/(max(abs(waveData)))#wave幅值归一化
waveData = np.reshape(waveData,[nchannels, nframes])#双通道数
划分歌曲:
for ii in range(nchannels):
for jj in range(0, 4):
end_time = start_time + chunk[jj]
blockData = waveData[0, start_time*framerate:end_time*framerate]
start_time = end_time
二、音频特征提取
按照处理空间区分
- 要提取的特征 详情请点击:
时域特征:
线性预测系数、过零率
频域特征:
Mel系数、LPC倒频谱系数、熵特征、光谱质心
时频特征:
小波系数 - TOOLS:pyAudioAnalysis
下载以及安装方法:安装方法
个人感觉这个工具包满新的,github上有各种issues。issues详见
同时有一篇论文有对这个工具包有详细的描述:论文
下面摘抄一部分:
Feature Extraction
Audio Features
- the audio signal is first divided into short-term windows (frames) and for each frame all 34 features are calculated. This results in a sequence of short-term feature vectors of 34 elements each. Widely accepted short-term window sizes are 20 to 100 ms.
- Typical values of the mid-term segment size can be 1 to 10 seconds.
- In cases of long recordings (e.g. music tracks) a long-term averaging of the mid-term features can be applied so that the whole signal is represented by an average vector of mid-term statistics.
- Extract mid-term features and long-term averages in order to produce one feature vector per audio signal.
三、计算相似矩阵
论文中提到:A similarity matrix is computed based on the cosine distances of the individual feature vectors.
但是在实际操作的过程中发现不同特征的量纲不同,导致用余弦相似度来计算特征相似度不准确。例:
7 | 8 | 9 |
---|---|---|
2.62281350727428e-10 | 0 | -50.5964425626941 |
2.29494356256208e-11 | 0 | -50.5964425626941 |
4.55467645371887e-11 | 0 | -50.5964425626941 |
所以我决定计算不同特征的相对比值,然后取平均值。
def similarity(v1, v2):
# 计算平均相似度
temp = []
sim = []
p = 0
q = 1
for ii in range(v1.shape[0]):
for jj in range(v1.shape[1]):
if v1[ii, jj]!=0 or v2[ii, jj]!=0 :
temp.append((1 -
abs(v1[ii, jj]-v2[ii, jj])/max(abs(v1[ii,
jj]),abs(v2[ii, jj]))))
q += 1
sim.append(np.mean(temp[p:q]))
p = q
print(sim)
return sim
此外可以尝试马氏距离——参考文章:
下文为部分摘抄。
-
使用场景:
1、度量两个服从同一分布并且其协方差矩阵为C的随机变量X与Y的差异程度
2、度量X与某一类的均值向量的差异程度,判别样本的归属。此时,Y为类均值向量. -
马氏距离的优缺点:
优点:量纲无关,排除变量之间的相关性的干扰
缺点:不同的特征不能差别对待,可能夸大弱特征
四、减少运行代码的时间
之前将歌曲划分为等差序列的长度demo,可计算一个片段的特征就要好久。我等不下去,所以决定想法子降低复杂度。我想到两个办法:
- 在原来等差序列的片段上随机选取四秒片段,计算特征相似度。如果大于0.5,那么在计算完整片段的特征相似度。
- 将原来采样频率44.1kHz缩小四倍
k = 4
ii = 0
w_decrease = [[], []]
# 降低音频分辨率
for kk in (0, 1):
while ii < len(w[:, kk]):
if ii + k < len(w[:, kk]):
w_decrease[kk].append(np.mean(w[ii:ii+k, kk]))
else:
w_decrease[kk].append(np.mean(
w[ii:len(w[:, kk])+1, kk]))
ii = ii + k
w = w_decrease
五、完整代码
# -*- coding: utf-8 -*-
"""
Created on Fri Jan 10 21:51:38 2020
@author: yoona
"""
import os
import sys
import wave
import numpy as np
#import struct
from pydub import AudioSegment
import matplotlib.pyplot as plt
from pyAudioAnalysis import audioFeatureExtraction as afe
import eyed3
import random
import math
def Features(path, mode):
x = wave.open(path)
params = x.getparams()
print(params)
if params[0] != 2:
raise ValueError('通道数不等于2')
strData = x.readframes(params[3])
w = np.frombuffer(strData, dtype=np.int16)
w = np.reshape(w,[params[3], params[0]])
k = 4
ii = 0
w_decrease = [[], []]
if mode == 'second':
# 降低音频分辨率
for kk in (0, 1):
while ii < len(w[:, kk]):
if ii + k < len(w[:, kk]):
w_decrease[kk].append(np.mean(w[ii:ii+k, kk]))
else:
w_decrease[kk].append(np.mean(
w[ii:len(w[:, kk])+1, kk]))
ii = ii + k
w = w_decrease
eigen_vector_0 = afe.mtFeatureExtraction(
w[:, 0], params[2],30.0, 30.0, 2, 2)
eigen_vector_1 = afe.mtFeatureExtraction(
w[:, 1], params[2],30.0, 30.0, 2, 2)
return eigen_vector_0, eigen_vector_1
def read_wave(wav_path):
w = wave.open(wav_path)
params = w.getparams()
# print(params)
# 声道数、量化位数(byte)、采样频率、采样点数
nchannels, sampwidth, framerate, nframes = params[:4]
# 文件时间
t = np.arange(0, nframes)*(1/framerate)
strData = w.readframes(nframes)#读取音频,字符串格式
waveData = np.frombuffer(strData, dtype=np.int16)#将字符串转化为int
waveData = waveData*1.0/(max(abs(waveData)))#wave幅值归一化
waveData = np.reshape(waveData,[nframes, nchannels])#双通道数
# plot the wave
plt.figure()
plt.subplot(4,1,1)
plt.plot(t,waveData[:, 0])
plt.xlabel("Time(s)")
plt.ylabel("Amplitude")
plt.title("Ch-1 wavedata")
plt.grid('on')#标尺,on:有,off:无
plt.subplot(4,1,3)
plt.plot(t,waveData[:, 1])
plt.xlabel("Time(s)")
plt.ylabel("Amplitude")
plt.title("Ch-2 wavedata")
plt.grid('on')#标尺,on:有,off:无
plt.show()
def similarity(v1, v2):
# 计算平均相似度
temp = []
sim = []
p = 0
q = 1
for ii in range(v1.shape[0]):
for jj in range(v1.shape[1]):
if v1[ii, jj]!=0 or v2[ii, jj]!=0 :
temp.append((1 -
abs(v1[ii, jj]-v2[ii, jj])/max(abs(v1[ii, jj]),abs(v2[ii, jj]))))
q += 1
sim.append(np.mean(temp[p:q]))
p = q
print(sim)
return sim
def compute_chunk_features(mp3_path):
# =============================================================================
# 计算相似度第一步
# =============================================================================
# 获取歌曲时长
mp3Info = eyed3.load(mp3_path)
time = int(mp3Info.info.time_secs)
print(time)
tail, track = os.path.split(mp3_path)
# 创建两个文件夹
dirct_1 = tail + r'\wavSession'
dirct_2 = tail + r'\wavBlock'
if not os.path.exists(dirct_1):
os.makedirs(dirct_1)
if not os.path.exists(dirct_2):
os.makedirs(dirct_2)
# 获取歌曲名字
song_name = track.split('.')
# 转换格式
wav_all_path = os.path.join(tail, song_name[0]+'.wav')
sound = AudioSegment.from_file(mp3_path, format='mp3')
sound.export(wav_all_path, format='wav')
read_wave(wav_all_path)
# 划分音频
gap = 4
diff = time/10 - 8
start_time = 0
end_time = math.floor(diff)
vector_0 = np.zeros((10, 68))
vector_1 = np.zeros((10, 68))
info = []#记录片段开始时间点
for jj in range(5):
wav_name = song_name[0]+str(jj)+'.wav'
wav_path = os.path.join(tail, 'wavSession', wav_name)
# 随机产生四秒片段
rand_start = random.randint(start_time, end_time)
blockData = sound[rand_start*1000:(rand_start+gap)*1000]
## 音频切片,时间的单位是毫秒
# blockData = sound[start_time*1000:end_time*1000]
blockData.export(wav_path, format='wav')
eigVector_0, eigVector_1 = Features(wav_path, [])
print(jj)# 标记程序运行进程
# 得到一个片段的特征向量
vector_0[jj, :] = np.mean(eigVector_0[0], 1)
vector_1[jj, :] = np.mean(eigVector_1[0], 1)
# 迭代
diff = diff + 4
info.append((start_time, end_time))
start_time = end_time
end_time = math.floor(start_time + diff)
# 承上启下
end_time = start_time
for kk in range(5, 10):
# 迭代
diff = diff - 4
info.append((start_time, start_time + diff))
start_time = end_time
end_time = math.floor(start_time + diff)
wav_name = song_name[0]+str(kk)+'.wav'
wav_path = os.path.join(tail, 'wavSession', wav_name)
# 随机产生四秒片段
rand_start = random.randint(start_time, end_time)
blockData = sound[rand_start*1000:(rand_start+gap)*1000]
# blockData = sound[start_time*1000:end_time*1000]
blockData.export(wav_path, format='wav')
eigVector_0, eigVector_1 = Features(wav_path, [])
print(kk)#标记程序运行进程
# 得到一个片段的特征向量
vector_0[kk, :] = np.mean(eigVector_0[0], 1)
vector_1[kk, :] = np.mean(eigVector_1[0], 1)
return vector_0, vector_1, info# 双通道各自的特征向量
def Compute_Bolck_Features(info, mp3_path):
# =============================================================================
# 计算相似度第二步
# =============================================================================
# 获取歌曲时长
mp3Info = eyed3.load(mp3_path)
time = int(mp3Info.info.time_secs)
print(time)
# 获取歌曲名字
tail, track = os.path.split(mp3_path)
song_name = track.split('.')
# 转换格式
sound = AudioSegment.from_file(mp3_path, format='mp3')
vector_0 = np.zeros((len(info), 68))
vector_1 = np.zeros((len(info), 68))
for kk in range(len(info)):
# 获取歌曲完整片段的特征
wav_name = song_name[0]+str(kk)+'.wav'
wav_path = os.path.join(tail, 'wavBlock', wav_name)
# 截取完整片段
blockData = sound[info[kk][0]*1000:info[kk][1]*1000]
blockData.export(wav_path, format='wav')
eigVector_0, eigVector_1 = Features(wav_path, 'second')
print(kk)#标记程序运行进程
# 得到一个片段的特征向量
vector_0[kk, :] = np.mean(eigVector_0[0], 1)
vector_1[kk, :] = np.mean(eigVector_1[0], 1)
return vector_0, vector_1
def file_exists(file_path):
if os.path.splitext(file_path) == '.mp3':
if os.path.isfile(file_path):
return file_path
else:
raise TypeError('文件不存在')
else:
raise TypeError('文件格式错误,后缀不为.mp3')
if __name__ == '__main__':
#for path, dirs, files in os.walk('C:/Users/yoona/Desktop/music_test/'):
# for f in files:
# if not f.endwith('.mp3'):
# continue
# 把路径组装到一起
#path = r'C:\Users\yoona\Desktop\musictest'
#f = 'CARTA - Aranya (Jungle Festival Anthem).mp3'
#mp3_path = os.path.join(path, f)
# =============================================================================
# sa_b:a表示歌曲的序号,b表示歌曲的通道序号
# =============================================================================
#s1_0, s1_1, info1= compute_chunk_features(mp3_path)
# path_1 = file_exists(sys.argv[1])
# path_2 = file_exists(sys.argv[2])]
path_1 = r'C:\Users\yoona\Desktop\music\薛之谦 - 别.mp3'
path_2 = r'C:\Users\yoona\Desktop\music\薛之谦 - 最好.mp3'
s1_1, s1_2, info1 = compute_chunk_features(path_1)
s2_1, s2_2, info2 = compute_chunk_features(path_2)
sim_1 = similarity(s1_1, s2_1)#通道数1
sim_2 = similarity(s1_2, s2_2)#通道数2
info1_new = []
info2_new = []
for i, element in enumerate(sim_1):
if element >= 0.5:
info1_new.append(info1[i])
if not info1_new:
pos = np.argmax(sim_1)
info1_new.append(info1[pos])
s1_1, s1_2 = Compute_Bolck_Features(info1_new, path_1)
for i, element in enumerate(sim_2):
if element >= 0.5:
info2_new.append(info2[i])
if not info2_new:
pos = np.argmax(sim_2)
info2_new.append(info2[pos])
s2_1, s2_2 = Compute_Bolck_Features(info2_new, path_2)
sim_1 = similarity(s1_1, s2_1)#通道数1
sim_2 = similarity(s1_2, s2_2)#通道数2
六、结果分析
第一组实验对象:
A:薛之谦 - 最好.mp3
B:薛之谦 - 别.mp3
第二组实验对象:
A:Karim Mika - Superficial Love.mp3
B: Burgess/JESSIA - Eclipse.mp3
第三组实验对象:
A: CARTA - Aranya (Jungle Festival Anthem).mp3
B: 薛之谦 - 别.mp3
来源:CSDN
作者:weixin_43012036
链接:https://blog.csdn.net/weixin_43012036/article/details/101040889