问题
I am trying to clusterize paterns in time series as I ask in
How to clustering syllable types with python?
I try using to solve my problem using the recurrence plots technique, so I make some code in python to reproduce these plots. I want o know if my code is ok, I tried it with a sound temporal series and I am getting this kind of result depending on the distance parameter value:
http://ceciliajarne.web.unq.edu.ar/envelope-problem/
Also I include the data set. I am using ch2. This is my code:
import numpy as np
import scipy
import os
from scipy.io import wavfile
import wave, struct
import matplotlib.pyplot as pp
from pylab import *
import scipy.signal.signaltools as sigtool
import scipy, pylab
from scipy.io import wavfile
import wave, struct
import scipy.signal as signal
from scipy.fftpack import fft
#Data set input
data=np.random.rand(44000*3)
#random secuence to compare with almost 3 seconds of data, cold be other
print 'data:', data
#set size
sissse=data.size
print 'size: ',sissse
print '---------------'
#empty vectors
x_filt_all_p=[]
y_filt_all_p=[]
los_p_filt_all_p=[]
#creating the list to fill
dif=[]
dif_abs=[]
p=1
#for each i-element of data vector for each p
for p in range(1,sissse,4400):
for i in enumerate(data):
#print i
j=i[0]
#print 'j: ',j
if (j<sissse-p):
dif_aux=data[j+p]-data[j]
#print 'dif=',dif_aux
dif.append(dif_aux)
dif_abs.append(abs(data[j+p]-data[j]))
#print'.........'
print'.........'
#print 'dif=',dif
print'.........'
#print 'Absolute difference=',dif_abs
print'.........'
#vector with index and diferences in absolute value
pepe= np.vstack([np.arange(len(dif_abs)),dif_abs])
print 'pepe0: ', pepe[0]
xx=pepe[0]
print 'pepe1: ', pepe[1]
yy=pepe[1]
#filtering the elements with diference<delta
delta= 0.001
# Now let's extract only the part of the data we're interested in...
los_p = np.empty(len(pepe[1]))#dif_abs
los_p.fill(p)
x_filt = xx[yy<delta]
y_filt = yy[yy<delta]
los_p_filt= los_p[yy<delta]
print 'value of coordinate i', x_filt
print 'absolute difference', y_filt
print 'value of coordinate p', los_p_filt
print '------------------------'
if (p==1):
x_filt_all_p=x_filt
y_filt_all_p=y_filt
los_p_filt_all_p=los_p_filt
else:
x_filt_all_p=np.concatenate((x_filt_all_p,x_filt))
y_filt_all_p=np.concatenate((y_filt_all_p,y_filt))
los_p_filt_all_p=np.concatenate((los_p_filt_all_p,los_p_filt))
print 'full value of coordinate i: ', x_filt_all_p
print 'full absolute difference', y_filt_all_p
print 'full value of coordinate p: ', los_p_filt_all_p
#trying to plot the "recurrence plots" together with the envelope.
pp.subplot(211)
pp.plot(arange(data.size),data, color='c',label='Time Signal 2')
pp.legend(fontsize= 'small')
pp.grid(True)
pp.xlabel('Time (s)')
pp.ylabel('Amplitude')
#pp.xlim([0,3])
pp.subplot(212)
base='test_plot'
pp.title('Recurrence plot delta=')
markerline2, stemlines2, baseline2 = stem(x_filt_all_p*float(1)/float(w[0]), los_p_filt_all_p*float(1)/float(w[0]),'b',linefmt=" ",)
pp.matplotlib.markers.MarkerStyle('.')
setp(markerline2,'markerfacecolor','b',label='points')
pp.legend(fontsize= 'small')
pp.grid(True)
pp.xlabel('Time i [s]')
pp.ylabel('Time p [s]')
#pp.xlim(0,3)
#pp.ylim(0,3)
pp.show()
#pp.savefig('plots/%s.jpg' %(str(base))
pp.close()
But I am not sure 100% that my code I working ok. Could someone take a look to my code to give me some advise of how to test it? I don't want to use neither matlab nor mathematica. The idea was to create an independent code in python. Also I have another smaller problem, I could not change the dot size in my plot. Finally I alrady try to use crosscheck with http://recurrence-plot.tk/online/index.php?state= butt I couldn't make it work. Any suggestion on my code or possible crosscheck wold be very welcome. Thanks in advance
回答1:
I understand this question is quite old, but maybe someone will stumble on this in future.
Since you are already using NumPy let me suggest this snippet:
import numpy as np
def rec_plot(s, eps=0.1, steps=10):
N = s.size
S = np.repeat(s[None,:], N, axis=0)
Z = np.floor(np.abs(S-S.T)/eps)
Z[Z>steps] = steps
return Z
It initially creates square empty array of (N, N) size. Then it subtract all possible combinations of points via S-S.T
, which is implicitly equivalent to having matrix subtraction where one matrix has all rows S and the other all columns S.
Dividing by eps
and flooring is for a short hand for asking of how many eps difference is between those points. Then Z[Z>steps]
is bounding, so that whenever something is more than steps
times eps
from the point, then it's max and will be simply plot with same value.
This solution is suboptimal as it first creates two NxN matrices, which for large N is too much. For N>10000 this is definitely not good. Since you are using SciPy we can use its distance
library. Below is more optimal implementation:
import numpy as np
from scipy.spatial.distance import pdist, squareform
def rec_plot(s, eps=0.1, steps=10):
d = pdist(s[:,None])
d = np.floor(d/eps)
d[d>steps] = steps
Z = squareform(d)
return Z
Examples of usage you can find https://laszukdawid.com/tag/recurrence-plot/ or https://github.com/laszukdawid/recurrence-plot.
来源:https://stackoverflow.com/questions/33650371/recurrence-plot-in-python