Question
According to a paper, this is supposed to work, but as someone still learning the scikit-learn package, I do not see how. All the sample code I have found clusters points into ellipses or circles, as here.
I would really like to know how to cluster the following plot by different patterns. Features 0-3 are the mean power over certain time periods (the time span is divided into four), while 4, 5 and 6 correspond to the standard deviation over the year, the variance between weekdays and weekends, and the variance between winter and summer, respectively. So the y-axis label does not necessarily apply to features 4, 5 and 6.
Following the sample, BIC did indicate that the optimal number of clusters is 5.
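import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture  # replaces the removed GMM class

# input: array of shape (n_samples, n_features)
n_components = np.arange(1, 21)
models = [GaussianMixture(n, covariance_type='full', random_state=0).fit(input)
          for n in n_components]
# Evaluate BIC on the same data the models were fitted on
plt.plot(n_components, [m.bic(input) for m in models], label='BIC')
plt.legend(loc='best')
plt.xlabel('n_components')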
If I plot the result with the available sample code, however, it returns something completely weird, not worth sharing. I thought a negative BIC was OK, but I don't even know whether it clustered correctly, so I cannot conclude that 5 is the optimal number.
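For reference, here is a minimal sketch of how a BIC curve is usually read. It runs on synthetic data, not the power data from the question: the array `X`, its three hypothetical groups, and the range of component counts are all assumptions made for illustration. BIC is only compared between models, so the lowest value wins and a negative value is not a problem in itself.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in (assumption): 3 groups of 100 profiles, 7 features each.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 7))
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 7)) for c in centers])

# Fit one model per candidate number of components.
n_values = list(range(1, 7))
models = [GaussianMixture(n, covariance_type='full', random_state=0).fit(X)
          for n in n_values]

# BIC of each model on the data it was fitted on.
bics = np.array([m.bic(X) for m in models])

# The preferred model is the one with the lowest BIC, whatever its sign.
best_n = n_values[int(np.argmin(bics))]
print("BIC per n:", bics.round(1))
print("lowest BIC at n =", best_n)
```

A low BIC only says the mixture describes the data well relative to its number of parameters. Whether the resulting groups are meaningful still has to be checked by looking at the members of each cluster.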
Answer 1:
In an effort to close this question, this post shows how to cluster using a GMM.
Create a model with the parameters you need:
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=10, covariance_type='full',
                      init_params='random', max_iter=100, random_state=0)
Fit the model to your data, an array of shape (number of samples, number of attributes), which is named input in my case:
gmm.fit(input)
print(gmm.means_.round(2))
cluster = gmm.predict(input)
cluster contains the label assigned to each sample of my input.
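To connect this back to clustering "by patterns": each row of the data is a whole profile (here 7 values), and the GMM groups rows, so the ellipses in the usual examples are just what the components look like for 2-D data. The sketch below is a hedged illustration on synthetic data; the array `X`, the two hypothetical groups, and `n_components=2` are assumptions, not the data or settings from the question.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in (assumption): 2 groups of 50 profiles, 7 features each.
rng = np.random.default_rng(1)
centers = np.array([[0.0] * 7, [4.0] * 7])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 7)) for c in centers])

gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)

# Hard assignment: one cluster index per row of X.
labels = gmm.fit_predict(X)

# Soft assignment: probability of each row belonging to each component.
probs = gmm.predict_proba(X)

# Summarise every cluster by its size and its mean profile.
for k in range(gmm.n_components):
    members = X[labels == k]
    print(k, len(members), members.mean(axis=0).round(2))
```

Plotting the rows of each cluster as lines over the feature index (0 to 6) is one way to see whether the clusters correspond to distinct patterns.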
Feel free to add to this or correct me if I've gotten anything wrong.
Source: https://stackoverflow.com/questions/49229504/gmm-em-on-time-series-cluster