Question
I am trying to cluster meteorological stations using R. The stations provide data such as temperature, wind speed, humidity, and a few other variables at hourly intervals. I can easily cluster univariate time series using tsclust from the dtwclust package, but when I cluster multivariate series I get errors.
My data is a list in which each element is a matrix holding the time series of one station (columns are variables and rows are timestamps).
If I run:
tsclust(data, k = 2,
distance = 'Euclidean', seed = 3247, trace = TRUE)
I get the error: Error in do.call(.External, c(list(CFUN, x, y, pairwise, if (!is.function(method)) get(method) else method), : not a scalar return value
I get the same error if I try to calculate only the distance matrix using
dist(data, method="euclidean")
Maybe Euclidean distance cannot be calculated for such data? If so, which distances can be calculated?
Answer 1:
You should still be able to use Euclidean distance.
You just have to implement it yourself, because the standard method only works for vectors, not for matrices. That should be trivial to do.
You'll likely run into scaling problems, though, if your variables have different units and magnitudes.
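As a minimal sketch of what implementing it yourself could look like in base R, assuming every station matrix has the same dimensions: the Euclidean distance between two matrices can be taken as the square root of the summed squared element-wise differences. The names `mv_euclid`, `mv_dist`, and the toy `stations` list are hypothetical and only serve the illustration.

```r
# Euclidean distance between two equally sized matrices
# (square root of the sum of squared element-wise differences).
mv_euclid <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  sqrt(sum((a - b)^2))
}

# Pairwise distances for a list of matrices, returned as a "dist" object.
mv_dist <- function(series) {
  n <- length(series)
  d <- matrix(0, n, n, dimnames = list(names(series), names(series)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mv_euclid(series[[i]], series[[j]])
    }
  }
  as.dist(d)
}

# Toy data: 3 stations, 4 timestamps, 2 variables each (made-up values).
stations <- list(
  s1 = matrix(0, nrow = 4, ncol = 2),
  s2 = matrix(1, nrow = 4, ncol = 2),
  s3 = matrix(2, nrow = 4, ncol = 2)
)
d <- mv_dist(stations)

# The result can be passed to anything that accepts a dist object.
hc <- hclust(d)
```

The double loop computes each pair twice, which is fine for a handful of stations; for many stations it would be worth filling only the lower triangle.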
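To illustrate the scaling issue with made-up numbers: when one variable is measured on a much larger numeric scale than another, it dominates the raw squared differences. The sketch below uses base R only and standardizes each variable with the pooled mean and standard deviation of both stations, which is one possible choice and an assumption of this example; standardizing each station separately is another. The helper `standardize` and all data values are hypothetical.

```r
# Two made-up stations: column 1 = pressure (hPa), column 2 = temperature (deg C).
a <- cbind(pressure = c(1000, 1010, 1020), temp = c(10, 12, 14))
b <- cbind(pressure = c(980, 990, 1000), temp = c(20, 22, 24))

# Contribution of each variable to the raw squared Euclidean distance.
raw_contrib <- colSums((a - b)^2)  # pressure = 1200, temp = 300

# Standardize each variable using the pooled mean and sd of both stations.
pooled <- rbind(a, b)
mu <- colMeans(pooled)
sigma <- apply(pooled, 2, sd)
standardize <- function(m) sweep(sweep(m, 2, mu, "-"), 2, sigma, "/")

# After standardizing, both variables contribute on a comparable scale.
scaled_contrib <- colSums((standardize(a) - standardize(b))^2)
```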
Answer 2:
If your series all have the same length, you could flatten each one into a vector, cluster, and then restore the dimensions afterwards. However, as Anony-Mousse mentioned, using Euclidean distance with variables that have different scales could be problematic, so consider normalizing with zscore:
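# tsclust() and zscore() are provided by the dtwclust package
library(dtwclust)
series <- zscore(data)
# flatten each station's matrix into a single vector before clustering
pc <- tsclust(lapply(series, as.vector), distance="Euclidean", seed=3247L, trace=TRUE)
# store the normalized matrices as the data of the result
pc@datalist <- series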
# replace ncol with the actual number of columns from your data
pc@centroids <- lapply(pc@centroids, matrix, ncol=3L)
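Why flattening works can be checked in base R: `as.vector` unrolls a matrix column by column, so the Euclidean distance between two flattened series equals the square root of the summed squared element-wise differences of the original matrices, and `matrix(v, ncol = ...)` restores the original layout. A small sketch with made-up data (all variable names are hypothetical):

```r
set.seed(1)
# Two made-up series: 5 timestamps (rows) x 3 variables (columns).
x <- matrix(rnorm(15), ncol = 3)
y <- matrix(rnorm(15), ncol = 3)

# Flatten column by column, as lapply(series, as.vector) does per element.
xv <- as.vector(x)
yv <- as.vector(y)

# Distance on the flattened vectors vs. directly on the matrices.
d_flat <- sqrt(sum((xv - yv)^2))
d_mat <- sqrt(sum((x - y)^2))

# Restoring the shape recovers the original matrix.
x_back <- matrix(xv, ncol = 3)
```

This only holds when all series have identical dimensions, since otherwise the flattened vectors would not line up element by element.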
Source: https://stackoverflow.com/questions/55841627/clustering-multivariate-time-series-question-regarding-distance-matrix