Cylindrical Clustering in R - clustering timestamp with other data

悲哀的现实 2021-01-14 17:09

I'm learning R and I have to cluster numeric data with a timestamp field. One of the parameters is a time, and since the data is strictly day-night dependent, I want to ta…

2 Answers
  • 2021-01-14 17:19

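    Here is a mapping of h to m, where h is the time in hours (possibly with a fractional part). Each time becomes a point (sin, cos) on the unit circle, so that, for example, 23:00 and 01:00 end up close together. Then we try kmeans, and at least in this test it seems to work: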

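    h <- c(22, 23, 0, 1, 2, 10, 11, 12)   # hours of the day
    ha <- 2*pi*h/24                        # hour -> angle in radians
    m <- cbind(x = sin(ha), y = cos(ha))   # point on the unit circle
    
    kmeans(m, 2)$cluster # compute cluster assignments via kmeans
    ## [1] 2 2 2 2 2 1 1 1
    # kmeans uses random starts and its cluster numbers are arbitrary,
    # so the labels 1 and 2 may come out swapped on another run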
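The question is about clustering a timestamp together with other numeric columns. A minimal sketch of that, assuming POSIXct timestamps and one extra numeric column; the timestamps and the column `value` below are made-up illustration data, not from the question. Each timestamp is turned into a fractional hour, mapped onto the circle as above, and bound together with the standardized extra column, so the time part forms the circle of a "cylinder" and `value` runs along its axis.

```r
set.seed(1)  # kmeans uses random starting centers

# Hypothetical example data: timestamps plus one other numeric measurement
ts <- as.POSIXct(c("2021-01-14 22:30", "2021-01-14 23:45", "2021-01-15 00:15",
                   "2021-01-15 01:00", "2021-01-15 10:30", "2021-01-15 11:15",
                   "2021-01-15 12:00"),
                 format = "%Y-%m-%d %H:%M", tz = "UTC")
value <- c(5.1, 4.8, 5.3, 4.9, 20.2, 19.7, 21.0)

# Fractional hour of day, e.g. 22:30 -> 22.5
lt <- as.POSIXlt(ts)
hours <- lt$hour + lt$min / 60 + lt$sec / 3600
ha <- 2 * pi * hours / 24

# Time on the unit circle; the other column standardized so that neither
# part dominates the squared Euclidean distance (the weighting is a choice)
X <- cbind(x = sin(ha), y = cos(ha), value = as.numeric(scale(value)))

fit <- kmeans(X, centers = 2, nstart = 10)
fit$cluster
```

How much weight the time part gets relative to the other columns is a modelling decision; standardizing the extra columns is only one reasonable default.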
    
  • 2021-01-14 17:32

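    k-means should use squared Euclidean distance: the arithmetic mean computed in its update step is exactly the point that minimizes the sum of squared Euclidean distances, so with another distance function the algorithm is no longer guaranteed to converge.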

    But indeed: projecting your data into a meaningful Euclidean space is an easy way to avoid this kind of problem.
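A small illustration of the problem the projection avoids, assuming times given as hours in [0, 24): the plain arithmetic mean of 23:00 and 01:00 is noon, whereas averaging in the (sin, cos) space and converting back with `atan2` gives midnight. The `round` call is only there to clean up floating-point noise, so that a value a hair below 0 does not wrap around to 24.

```r
h <- c(23, 1)
mean(h)      # 12: the arithmetic mean of 23:00 and 01:00 lands at noon

ha <- 2 * pi * h / 24
# Average the points on the circle, then turn the direction back into an hour
mean_angle <- atan2(mean(sin(ha)), mean(cos(ha)))
mean_hour <- round(mean_angle * 24 / (2 * pi), 6) %% 24
mean_hour    # 0: midnight
```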

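    However, be aware that the mean will no longer lie on the cylinder: averaging points on a circle gives a point inside it. In many cases you can simply rescale the mean back onto the cylinder. But its circular part may be 0 (for example, when the times are spread evenly around the clock), and then no meaningful rescaling is possible.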
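One way to do that rescaling, sketched under the assumption that the time part of each center sits in columns named `x` (sine) and `y` (cosine), as in the first answer. Dividing (x, y) by its length r puts the center back on the unit circle without changing its angle, so the hour can be read off with `atan2`. The helper `center_hour` and its tolerance `tol` are hypothetical names chosen for this sketch; a center with r near 0 is reported as NA because it has no direction left to rescale to.

```r
# Rescale the (x, y) time part of cluster centers back onto the unit circle
# and read it as an hour of day; NA where the mean collapsed onto the axis.
center_hour <- function(centers, tol = 1e-8) {
  x <- centers[, "x"]
  y <- centers[, "y"]
  r <- sqrt(x^2 + y^2)                          # length of the time part of the mean
  hour <- (atan2(x, y) * 24 / (2 * pi)) %% 24   # angle of (x/r, y/r) equals angle of (x, y)
  hour[r < tol] <- NA                           # no meaningful direction to rescale to
  data.frame(radius = r, hour = hour,
             x_unit = ifelse(r < tol, NA, x / r),
             y_unit = ifelse(r < tol, NA, y / r))
}

h <- c(22, 23, 0, 1, 2, 10, 11, 12)
ha <- 2 * pi * h / 24
m <- cbind(x = sin(ha), y = cos(ha))
set.seed(1)
fit <- kmeans(m, 2, nstart = 10)
center_hour(fit$centers)  # night cluster near hour 0 (or just below 24), day cluster at 11

# Degenerate case: 06:00 and 18:00 average to (almost exactly) the origin
opp <- cbind(x = mean(sin(2 * pi * c(6, 18) / 24)),
             y = mean(cos(2 * pi * c(6, 18) / 24)))
center_hour(opp)$hour     # NA
```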

    The other option is kernel k-means. Since your desired distance is Euclidean after a data transformation, you can also "kernelize" this transformation and use kernel k-means. In your particular case, however, it may actually be faster to transform the data explicitly; kernelizing will likely only pay off with much more complex transformations (say, into an infinite-dimensional vector space).
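To make the kernel idea concrete, here is a minimal sketch, not a library implementation. For the hour embedding used above, sin(a)sin(b) + cos(a)cos(b) = cos(a - b), so the kernel can be computed directly from the hours without building the (sin, cos) matrix. `hour_kernel` and `kernel_kmeans` are hypothetical helpers written for illustration. The loop is a plain Lloyd-style kernel k-means with a fixed initial assignment; it uses the standard expansion of the squared feature-space distance from point i to the mean of cluster c, K_ii - (2/|c|) sum_{j in c} K_ij + (1/|c|^2) sum_{j,l in c} K_jl, which needs only kernel values.

```r
# Kernel on hours of the day: equals the dot product of the (sin, cos)
# embeddings, since sin(a)sin(b) + cos(a)cos(b) = cos(a - b)
hour_kernel <- function(h1, h2) {
  outer(h1, h2, function(a, b) cos(2 * pi * (a - b) / 24))
}

# Minimal Lloyd-style kernel k-means on a precomputed kernel matrix K.
# `init` is a starting cluster assignment with labels 1..k.
kernel_kmeans <- function(K, init, max_iter = 100) {
  cl <- init
  n <- nrow(K)
  k <- max(init)
  for (iter in seq_len(max_iter)) {
    # D[i, j] = squared feature-space distance from point i to the mean of cluster j
    D <- matrix(Inf, n, k)
    for (j in seq_len(k)) {
      idx <- which(cl == j)
      if (length(idx) == 0) next  # empty cluster: leave its distances at Inf
      D[, j] <- diag(K) - 2 * rowMeans(K[, idx, drop = FALSE]) +
        mean(K[idx, idx])
    }
    new_cl <- max.col(-D, ties.method = "first")  # nearest cluster mean
    if (all(new_cl == cl)) break                   # assignments stable: done
    cl <- new_cl
  }
  cl
}

h <- c(22, 23, 0, 1, 2, 10, 11, 12)
K <- hour_kernel(h, h)
cl <- kernel_kmeans(K, init = c(1, 2, 1, 2, 1, 2, 1, 2))
cl
## [1] 1 1 1 1 1 2 2 2
```

The kernlab package offers `kkmeans` as an existing kernel k-means implementation. For a transformation as cheap as this one, though, clustering the explicit (sin, cos) matrix is simpler, in line with the advice above.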
