Finding the right package in R for cluster analysis

Posted by 萝らか妹 on 2021-01-28 12:19:34

Question


I'm looking for an R package that can find clusters of consecutive values that exceed a given threshold in a dataset.

What I want to know is the cluster duration/size and the individual values of each cluster.

For example (a simple one):

I have a vector of data,

10 8 6 14 14 7 14 5 11 12 8 11 11 16 20 6 8 8 6 15

The clusters of values larger than 9 are marked with square brackets,

[10] 8 6 [14 14] 7 [14] 5 [11 12] 8 [11 11 16 20] 6 8 8 6 [15]

So here the cluster sizes in order are,

1, 2, 1, 2, 4, 1

What I want R to do is return the clusters in separate ordered vectors, e.g.

[1] 10
[2] 14 14
[3] 14
[4] 11 12
[5] 11 11 16 20
[6] 15

Is there such a package? A piece of code, for example with if statements, would also help.

Cheers


Answer 1:


The data.table::rleid function works well for this:

vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)
Filter(function(a) a[1] > 9, split(vec, data.table::rleid(vec > 9)))
# $`1`
# [1] 10
# $`3`
# [1] 14 14
# $`5`
# [1] 14
# $`7`
# [1] 11 12
# $`9`
# [1] 11 11 16 20
# $`11`
# [1] 15

If you'd prefer not to load the data.table package just for that, here is a base-R approach from https://stackoverflow.com/a/33509966:

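# Base-R stand-in for data.table::rleid: one integer id per run of equal values
myrleid <- function(x) {
  rl <- rle(x)$lengths
  rep(seq_along(rl), times = rl)
}
# Split vec into runs, then keep only the runs whose values exceed 9
Filter(function(a) a[1] > 9, split(vec, myrleid(vec > 9)))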
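
Since the question also asks for the cluster sizes, the same idea can be wrapped in a small base-R helper that returns both the values and the size of each cluster. This is only a sketch: the name clusters_above and its threshold argument are made up for illustration and are not part of the original answer, and it assumes the input contains no NA values.

```r
# Hypothetical helper (not from the original answer): returns the runs of
# consecutive values above `threshold`, plus the length of each run.
# Assumes `x` has no NA values.
clusters_above <- function(x, threshold) {
  above <- x > threshold
  runs <- rle(above)
  # One id per run, repeated over the run's elements (same idea as myrleid)
  run_id <- rep(seq_along(runs$lengths), times = runs$lengths)
  # Keep only the elements that belong to an above-threshold run
  values <- unname(split(x[above], run_id[above]))
  list(values = values, sizes = lengths(values))
}

vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)
res <- clusters_above(vec, 9)
res$sizes
# [1] 1 2 1 2 4 1
res$values[[5]]
# [1] 11 11 16 20
```

Filtering on the logical vector before splitting avoids the a[1] > 9 check on each group, and lengths() gives the cluster sizes directly from the resulting list.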


Source: https://stackoverflow.com/questions/62007395/finding-the-right-package-in-r-for-cluster-analysis
