Question
I'm trying to find an R package that finds clusters (runs of consecutive values) exceeding a given threshold in a dataset.
What I want to know is the cluster duration/size and the individual values of each cluster.
For example (a simple one):
I have a vector of data,
10 8 6 14 14 7 14 5 11 12 8 11 11 16 20 6 8 8 6 15
The clusters of values larger than 9 are marked in brackets:
[10] 8 6 [14 14] 7 [14] 5 [11 12] 8 [11 11 16 20] 6 8 8 6 [15]
So here the cluster sizes in order are,
1, 2, 1, 2, 4, 1
What I want R to do is return each cluster as a separate vector, in order, e.g.
[1] 10
[2] 14 14
[3] 14
[4] 11 12
[5] 11 11 16 20
[6] 15
Is there such a package? Alternatively, a piece of code (for example, one using if statements) would also help.
Cheers
Answer 1:
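The data.table::rleid function works well for this. rleid(vec > 9) gives each run of consecutive TRUE or FALSE values its own ID, split() groups vec by those IDs, and Filter() keeps only the groups above the threshold. Checking the first element a[1] is enough, because every value in a run lies on the same side of the threshold: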
vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)
Filter(function(a) a[1] > 9, split(vec, data.table::rleid(vec > 9)))
# $`1`
# [1] 10
# $`3`
# [1] 14 14
# $`5`
# [1] 14
# $`7`
# [1] 11 12
# $`9`
# [1] 11 11 16 20
# $`11`
# [1] 15
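The question also asks for the cluster sizes. A minimal base-R sketch (not part of the original answer): rle() applied to the logical vector vec > 9 returns the run lengths and run values directly, so indexing the lengths by the values gives the size of each cluster, and a cumulative sum of the lengths gives where each cluster ends in vec. The names runs, cluster_sizes, ends and starts are illustrative only.

```r
vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)

# Run-length encode the threshold test: alternating runs of TRUE/FALSE
runs <- rle(vec > 9)

# Keep only the lengths of the TRUE runs, i.e. the clusters above 9
cluster_sizes <- runs$lengths[runs$values]
cluster_sizes
# [1] 1 2 1 2 4 1

# End index of every run in vec, restricted to the TRUE runs,
# then step back by the cluster size to get the start index
ends <- cumsum(runs$lengths)[runs$values]
starts <- ends - cluster_sizes + 1
# cluster positions in vec: 1, 4:5, 7, 9:10, 12:15, 20
```

If the clusters themselves are already available from the split() approach, lengths() on that list gives the same sizes.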
If you'd prefer not to load the data.table package just for that, here is a base-R approach from https://stackoverflow.com/a/33509966:
# Base-R stand-in for data.table::rleid: one integer ID per run of equal values
myrleid <- function(x) {
  rl <- rle(x)$lengths
  rep(seq_along(rl), times = rl)
}
Filter(function(a) a[1] > 9, split(vec, myrleid(vec > 9)))
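Since the question mentions if statements, here is a loop-based sketch that avoids rle() entirely. find_clusters() is a hypothetical helper name, not a function from any package. It walks the vector once, extends the current cluster while values exceed the threshold, and closes the cluster at the first value that does not.

```r
# Hypothetical helper: collect runs of values above `threshold` with a plain loop
find_clusters <- function(x, threshold) {
  clusters <- list()
  current <- numeric(0)
  for (v in x) {
    if (v > threshold) {
      # Value is above the threshold: extend the open cluster
      current <- c(current, v)
    } else if (length(current) > 0) {
      # First value at or below the threshold: close the open cluster
      clusters[[length(clusters) + 1]] <- current
      current <- numeric(0)
    }
  }
  # A cluster that runs to the end of x is still open after the loop
  if (length(current) > 0) {
    clusters[[length(clusters) + 1]] <- current
  }
  clusters
}

vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)
res <- find_clusters(vec, 9)
lengths(res)
# [1] 1 2 1 2 4 1
```

The vectorised rleid/rle versions are more idiomatic R; growing vectors with c() inside a loop gets slow on long inputs, so the loop is mainly useful when the clustering rule becomes more complex than a single threshold.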
Source: https://stackoverflow.com/questions/62007395/finding-the-right-package-in-r-for-cluster-analysis