I have a vector, such as c(1, 3, 4, 5, 9, 10, 17, 29, 30)
and I would like to group together the 'neighboring' elements that form a regular, consecutive sequence in a ragged vector resulting in:
L1: 1
L2: 3,4,5
L3: 9,10
L4: 17
L5: 29,30
Naive code (of an ex-C programmer):
partition.neighbors <- function(v)
{
result <<- list() #jagged array
currentList <<- v[1] #current series
for(i in 2:length(v))
{
if(v[i] - v [i-1] == 1)
{
currentList <<- c(currentList, v[i])
}
else
{
result <<- c(result, list(currentList))
currentList <<- v[i] #next series
}
}
return(result)
}
Now I understand that
a) R is not C (despite the curly brackets)
b) global variables are pure evil
c) that is a horribly inefficient way of achieving the result
, so any better solutions are welcome.
Making heavy use of some R idioms:
> split(v, cumsum(c(1, diff(v) != 1)))
$`1`
[1] 1
$`2`
[1] 3 4 5
$`3`
[1] 9 10
$`4`
[1] 17
$`5`
[1] 29 30
daroczig writes "you could write a lot neater code based on diff
"...
Here's one way:
split(v, cumsum(diff(c(-Inf, v)) != 1))
EDIT (added timings):
Tommy discovered this could be faster by being careful with types; the reason it got faster is that split
is faster on integers, and is actually faster still on factors.
Here's Joshua's solution; the result from the cumsum
is a numeric because it's being c
'd with 1
, so it's the slowest.
system.time({
a <- cumsum(c(1, diff(v) != 1))
split(v, a)
})
# user system elapsed
# 1.839 0.004 1.848
Just c
ing with 1L
so the result is an integer speeds it up considerably.
system.time({
a <- cumsum(c(1L, diff(v) != 1))
split(v, a)
})
# user system elapsed
# 0.744 0.000 0.746
This is Tommy's solution, for reference; it's also splitting on an integer.
> system.time({
a <- cumsum(c(TRUE, diff(v) != 1L))
split(v, a)
})
# user system elapsed
# 0.742 0.000 0.746
Here's my original solution; it also is splitting on an integer.
system.time({
a <- cumsum(diff(c(-Inf, v)) != 1)
split(v, a)
})
# user system elapsed
# 0.750 0.000 0.754
Here's Joshua's, with the result converted to an integer before the split
.
system.time({
a <- cumsum(c(1, diff(v) != 1))
a <- as.integer(a)
split(v, a)
})
# user system elapsed
# 0.736 0.002 0.740
All the versions that split
on an integer vector are about the same; it could be even faster if that integer vector was already a factor, as the conversion from integer to factor actually takes about half the time. Here I make it into a factor directly; this is not recommended in general because it depends on the structure of the factor class. It'ss done here for comparison purposes only.
system.time({
a <- cumsum(c(1L, diff(v) != 1))
a <- structure(a, class = "factor", levels = 1L:a[length(a)])
split(v,a)
})
# user system elapsed
# 0.356 0.000 0.357
Joshua and Aaron were spot on. However, their code can still be made more than twice as fast by careful use of the correct types, integers and logicals:
split(v, cumsum(c(TRUE, diff(v) != 1L)))
v <- rep(c(1:5, 19), len = 1e6) # Huge vector...
system.time( split(v, cumsum(c(1, diff(v) != 1))) ) # Joshua's code
# user system elapsed
# 2.64 0.00 2.64
system.time( split(v, cumsum(c(TRUE, diff(v) != 1L))) ) # Modified code
# user system elapsed
# 1.09 0.00 1.12
You can create a data.frame
and assign the elements to groups using diff
, ifelse
and cumsum
, then aggregate using tapply
:
v.df <- data.frame(v = v)
v.df$group <- cumsum(ifelse(c(1, diff(v) - 1), 1, 0))
tapply(v.df$v, v.df$group, function(x) x)
$`1`
[1] 1
$`2`
[1] 3 4 5
$`3`
[1] 9 10
$`4`
[1] 17
$`5`
[1] 29 30
You could define the cut-points easily:
which(diff(v) != 1)
Based on that try:
v <- c(1,3,4,5,9,10,17,29,30)
cutpoints <- c(0, which(diff(v) != 1), length(v))
ragged.vector <- vector("list", length(cutpoints)-1)
for (i in 2:length(cutpoints)) ragged.vector[[i-1]] <- v[(cutpoints[i-1]+1):cutpoints[i]]
Which results in:
> ragged.vector
[[1]]
[1] 1
[[2]]
[1] 3 4 5
[[3]]
[1] 9 10
[[4]]
[1] 17
[[5]]
[1] 29 30
This algorithm is not a nice one but you could write a lot neater code based on diff
:) Good luck!
来源:https://stackoverflow.com/questions/5222061/how-to-partition-a-vector-into-groups-of-regular-consecutive-sequences