Collapse continuous integer runs to strings of ranges

后端未结

关注

 6  535

I have some data in a list that I need to look for continuous runs of integers (My brain thinkrle but don\'t know how to use it here).

It\'s easier to l

相关标签:

6条回答

情深已故

2020-11-30 14:03

Late to the party, but here's a deparse based one-liner:

lapply(z,function(x) paste(sapply(split(x,cumsum(c(1,diff(x)-1))),deparse),collapse=", "))
$greg
[1] "7:11, 20:24, 30:33, 49L"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"

0 讨论(0)

夕颜

2020-11-30 14:05

Using IRanges:

require(IRanges)
lapply(z, function(x) {
    t <- as.data.frame(reduce(IRanges(x,x)))[,1:2]
    apply(t, 1, function(x) paste(unique(x), collapse=":"))
})

# $greg
# [1] "7:11"  "20:24" "30:33" "49"   
# 
# $researcher
# [1] "42:48"
# 
# $sally
# [1] "25:29" "37:41"
# 
# $sam
# [1] "1:6"   "16:19" "34:36"
# 
# $teacher
# [1] "12:15"

0 讨论(0)

轻奢々

2020-11-30 14:11

Here is an attempt using diff and tapply returning a character vector

runs <- lapply(z, function(x) {
  z <- which(diff(x)!=1); 
  results <- x[sort(unique(c(1,length(x), z,z+1)))]
  lr <- length(results)
  collapse <- rep(seq_len(ceiling(lr/2)),each = 2, length.out = lr)
  as.vector(tapply(results, collapse, paste, collapse = ':'))
  })

runs
$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"

0 讨论(0)

臣服心动

2020-11-30 14:11

Another short solution with lapply and tapply:

lapply(z, function(x)
  unname(tapply(x, c(0, cumsum(diff(x) != 1)), FUN = function(y) 
    paste(unique(range(y)), collapse = ":")
  ))
)

The result:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"

0 讨论(0)

攒了一身酷

2020-11-30 14:12

I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:

lapply(z, function(x) {
  diffs <- c(1, diff(x))
  start_indexes <- c(1, which(diffs > 1))
  end_indexes <- c(start_indexes - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})

$greg
[1] "7:11, 20:24, 30:33, 49:49"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"

0 讨论(0)

臣服心动

2020-11-30 14:24

I have a fairly similar solution to Marius, his works as well as mine but the mechanisms are slightly different so I thought I may as well post it:

findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff!=1))
  unname(sapply(difflist, function(x){
    if(length(x) == 1) as.character(x) else paste0(x[1], ":", x[length(x)])
  }))
}

lapply(z, findIntRuns)

Which produces:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"

0 讨论(0)