Split a vector into chunks

时光说笑 2020-11-22 01:10

I have to split a vector into n chunks of equal size in R. I couldn't find any base function to do that, and Google didn't get me anywhere either. Here is what I came up with so far: …

20 answers
  • 2020-11-22 02:04

    Try the ggplot2 function, cut_number:

    library(ggplot2)
    x <- 1:10
    n <- 3
    cut_number(x, n) # labels = FALSE if you just want an integer result
    #>  [1] [1,4]  [1,4]  [1,4]  [1,4]  (4,7]  (4,7]  (4,7]  (7,10] (7,10] (7,10]
    #> Levels: [1,4] (4,7] (7,10]
    
    # if you want it split into a list:
    split(x, cut_number(x, n))
    #> $`[1,4]`
    #> [1] 1 2 3 4
    #> 
    #> $`(4,7]`
    #> [1] 5 6 7
    #> 
    #> $`(7,10]`
    #> [1]  8  9 10
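    cut_number() bins by value (equal-count quantile bins), not by position; for 1:10 the two coincide. If you want contiguous chunks by position without depending on ggplot2, a base-R sketch can apply cut() to the indices instead. `chunk_by_position` is a hypothetical helper name, and cut() needs at least two intervals, so n should be 2 or more:

    ```r
    # Hypothetical helper: split x into n contiguous chunks by position (base R only).
    # cut() divides the index range 1..length(x) into n equal-width intervals;
    # labels = FALSE returns the interval number (1..n) for each index.
    chunk_by_position <- function(x, n) {
      split(x, cut(seq_along(x), n, labels = FALSE))
    }

    x <- c(10, 3, 7, 1, 9, 2, 8, 4, 6, 5)  # deliberately unsorted
    chunk_by_position(x, 3)                # chunks of 4, 3 and 3 elements, in original order
    ```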
    
  • 2020-11-22 02:05

    This will split it differently from what you have, but I think it still gives quite a nice list structure:

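    chunk.2 <- function(x, n, force.number.of.groups = TRUE, len = length(x),
                        groups = trunc(len / n), overflow = len %% n) {
      # force.number.of.groups = TRUE:  n is the number of groups; each holds
      #   `groups` elements and the `overflow` leftovers join group n.
      # force.number.of.groups = FALSE: n is the group size; there are `groups`
      #   full groups and any leftovers go into a separate "overflow" group.
      if (force.number.of.groups) {
        f1 <- as.character(sort(rep(1:n, groups)))
        f <- as.character(c(f1, rep(n, overflow)))
      } else {
        f1 <- as.character(sort(rep(seq_len(groups), n)))
        f <- as.character(c(f1, rep("overflow", overflow)))
      }
      
      g <- split(x, f)
      
      # split() orders character levels alphabetically ("10" before "2"),
      # so restore numeric order by name.
      if (force.number.of.groups) {
        g.names <- names(g)
        g.names.ordered <- as.character(sort(as.numeric(g.names)))
      } else {
        g.names <- setdiff(names(g), "overflow")
        g.names.ordered <- as.character(sort(as.numeric(g.names)))
        # Append "overflow" only when there are leftovers (len %% n > 0).
        if (overflow > 0) g.names.ordered <- c(g.names.ordered, "overflow")
      }
      
      return(g[g.names.ordered])
    }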
    

    This gives the following, depending on the force.number.of.groups setting:

    > x <- 1:10; n <- 3
    > chunk.2(x, n, force.number.of.groups = FALSE)
    $`1`
    [1] 1 2 3
    
    $`2`
    [1] 4 5 6
    
    $`3`
    [1] 7 8 9
    
    $overflow
    [1] 10
    
    > chunk.2(x, n, force.number.of.groups = TRUE)
    $`1`
    [1] 1 2 3
    
    $`2`
    [1] 4 5 6
    
    $`3`
    [1]  7  8  9 10
    

    Running a couple of timings using these settings:

    set.seed(42)
    x <- rnorm(1e7)
    n <- 3
    

    Then we have the following results:

    > system.time(chunk(x, n)) # your function 
       user  system elapsed 
     29.500   0.620  30.125 
    
    > system.time(chunk.2(x, n, force.number.of.groups = TRUE))
       user  system elapsed 
      5.360   0.300   5.663 
    

    Note: Changing as.factor() to as.character() made my function twice as fast.
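    The as.character() conversion is also why chunk.2 has to re-sort the names afterwards: character levels sort alphabetically, so with ten or more groups "10" would come before "2". A compact sketch of the force.number.of.groups = TRUE case (the name `chunk_fixed_n` is hypothetical, not from this answer) uses integer group ids instead, which split() orders numerically, so no reordering step is needed:

    ```r
    # Hypothetical helper mirroring chunk.2(x, n, force.number.of.groups = TRUE):
    # n groups of length(x) %/% n elements, with the remainder appended to group n.
    chunk_fixed_n <- function(x, n) {
      len <- length(x)
      ids <- c(rep(seq_len(n), each = len %/% n),  # n equally sized groups
               rep(n, len %% n))                    # leftovers join the last group
      # Numeric ids become factor levels sorted numerically (1, 2, ..., 10, 11),
      # so the list comes back in group order without renaming.
      split(x, ids)
    }

    chunk_fixed_n(1:10, 3)  # groups of 3, 3 and 4 elements
    ```

    Whether this is faster than chunk.2 on your data is not measured here; it mainly removes the name-reordering step.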

  • 2020-11-22 02:06

    Sorry for the late answer, but maybe it will be useful to someone else. There is actually a very useful solution to this problem, explained at the end of ?split.

    > testVector <- c(1:10) #I want to divide it into 5 parts
    > VectorList <- split(testVector, 1:5)
    > VectorList
    $`1`
    [1] 1 6
    
    $`2`
    [1] 2 7
    
    $`3`
    [1] 3 8
    
    $`4`
    [1] 4 9
    
    $`5`
    [1]  5 10
    
  • 2020-11-22 02:07

    This splits a vector of length n into k chunks of size ⌊n/k⌋+1 or ⌊n/k⌋ and avoids the O(n log n) cost of a sort.

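    # n = vector length, k = number of chunks.
    # The first r = n %% k chunks get s + 1 elements, the rest get s = n %/% k.
    get_chunk_id <- function(n, k) {
      r <- n %% k
      s <- n %/% k
      i <- seq_len(n)
      # Positions 1..r*(s+1) fill the larger chunks; the remaining positions
      # are assigned to chunks of size s, offset by r.
      1 + ifelse(i <= r * (s + 1), (i - 1) %/% (s + 1), r + ((i - r * (s + 1) - 1) %/% s))
    }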
    
    split(1:10, get_chunk_id(10,3))
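    For comparison, a shorter sort-free sketch (not the answer's function; `chunk_ids` is a hypothetical name) computes the chunk id of position i as ceiling(i * k / n) using integer division. Chunk sizes again differ by at most one, but the larger chunks are spread through the vector instead of all coming first:

    ```r
    # Hypothetical helper (not get_chunk_id): chunk id for each position i in
    # 1..n when n elements are split into k chunks.
    # (i * k - 1) %/% n + 1 equals ceiling(i * k / n) for positive whole numbers.
    chunk_ids <- function(n, k) {
      (seq_len(n) * k - 1) %/% n + 1
    }

    chunk_ids(10, 3)               # 1 1 1 2 2 2 3 3 3 3
    split(1:10, chunk_ids(10, 4))  # chunk sizes 2, 3, 2, 3
    ```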
    
  • 2020-11-22 02:10

    If you don't like split() and you don't mind NAs padding out your short tail:

    chunk <- function(x, n) { if((length(x)%%n)==0) {return(matrix(x, nrow=n))} else {return(matrix(append(x, rep(NA, n-(length(x)%%n))), nrow=n))} }
    

    Here n is the chunk length (the number of rows), not the number of chunks. The columns of the returned matrix ([, 1:ncol]) are the droids you are looking for.
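    A more readable sketch of the same matrix idea (hypothetical name `chunk_matrix`; `size` plays the role of n in the one-liner, i.e. the chunk length). Assigning a larger length() pads a vector with NA, and split(m, col(m)) turns the columns into a list if you need one:

    ```r
    # Hypothetical helper: put x into the columns of a matrix with `size` rows,
    # padding the final column with NA.
    chunk_matrix <- function(x, size) {
      # Growing a vector via `length<-` fills the new slots with NA.
      length(x) <- size * ceiling(length(x) / size)
      matrix(x, nrow = size)
    }

    m <- chunk_matrix(1:10, 3)
    m                 # 3 x 4 matrix; column 4 holds 10, NA, NA
    split(m, col(m))  # the same chunks as a list
    ```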

  • 2020-11-22 02:12

    You could combine split() and cut(), as suggested by mdsummer, with quantile() to create evenly sized groups:

    split(x, cut(x, quantile(x, (0:n)/n), include.lowest = TRUE, labels = FALSE))
    

    This gives the same result for your example, but not for skewed variables.
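    Because cut() bins by value, this approach groups elements by where their values fall rather than by their position, so for unsorted input the chunks are not contiguous runs. It also fails on heavily tied data, where repeated quantiles make the breaks non-unique and cut() stops with an error. A small sketch with example data invented for illustration:

    ```r
    n <- 3

    # Unsorted input: cut() assigns groups by value, so chunks are not contiguous.
    x <- c(5, 1, 9, 3, 7, 2)
    groups <- cut(x, quantile(x, (0:n) / n), include.lowest = TRUE, labels = FALSE)
    by_value <- split(x, groups)
    by_value[[1]]  # 1 2 (the two smallest values, from positions 2 and 6)

    # Heavily tied input: several quantiles coincide, so the breaks are not
    # unique and cut() stops with an error; tryCatch() returns its message.
    tied <- c(1, 1, 1, 1, 2)
    tryCatch(
      cut(tied, quantile(tied, (0:n) / n), include.lowest = TRUE, labels = FALSE),
      error = function(e) conditionMessage(e)
    )
    ```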
