Split a vector into chunks

时光说笑 2020-11-22 01:10

I have to split a vector into n chunks of equal size in R. I couldn't find any base function to do that, and Google didn't get me anywhere either. Here is what I came up with so

20 answers
  • 2020-11-22 01:58

    A few more variants to the pile...

    > x <- 1:10
    > n <- 3
    

    Note that you don't need the factor function here, but you still want to sort; otherwise the chunks would not hold consecutive values:

    > chunk <- function(x, n) split(x, sort(rank(x) %% n))
    > chunk(x,n)
    $`0`
    [1] 1 2 3
    $`1`
    [1] 4 5 6 7
    $`2`
    [1]  8  9 10
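
    To see what sort contributes, here is a minimal sketch that builds the same grouping vector with and without sorting. The variable names (g_unsorted, g_sorted, interleaved, consecutive) are illustrative and not part of the answer above.

```r
x <- 1:10
n <- 3

# Grouping vector used by chunk() above, before and after sorting
g_unsorted <- rank(x) %% n      # 1 2 0 1 2 0 1 2 0 1
g_sorted   <- sort(g_unsorted)  # 0 0 0 1 1 1 1 2 2 2

interleaved <- split(x, g_unsorted)  # each chunk takes every n-th element
consecutive <- split(x, g_sorted)    # each chunk is a run of neighbours

print(interleaved[["0"]])  # 3 6 9
print(consecutive[["0"]])  # 1 2 3
```

    Sorting only reorders the group labels, so the chunk sizes are the same in both cases; only the membership changes.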
    

    Or you can assign character indices instead of the numbers shown in backticks above:

    > my.chunk <- function(x, n) split(x, sort(rep(letters[1:n], each=n, len=length(x))))
    > my.chunk(x, n)
    $a
    [1] 1 2 3 4
    $b
    [1] 5 6 7
    $c
    [1]  8  9 10
    

    Or you can use plain-word names stored in a vector. Note that using sort to get consecutive values in x also alphabetizes the labels:

    > my.other.chunk <- function(x, n) split(x, sort(rep(c("tom", "dick", "harry"), each=n, len=length(x))))
    > my.other.chunk(x, n)
    $dick
    [1] 1 2 3
    $harry
    [1] 4 5 6
    $tom
    [1]  7  8  9 10
    
  • 2020-11-22 01:58

    split(x,matrix(1:n,n,length(x))[1:length(x)])

    Perhaps this is clearer, but it is the same idea:
    split(x,rep(1:n, ceiling(length(x)/n),length.out = length(x)))

    If you want the chunks ordered, wrap the grouping vector in sort().
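
    As a rough check that the two one-liners above build the same grouping vector, and of what sorting it changes, consider this sketch with x <- 1:10 and n <- 3 (the variable names are illustrative):

```r
x <- 1:10
n <- 3

# matrix() recycles 1:n column by column; the first length(x) cells are the labels
g_matrix <- matrix(1:n, n, length(x))[1:length(x)]
# rep() with length.out produces the same cyclic labels directly
g_rep <- rep(1:n, length.out = length(x))

print(identical(g_matrix, g_rep))  # TRUE

round_robin <- split(x, g_rep)        # 1 4 7 10 / 2 5 8 / 3 6 9
contiguous  <- split(x, sort(g_rep))  # 1 2 3 4 / 5 6 7 / 8 9 10
```

    Both variants give chunks of sizes 4, 3 and 3; the sorted one keeps neighbouring elements together.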

  • 2020-11-22 02:00

    I needed a function that takes the name of a data.table (as a quoted string) and an upper limit on the number of rows in each subset of that data.table. It produces as many data.tables as that limit requires:

    library(data.table)
    split_dt <- function(x, y) {
        dt <- get(x)  # look up the data.table by its name
        for (i in seq(from = 1, to = nrow(dt), by = y)) {
            # rows i..i+y-1, capped at the last row so no NA rows are added
            piece <- dt[i:min(i + y - 1, nrow(dt))]
            # create df_<starting row> in the global environment
            assign(paste0("df_", i), piece, envir = globalenv())
        }
    }
    

    This function gives me a series of data.tables named df_[number], where the number is the starting row in the original data.table. The last data.table may have fewer rows than the limit. This type of function is useful because certain GIS software limits how many address pins you can import, for example. So slicing up data.tables into smaller chunks may not be recommended, but it may not be avoidable.
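
    If separate objects in the global environment are not a requirement, the same row-limited slicing can be sketched with base split(), returning the pieces as a list. This is an alternative under stated assumptions, not the function above: split_rows, max_rows and the addresses example data are hypothetical names, and a plain data.frame is used to keep the sketch free of dependencies.

```r
# Hypothetical helper: split a data.frame into pieces of at most max_rows rows
split_rows <- function(df, max_rows) {
  # Row i goes to group ceiling(i / max_rows): 1, 1, ..., 2, 2, ...
  group <- ceiling(seq_len(nrow(df)) / max_rows)
  split(df, group)
}

# Made-up example data: 10 rows, at most 4 rows per piece
addresses <- data.frame(id = 1:10, street = paste("Street", 1:10))
pieces <- split_rows(addresses, 4)

print(sapply(pieces, nrow))  # 4 4 2
```

    Because the group index is computed from the row count, the last piece is simply shorter and nothing needs to be trimmed afterwards.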

  • 2020-11-22 02:01
    chunk2 <- function(x,n) split(x, cut(seq_along(x), n, labels = FALSE)) 
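
    The one-liner relies on cut() returning an integer bin code per position when labels = FALSE. A small sketch of the intermediate values, assuming x <- 1:10 and n <- 3 (bins and chunks are illustrative names):

```r
x <- 1:10
n <- 3

# cut() divides the index range 1..length(x) into n equal-width intervals
# and, with labels = FALSE, returns the interval number for each position
bins <- cut(seq_along(x), n, labels = FALSE)
print(bins)

# Positions that share a bin code end up in the same chunk
chunks <- split(x, bins)
print(lengths(chunks))
```

    The bin codes are already non-decreasing, so no sort() is needed and the chunks are contiguous.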
    
  • 2020-11-22 02:01

    If you don't like split() and you don't like matrix() (with its dangling NAs), there's this:

    chunk <- function(x, n) {
        starts <- seq.int(from=1, to=length(x), by=n)  # first index of each chunk
        ends <- pmin(starts + (n-1), length(x))        # last index, capped
        mapply(function(a, b) x[a:b], starts, ends, SIMPLIFY=FALSE)
    }
    

    Like split(), it returns a list, but it doesn't waste time or space with labels, so it may be more performant.
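
    A short sketch of the difference in the return values, assuming x <- 1:10. Note that in this answer n is the length of each chunk rather than the number of chunks, so the sketch calls it size; by_index and by_split are illustrative names.

```r
x <- 1:10
size <- 3  # chunk length, not the number of chunks

# Index-based chunks, as in the mapply() approach above
starts <- seq.int(1, length(x), by = size)
ends   <- pmin(starts + size - 1, length(x))
by_index <- mapply(function(a, b) x[a:b], starts, ends, SIMPLIFY = FALSE)

# The same chunks via split(), which attaches a name to every element
by_split <- split(x, ceiling(seq_along(x) / size))

print(names(by_index))  # NULL
print(names(by_split))  # "1" "2" "3" "4"
```

    Apart from the names, the two lists hold the same chunks: 1 2 3, 4 5 6, 7 8 9 and 10.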

  • 2020-11-22 02:02

    A simple function that splits a vector using index arithmetic alone; there is no need to overcomplicate this:

    vsplit <- function(v, n) {
        l <- length(v)
        r <- l / n  # average (fractional) chunk length
        lapply(1:n, function(i) {
            s <- max(1, round(r * (i - 1)) + 1)  # start: one past the previous end
            e <- min(l, round(r * i))            # end: rounded cumulative length
            v[s:e]
        })
    }
    