Split a vector into chunks

后端未结

关注

 20  1955

I have to split a vector into n chunks of equal size in R. I couldn\'t find any base function to do that. Also Google didn\'t get me anywhere. Here is what I came up with so

相关标签:

20条回答

终归单人心

2020-11-22 01:58

A few more variants to the pile...

> x <- 1:10
> n <- 3

Note, that you don't need to use the factor function here, but you still want to sort o/w your first vector would be 1 2 3 10:

> chunk <- function(x, n) split(x, sort(rank(x) %% n))
> chunk(x,n)
$`0`
[1] 1 2 3
$`1`
[1] 4 5 6 7
$`2`
[1]  8  9 10

Or you can assign character indices, vice the numbers in left ticks above:

> my.chunk <- function(x, n) split(x, sort(rep(letters[1:n], each=n, len=length(x))))
> my.chunk(x, n)
$a
[1] 1 2 3 4
$b
[1] 5 6 7
$c
[1]  8  9 10

Or you can use plainword names stored in a vector. Note that using sort to get consecutive values in x alphabetizes the labels:

> my.other.chunk <- function(x, n) split(x, sort(rep(c("tom", "dick", "harry"), each=n, len=length(x))))
> my.other.chunk(x, n)
$dick
[1] 1 2 3
$harry
[1] 4 5 6
$tom
[1]  7  8  9 10

0 讨论(0)

梦如初夏

2020-11-22 01:58

split(x,matrix(1:n,n,length(x))[1:length(x)])

perhaps this is more clear, but the same idea:
split(x,rep(1:n, ceiling(length(x)/n),length.out = length(x)))

if you want it ordered,throw a sort around it

0 讨论(0)
发布评论:

提交评论
- 加载中...
死守一世寂寞

2020-11-22 02:00
I need a function that takes the argument of a data.table (in quotes) and another argument that is the upper limit on the number of rows in the subsets of that original data.table. This function produces whatever number of data.tables that upper limit allows for:
```
library(data.table)    
split_dt <- function(x,y) 
    {
    for(i in seq(from=1,to=nrow(get(x)),by=y)) 
        {df_ <<- get(x)[i:(i + y)];
            assign(paste0("df_",i),df_,inherits=TRUE)}
    rm(df_,inherits=TRUE)
    }
```
This function gives me a series of data.tables named df_[number] with the starting row from the original data.table in the name. The last data.table can be short and filled with NAs so you have to subset that back to whatever data is left. This type of function is useful because certain GIS software have limits on how many address pins you can import, for example. So slicing up data.tables into smaller chunks may not be recommended, but it may not be avoidable.
0 讨论(0)
发布评论:

提交评论
- 加载中...
春和景丽

2020-11-22 02:01
```
chunk2 <- function(x,n) split(x, cut(seq_along(x), n, labels = FALSE)) 
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
没有蜡笔的小新

2020-11-22 02:01
If you don't like split() and you don't like matrix() (with its dangling NAs), there's this:
```
chunk <- function(x, n) (mapply(function(a, b) (x[a:b]), seq.int(from=1, to=length(x), by=n), pmin(seq.int(from=1, to=length(x), by=n)+(n-1), length(x)), SIMPLIFY=FALSE))
```
Like split(), it returns a list, but it doesn't waste time or space with labels, so it may be more performant.
0 讨论(0)
发布评论:

提交评论
- 加载中...

眼角桃花

2020-11-22 02:02

Simple function for splitting a vector by simply using indexes - no need to over complicate this

vsplit <- function(v, n) {
    l = length(v)
    r = l/n
    return(lapply(1:n, function(i) {
        s = max(1, round(r*(i-1))+1)
        e = min(l, round(r*i))
        return(v[s:e])
    }))
}

0 讨论(0)