To parallelize a task, I need to split a big data.table into roughly equal parts, keeping together groups defined by a column, id. Suppose:

N is the number of rows, k is the number of distinct values of id, and M is the number of parts I want to split the data into.
If k is big enough relative to M, you can use a greedy approach to split the ids into groups.
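To make the steps below reproducible, here is a toy setup (the names dt, N, k, M and all the values are my own illustration; the ids a..f match the ones in the output at the end, though the exact grouping will depend on the simulated sizes):

library(data.table)

set.seed(42)                      # for reproducibility
N <- 1000                         # number of rows
k <- 6                            # number of distinct ids
M <- 3                            # number of parts wanted
dt <- data.table(id = sample(letters[1:k], N, replace = TRUE),
                 x  = rnorm(N))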
First, let's find the size of each id:
group_sizes <- dt[, .N, by = id]
Then create two lists of length M: one to track the total size of each group, the other to track which ids it contains.
grps_vals <- list()
grps_vals[1:M] <- c(0)
grps_nms <- list()
grps_nms[1:M] <- c(0)
(The zeros are placeholders: assigning them is what creates the lists with M elements up front.)
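An equivalent, more compact way to initialize them, if you prefer:

grps_vals <- rep(list(0), M)
grps_nms <- rep(list(0), M)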
Then loop over the ids and, on every iteration, add the next id to the currently smallest group. This keeps the groups roughly equal:
for (i in 1:nrow(group_sizes)) {
  sums <- sapply(grps_vals, sum)                             # current total size of each group
  idx <- which.min(sums)                                     # index of the smallest group
  grps_vals[[idx]] <- c(grps_vals[[idx]], group_sizes$N[i])  # add this id's size to it
  grps_nms[[idx]] <- c(grps_nms[[idx]], group_sizes$id[i])   # record which id went where
}
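An optional refinement: if the id sizes vary a lot, sorting the ids from largest to smallest before running the loop (the classic greedy / longest-processing-time heuristic) tends to balance the groups better:

setorder(group_sizes, -N)    # run this before the loop above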
Finally, delete the placeholder zero from the start of each name vector :)
grps_nms <- lapply(grps_nms, function(x){x[-1]})
> grps_nms
[[1]]
[1] "a" "d" "f"
[[2]]
[1] "b"
[[3]]
[1] "c" "e"
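With the id groups in hand, you can materialize the actual parts and process them in parallel. A minimal sketch, assuming the toy dt from above (my_task stands for whatever you run on each part):

parts <- lapply(grps_nms, function(ids) dt[id %in% ids])
# results <- parallel::mclapply(parts, my_task)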