Splitting a large data frame into smaller segments

Anonymous (unverified), submitted 2019-12-03 01:56:01

Question:

I have the following 100-row data frame and want to break it up into 10 data frames of 10 rows each. I could do the following and get the desired result.

df = data.frame(one=c(rnorm(100)), two=c(rnorm(100)), three=c(rnorm(100)))

df1 = df[1:10,]
df2 = df[11:20,]
df3 = df[21:30,]
df4 = df[31:40,]
df5 = df[41:50,]
...

Of course, this isn't an elegant way to perform the task when the initial data frame is larger, or when the number of rows doesn't divide evenly into segments.

So given the above, let's say we have the following data frame.

df = data.frame(one=c(rnorm(1123)), two=c(rnorm(1123)), three=c(rnorm(1123))) 

Now I want to split it into new data frames of 200 rows each, with a final data frame holding the remaining rows. What would be a more elegant (i.e. quicker) way to perform this task?

Answer 1:

> str(split(df, (as.numeric(rownames(df))-1) %/% 200))
List of 6
 $ 0:'data.frame':  200 obs. of  3 variables:
  ..$ one  : num [1:200] -1.592 1.664 -1.231 0.269 0.912 ...
  ..$ two  : num [1:200] 0.639 -0.525 0.642 1.347 1.142 ...
  ..$ three: num [1:200] -0.45 -0.877 0.588 1.188 -1.977 ...
 $ 1:'data.frame':  200 obs. of  3 variables:
  ..$ one  : num [1:200] -0.0017 1.9534 0.0155 -0.7732 -1.1752 ...
  ..$ two  : num [1:200] -0.422 0.869 0.45 -0.111 0.073 ...
  ..$ three: num [1:200] -0.2809 1.31908 0.26695 0.00594 -0.25583 ...
 $ 2:'data.frame':  200 obs. of  3 variables:
  ..$ one  : num [1:200] -1.578 0.433 0.277 1.297 0.838 ...
  ..$ two  : num [1:200] 0.913 0.378 0.35 -0.241 0.783 ...
  ..$ three: num [1:200] -0.8402 -0.2708 -0.0124 -0.4537 0.4651 ...
 $ 3:'data.frame':  200 obs. of  3 variables:
  ..$ one  : num [1:200] 1.432 1.657 -0.72 -1.691 0.596 ...
  ..$ two  : num [1:200] 0.243 -0.159 -2.163 -1.183 0.632 ...
  ..$ three: num [1:200] 0.359 0.476 1.485 0.39 -1.412 ...
 $ 4:'data.frame':  200 obs. of  3 variables:
  ..$ one  : num [1:200] -1.43 -0.345 -1.206 -0.925 -0.551 ...
  ..$ two  : num [1:200] -1.343 1.322 0.208 0.444 -0.861 ...
  ..$ three: num [1:200] 0.00807 -0.20209 -0.56865 1.06983 -0.29673 ...
 $ 5:'data.frame':  123 obs. of  3 variables:
  ..$ one  : num [1:123] -1.269 1.555 -0.19 1.434 -0.889 ...
  ..$ two  : num [1:123] 0.558 0.0445 -0.0639 -1.934 -0.8152 ...
  ..$ three: num [1:123] -0.0821 0.6745 0.6095 1.387 -0.382 ...

If some code might have changed the row names (for example by subsetting or reordering), it would be safer to build the groups from row positions instead:

 split(df, (seq(nrow(df))-1) %/% 200)  
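To make the integer-division idea easier to reuse, here is a minimal sketch that wraps it in a function. The name `split_rows` and its `chunk_size` argument are hypothetical, introduced only for illustration; `seq_len()` is used in place of `seq()` so that an empty data frame yields no groups rather than the sequence `1, 0`.

```r
# Hypothetical helper (not from the original answer): split a data frame
# into consecutive chunks of at most `chunk_size` rows.
split_rows <- function(data, chunk_size) {
  # Row positions 1..n become group ids 0, 0, ..., 1, 1, ... via integer division.
  groups <- (seq_len(nrow(data)) - 1) %/% chunk_size
  split(data, groups)
}

set.seed(1)  # only so the illustration is reproducible
df <- data.frame(one = rnorm(1123), two = rnorm(1123), three = rnorm(1123))

chunks <- split_rows(df, 200)
length(chunks)        # number of pieces
sapply(chunks, nrow)  # rows in each piece
```

With 1123 rows and a chunk size of 200, this should give six pieces: five of 200 rows and a last one of 123 rows, matching the `str()` output above.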


Answer 2:

require(ff) df 


Answer 3:

If you can generate a vector that defines the groups, you can split anything:

f
df1
lapply(df1,dim)
$`1`
[1] 200   3

$`2`
[1] 200   3

$`3`
[1] 200   3

$`4`
[1] 200   3

$`5`
[1] 200   3

$`6`
[1] 123   3
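The definitions of the grouping vector and the split result did not survive in the text above, so the following is only a sketch of one way such a vector might be built, not necessarily what the answer's author wrote. It assumes base R's `rep()` with `each` and `length.out`; the names `n`, `size`, and `pieces` are illustrative.

```r
set.seed(1)  # only so the illustration is reproducible
df <- data.frame(one = rnorm(1123), two = rnorm(1123), three = rnorm(1123))

n    <- nrow(df)  # total number of rows
size <- 200       # desired rows per piece (assumed value from the question)

# Group labels 1, 1, ..., 2, 2, ...: each label repeated `size` times,
# then truncated to exactly `n` entries so the last group holds the remainder.
f <- rep(seq_len(ceiling(n / size)), each = size, length.out = n)

pieces <- split(df, f)
lapply(pieces, dim)
```

This produces groups named `1` through `6`, which is consistent with the dimensions listed in the output above. Any vector of length `nrow(df)` works as the grouping factor, so the same pattern applies to uneven or non-consecutive groups.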


Answer 4:

Something like this...?

b 

