I want to split a data frame into several smaller ones. This looks like a trivial question, but I cannot find a solution through web search.
You could also use
data2 <- data[data$sum_points == 2500, ]
This creates a data frame containing only the rows where sum_points equals 2500.
For example, starting from this data:
airfoils sum_points field_points init_t contour_t field_t
...
491 5 2500 5625 0.000086 0.004272 6.321774
498 5 2500 5625 0.000087 0.004507 6.325083
504 5 2500 5625 0.000088 0.004370 6.336034
603 5 250 10000 0.000072 0.000525 1.111278
577 5 250 10000 0.000104 0.000559 1.111431
587 5 250 10000 0.000072 0.000528 1.111524
606 5 250 10000 0.000079 0.000538 1.111685
....
> data2 <- data[data$sum_points == 2500, ]
> data2
airfoils sum_points field_points init_t contour_t field_t
108 5 2500 625 0.000082 0.004329 0.733109
106 5 2500 625 0.000102 0.004564 0.733243
117 5 2500 625 0.000087 0.004321 0.733274
112 5 2500 625 0.000081 0.004428 0.733587
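One caveat with logical indexing is that a missing value in the compared column produces an all-NA row in the result. The sketch below uses a small made-up data frame (the values are hypothetical, not the timing data above) to show the effect and how wrapping the condition in which() avoids it.

```r
# Hypothetical data: one sum_points value is missing.
data <- data.frame(
  sum_points   = c(2500, 250, NA, 2500),
  field_points = c(625, 10000, 625, 5625)
)

# Plain logical indexing keeps an all-NA row for the NA comparison.
with_na <- data[data$sum_points == 2500, ]
nrow(with_na)

# which() returns only the positions where the condition is TRUE,
# so the NA comparison is dropped.
data2 <- data[which(data$sum_points == 2500), ]
nrow(data2)
```

If the column is known to contain no missing values, the two forms give the same result.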
subset() is also useful:
subset(DATAFRAME, COLUMNNAME == "")
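As a concrete illustration of the template above, here is a minimal sketch using the built-in mtcars data set. The choice of columns and thresholds is arbitrary and only meant to show the shape of the call.

```r
# Keep 4-cylinder cars with mpg above 30, and only the mpg and cyl columns.
# Column names can be used directly inside subset(); no mtcars$ prefix needed.
small_cars <- subset(mtcars, cyl == 4 & mpg > 30, select = c(mpg, cyl))
small_cars
```

Unlike plain logical indexing, subset() treats a missing condition as FALSE, so rows with NA in the tested column are dropped rather than returned as NA rows.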
For a survey package, maybe the survey package is pertinent?
http://faculty.washington.edu/tlumley/survey/
If you want to split a data frame according to the values of some variable, I'd suggest using dlply() from the plyr package (daply() returns an array, whereas dlply() returns a list).
library(plyr)
x <- dlply(df, .(splitting_variable), function(x) x)
Now, x is a list of data frames. To access one of the data frames, you can index it with the name of the level of the splitting variable.
x$Level1
#or
x[["Level1"]]
Before splitting your data into many data frames, though, I'd make sure there isn't a cleverer way to deal with it.
If you want to split by values in one of the columns, you can use lapply. For instance, to split ChickWeight into a separate dataset for each chick:
data(ChickWeight)
lapply(unique(ChickWeight$Chick), function(x) ChickWeight[ChickWeight$Chick == x,])
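The list returned by lapply here is unnamed, so the pieces can only be reached by position. A small variation, sketched below, names each element after its chick so it can be looked up directly; the variable names chick_ids and by_chick are just illustrative.

```r
data(ChickWeight)

# Convert the factor levels to character so they can serve as list names.
chick_ids <- as.character(unique(ChickWeight$Chick))

# One data frame per chick, as in the lapply call above.
by_chick <- lapply(chick_ids, function(id) ChickWeight[ChickWeight$Chick == id, ])
names(by_chick) <- chick_ids

# Look up the rows for the chick labelled "1".
by_chick[["1"]]
```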
You may also want to cut the data frame into an arbitrary number of smaller dataframes. Here, we cut into two dataframes.
x = data.frame(num = 1:26, let = letters, LET = LETTERS)
set.seed(10)
split(x, sample(rep(1:2, 13)))
gives
$`1`
num let LET
3 3 c C
6 6 f F
10 10 j J
12 12 l L
14 14 n N
15 15 o O
17 17 q Q
18 18 r R
20 20 t T
21 21 u U
22 22 v V
23 23 w W
26 26 z Z
$`2`
num let LET
1 1 a A
2 2 b B
4 4 d D
5 5 e E
7 7 g G
8 8 h H
9 9 i I
11 11 k K
13 13 m M
16 16 p P
19 19 s S
24 24 x X
25 25 y Y
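The example above assigns rows to groups at random. If the goal is instead consecutive chunks of a fixed size, one possible sketch is to build the grouping vector from the row index. The chunk size of 10 is an arbitrary assumption for illustration.

```r
x <- data.frame(num = 1:26, let = letters, LET = LETTERS)

# Assumed chunk size; the last chunk holds whatever rows are left over.
chunk_size <- 10

# Rows 1-10 get id 1, rows 11-20 get id 2, rows 21-26 get id 3.
chunk_id <- ceiling(seq_len(nrow(x)) / chunk_size)

chunks <- split(x, chunk_id)
sapply(chunks, nrow)
```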
You can also split a data frame based upon an existing column. For example, to create three data frames based on the cyl column in mtcars:
split(mtcars,mtcars$cyl)
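The result of split() is a named list, with one element per distinct value of the splitting column. A short sketch of how the pieces might be inspected and extracted (the names by_cyl and four_cyl are illustrative):

```r
by_cyl <- split(mtcars, mtcars$cyl)

# The list names are the distinct cyl values, as character strings.
names(by_cyl)

# Number of rows that ended up in each piece.
sapply(by_cyl, nrow)

# Pull out a single data frame by name.
four_cyl <- by_cyl[["4"]]
head(four_cyl)
```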
Splitting the data frame seems counter-productive. Instead, use the split-apply-combine paradigm. For example, generate some data
df = data.frame(grp=sample(letters, 100, TRUE), x=rnorm(100))
then split only the relevant columns, apply the scale() function to x in each group, and combine the results (using split<- or ave):
df$z = 0
split(df$z, df$grp) = lapply(split(df$x, df$grp), scale)
## alternative: df$z = ave(df$x, df$grp, FUN=scale)
This will be very fast compared to splitting data.frames, and the result remains usable in downstream analysis without iteration. I think the dplyr syntax is
library(dplyr)
# as.numeric() drops the one-column matrix that scale() returns
df %>% group_by(grp) %>% mutate(z = as.numeric(scale(x)))
In general, this dplyr solution is faster than splitting data frames but not as fast as the base R split-apply-combine approach.
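To check that the grouped scaling does what is intended, a small base R sketch can confirm that each group ends up with mean 0 and standard deviation 1. The data here are made up, with three equal-sized groups so that every group has enough rows to scale; as.numeric() is used because scale() returns a one-column matrix.

```r
set.seed(1)

# Hypothetical data: three groups of four observations each.
df <- data.frame(grp = rep(c("a", "b", "c"), each = 4), x = rnorm(12))

# Scale x within each group without ever splitting the data frame itself.
df$z <- ave(df$x, df$grp, FUN = function(v) as.numeric(scale(v)))

# Per-group summaries of the scaled column.
group_means <- tapply(df$z, df$grp, mean)
group_sds   <- tapply(df$z, df$grp, sd)
group_means
group_sds
```

The result stays in the original row order and in a single data frame, which is what makes it convenient for downstream analysis.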