How to split a data frame?

臣服心动 2020-11-22 03:08

I want to split a data frame into several smaller ones. This seems like a trivial question, but I cannot find a solution through web search.

8 answers
  • 2020-11-22 03:13

    You could also use

    data2 <- data[data$sum_points == 2500, ]
    

    This creates a data frame containing only the rows where sum_points equals 2500.

    It gives:

    airfoils sum_points field_points   init_t contour_t   field_t
    ...
    491        5       2500         5625 0.000086  0.004272  6.321774
    498        5       2500         5625 0.000087  0.004507  6.325083
    504        5       2500         5625 0.000088  0.004370  6.336034
    603        5        250        10000 0.000072  0.000525  1.111278
    577        5        250        10000 0.000104  0.000559  1.111431
    587        5        250        10000 0.000072  0.000528  1.111524
    606        5        250        10000 0.000079  0.000538  1.111685
    ....
    > data2 <- data[data$sum_points == 2500, ]
    > data2
    airfoils sum_points field_points   init_t contour_t   field_t
    108        5       2500          625 0.000082  0.004329  0.733109
    106        5       2500          625 0.000102  0.004564  0.733243
    117        5       2500          625 0.000087  0.004321  0.733274
    112        5       2500          625 0.000081  0.004428  0.733587
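
    As a self-contained illustration of the same idea, here is a minimal sketch using the built-in mtcars data. The cyl == 4 condition and the variable names are only for illustration. Negating the condition gives the remaining rows, so the two pieces together cover the whole data frame:

    ```r
    # Rows where the condition holds
    four_cyl <- mtcars[mtcars$cyl == 4, ]
    # All remaining rows
    other_cyl <- mtcars[mtcars$cyl != 4, ]

    nrow(four_cyl)                                     # 11
    nrow(four_cyl) + nrow(other_cyl) == nrow(mtcars)   # TRUE
    ```

    If the column contains NA values, logical indexing with == returns all-NA rows for them; wrapping the condition in which() avoids that.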
    
  • 2020-11-22 03:16

    subset() is also useful:

    subset(DATAFRAME, COLUMNNAME == "")
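
    To make the placeholders concrete, here is a small sketch on the built-in mtcars data (the choice of columns is illustrative only). The optional select argument keeps just some of the columns:

    ```r
    # Keep only the 6-cylinder cars, and only two of the columns
    six_cyl <- subset(mtcars, cyl == 6, select = c(mpg, cyl))

    nrow(six_cyl)    # 7
    names(six_cyl)   # "mpg" "cyl"
    ```

    Unlike bracket indexing, subset() treats an NA in the condition as FALSE, so such rows are dropped rather than returned as NA rows.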
    

    If this is for survey data, the survey package may also be pertinent:

    http://faculty.washington.edu/tlumley/survey/

  • 2020-11-22 03:21

    If you want to split a data frame according to the values of some variable, I'd suggest using dlply() from the plyr package (daply() returns an array, so it is not suited to returning whole data frames).

    library(plyr)
    # dlply() takes a data frame and returns a list, one element per level
    x <- dlply(df, .(splitting_variable), function(piece) piece)
    

    Now, x is a list of data frames. To access one of the data frames, you can index it with the name of the level of the splitting variable.

    x$Level1
    #or
    x[["Level1"]]
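
    A runnable sketch of this access pattern on the built-in iris data. It assumes plyr may or may not be installed: it calls plyr::dlply(), the plyr function that returns a list, when the package is available, and falls back to base split() otherwise. The name pieces is illustrative.

    ```r
    # Use plyr::dlply() if plyr is installed; otherwise fall back to base split().
    # Both return a list with one data frame per level of Species.
    pieces <- if (requireNamespace("plyr", quietly = TRUE)) {
      plyr::dlply(iris, "Species", function(d) d)
    } else {
      split(iris, iris$Species)
    }

    names(pieces)              # "setosa" "versicolor" "virginica"
    nrow(pieces[["setosa"]])   # 50
    ```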
    

    Before splitting your data into many data frames, though, I'd make sure there isn't a cleverer way to handle it.

  • 2020-11-22 03:23

    If you want to split by values in one of the columns, you can use lapply. For instance, to split ChickWeight into a separate dataset for each chick:

    data(ChickWeight)
    lapply(unique(ChickWeight$Chick), function(x) ChickWeight[ChickWeight$Chick == x,])
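
    The list returned above is unnamed, so pieces can only be reached by position. A possible refinement, sketched here with illustrative variable names, is to name each element after its chick:

    ```r
    data(ChickWeight)
    chick_ids <- unique(ChickWeight$Chick)
    by_chick <- lapply(chick_ids, function(id) ChickWeight[ChickWeight$Chick == id, ])
    # Name each piece after its chick so it can be looked up directly
    names(by_chick) <- as.character(chick_ids)

    length(by_chick)           # 50, one piece per chick
    head(by_chick[["1"]], 3)   # first rows for chick "1"
    ```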
    
  • 2020-11-22 03:24

    You may also want to cut the data frame into an arbitrary number of smaller dataframes. Here, we cut into two dataframes.

    x = data.frame(num = 1:26, let = letters, LET = LETTERS)
    set.seed(10)
    split(x, sample(rep(1:2, 13)))
    

    gives

    $`1`
       num let LET
    3    3   c   C
    6    6   f   F
    10  10   j   J
    12  12   l   L
    14  14   n   N
    15  15   o   O
    17  17   q   Q
    18  18   r   R
    20  20   t   T
    21  21   u   U
    22  22   v   V
    23  23   w   W
    26  26   z   Z
    
    $`2`
       num let LET
    1    1   a   A
    2    2   b   B
    4    4   d   D
    5    5   e   E
    7    7   g   G
    8    8   h   H
    9    9   i   I
    11  11   k   K
    13  13   m   M
    16  16   p   P
    19  19   s   S
    24  24   x   X
    25  25   y   Y
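
    The same trick extends to any number of pieces, even when the row count does not divide evenly. A minimal sketch on mtcars (32 rows), where k and the variable names are illustrative:

    ```r
    set.seed(10)
    k <- 3   # number of pieces (illustrative)
    # rep(..., length.out = ) recycles the group labels to exactly nrow(mtcars)
    groups <- sample(rep(seq_len(k), length.out = nrow(mtcars)))
    parts <- split(mtcars, groups)

    sapply(parts, nrow)   # piece sizes differ by at most one row
    ```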
    

    You can also split a data frame based upon an existing column. For example, to create three data frames based on the cyl column in mtcars:

    split(mtcars, mtcars$cyl)
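
    The result is a list named by the levels of the splitting column, so each piece can be pulled out by name or processed with lapply()/sapply(). A short sketch (by_cyl is an illustrative name):

    ```r
    by_cyl <- split(mtcars, mtcars$cyl)

    names(by_cyl)           # "4" "6" "8"
    sapply(by_cyl, nrow)    # 11, 7 and 14 rows respectively
    head(by_cyl[["6"]])     # the 6-cylinder cars only

    # Apply a function to every piece, e.g. mean mpg per group
    sapply(by_cyl, function(d) mean(d$mpg))
    ```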
    
  • 2020-11-22 03:33

    Splitting the data frame seems counter-productive. Instead, use the split-apply-combine paradigm. For example, generate some data:

    df = data.frame(grp=sample(letters, 100, TRUE), x=rnorm(100))
    

    Then split only the relevant column, apply the scale() function to x within each group, and combine the results (using split<- or ave):

    df$z = 0
    split(df$z, df$grp) = lapply(split(df$x, df$grp), scale)
    ## alternative: df$z = ave(df$x, df$grp, FUN=scale)
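
    To check that the two variants agree, here is a small reproducible sketch. The fixed groups, the helper standardize() and the column names z1 and z2 are illustrative assumptions; as.numeric() is used because scale() returns a one-column matrix.

    ```r
    set.seed(1)
    df <- data.frame(grp = rep(c("a", "b", "c"), each = 4), x = rnorm(12))
    standardize <- function(v) as.numeric(scale(v))

    # Variant 1: split<- writes each group's result back into place
    df$z1 <- 0
    split(df$z1, df$grp) <- lapply(split(df$x, df$grp), standardize)

    # Variant 2: ave() does the split, apply and reassembly in one call
    df$z2 <- ave(df$x, df$grp, FUN = standardize)

    all.equal(df$z1, df$z2)       # TRUE
    tapply(df$z1, df$grp, mean)   # approximately 0 in every group
    ```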
    

    This is very fast compared to splitting data frames, and the result stays in a single data frame that downstream analysis can use without iterating over pieces. I think the dplyr syntax is

    library(dplyr)
    df %>% group_by(grp) %>% mutate(z=scale(x))
    

    In general, this dplyr solution is faster than splitting data frames, but not as fast as the base R split<- / ave approach above.
