How do I split a data frame based on range of column values in R?

隐瞒了意图╮ 2021-01-12 22:28

I have a data set like this:

Users   Age
1        2
2        7
3        10
4        3
5        8
6        20

How do I split this data set i

3 Answers
  • 2021-01-12 22:30

    We could also use the between function from the data.table package.

    # Load data.table, which provides setDT() and between()
    library(data.table)

    # Create a data frame
    dat <- data.frame(Users = 1:7, Age = c(2, 7, 10, 3, 8, 12, 15))
    
    # Convert the data frame to data table by reference
    # (data.table is also a data.frame)
    setDT(dat)
    
    # Define a list with the cut pairs
    cuts <- list(c(0, 5), c(6, 10), c(11, 15))
    
    # Cycle through dat and cut it into list of data tables by the values in Age
    # matching the defined cuts
    lapply(X = cuts, function(i) {
      dat[between(x = dat[ , Age], lower = i[1], upper = i[2])]
    })
    

    Output:

    [[1]]
       Users Age
    1:     1   2
    2:     4   3
    
    [[2]]
       Users Age
    1:     2   7
    2:     3  10
    3:     5   8
    
    [[3]]
       Users Age
    1:     6  12
    2:     7  15
    

    Many other things are possible, including splitting by group; data.table is rather flexible.
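
    For readers without data.table, the same loop over cut pairs can be sketched in base R. Naming the list elements by range is an addition for illustration, not part of the answer above, and `pieces` is a hypothetical variable name:

    ```r
    # Same example data and cut pairs as in the answer above
    dat <- data.frame(Users = 1:7, Age = c(2, 7, 10, 3, 8, 12, 15))
    cuts <- list(c(0, 5), c(6, 10), c(11, 15))

    # Keep rows whose Age lies within each pair, inclusive on both ends
    # (this mirrors the default inclusive bounds of data.table::between)
    pieces <- lapply(cuts, function(i) {
      dat[dat$Age >= i[1] & dat$Age <= i[2], ]
    })

    # Label each piece by its range, e.g. "0-5"
    names(pieces) <- sapply(cuts, paste, collapse = "-")
    ```

    Each piece can then be retrieved by its label, for example pieces[["6-10"]].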

  • 2021-01-12 22:38

    You can combine split with cut to do this in a single line of code, avoiding the need to subset with a bunch of different expressions for different data ranges:

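    # dat is assumed to hold the question's data
    dat <- data.frame(Users = 1:6, Age = c(2, 7, 10, 3, 8, 20))
    split(dat, cut(dat$Age, c(0, 5, 10, 15), include.lowest = TRUE))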
    # $`[0,5]`
    #   Users Age
    # 1     1   2
    # 4     4   3
    # 
    # $`(5,10]`
    #   Users Age
    # 2     2   7
    # 3     3  10
    # 5     5   8
    # 
    # $`(10,15]`
    # [1] Users Age  
    # <0 rows> (or 0-length row.names)
    

    cut splits up data based on the specified break points, and split splits up a data frame based on the provided categories. If you stored the result of this computation into a list called l, you could access the smaller data frames with l[[1]], l[[2]], and l[[3]] or the more verbose:

    l$`[0,5]`
    l$`(5,10]`
    l$`(10,15]`
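
    One thing worth checking with this approach: cut returns NA for values outside the break points, and split drops rows whose grouping value is NA. A minimal sketch, assuming the question's data (where user 6 has Age 20); the extra break at Inf is an illustrative assumption, not part of the answer above:

    ```r
    # The question's data; user 6 (Age 20) lies outside the breaks c(0, 5, 10, 15)
    dat <- data.frame(Users = 1:6, Age = c(2, 7, 10, 3, 8, 20))

    # cut() returns NA for values outside the breaks, so split() drops that row
    grp <- cut(dat$Age, c(0, 5, 10, 15), include.lowest = TRUE)
    l <- split(dat, grp)
    sum(sapply(l, nrow))   # 5 of the 6 rows are kept

    # Assumption for illustration: add Inf as a final break to keep older users
    grp_all <- cut(dat$Age, c(0, 5, 10, 15, Inf), include.lowest = TRUE)
    l_all <- split(dat, grp_all)
    sum(sapply(l_all, nrow))   # all 6 rows are kept
    ```

    Comparing the row counts before and after splitting is a quick way to notice silently dropped observations.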
    
  • 2021-01-12 22:50

    First, here's your dataset for my purposes: foo=data.frame(Users=1:6,Age=c(2,7,10,3,8,20))

    Here's your first dataset with ages 0–5: subset(foo,Age<=5&Age>=0)

      Users Age
    1     1   2
    4     4   3
    

    Here's your second with ages 6–10: subset(foo,Age<=10&Age>=6)

      Users Age
    2     2   7
    3     3  10
    5     5   8
    

    Your third (using subset(foo,Age<=15&Age>=11)) is empty – your last Age observation is over 15.

    Note also that fractional ages between 5 and 6 or between 10 and 11 (e.g., 5.1, 10.5) would be excluded, because this code matches your question very literally. If you want anyone with an age below 6 to go in the first group, amend that code to subset(foo,Age<6&Age>=0). If you'd rather put a hypothetical person with Age=5.1 in the second group, that group's code would be subset(foo,Age<=10&Age>5).
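
    The boundary handling described above can also be sketched in one step with half-open intervals. This assumes two hypothetical fractional ages (5.1 and 10.5) added to the question's data, and uses base R's cut with right = FALSE so that each group is closed on the left and open on the right:

    ```r
    # Hypothetical data: the question's ages plus two fractional ages
    foo <- data.frame(Users = 1:8, Age = c(2, 7, 10, 3, 8, 20, 5.1, 10.5))

    # right = FALSE gives the intervals [0,6), [6,11), [11,16)
    groups <- cut(foo$Age, breaks = c(0, 6, 11, 16), right = FALSE)
    parts <- split(foo, groups)

    parts[[1]]$Age   # 2, 3, 5.1
    parts[[2]]$Age   # 7, 10, 8, 10.5
    ```

    With these breaks, an age of 5.1 lands in the first group and 10.5 in the second; Age 20 still falls outside every interval and is dropped.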
