filtering data.frame based on row_number()

旧巷少年郎 2021-02-01 14:01

UPDATE: dplyr has been updated since this question was asked and now performs as the OP wanted.
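
As the update says, recent dplyr releases accept a bare row_number() inside filter(). This is a minimal sketch assuming dplyr 1.x is installed; the data frame has the same shape as the one used in the answers.

```r
library(dplyr)

df <- data.frame(id = 1:10, var = runif(10))

# In dplyr 1.x, row_number() with no argument gives each row's position (1, 2, ...),
# so this keeps rows 2 through 7
res <- df %>% filter(row_number() >= 2, row_number() <= 7)

# between() is an equivalent, more compact spelling of the same condition
res2 <- df %>% filter(between(row_number(), 2, 7))
```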

I'm trying to get the second to the seventh line in a data.frame

3 answers
  • 2021-02-01 14:18

    Here is another way to do row-number-based filtering in a pipeline.

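        library(dplyr)  # provides the %>% pipe (re-exported from magrittr)

        # var comes from runif(), so your values will differ
        df <- data.frame(id = 1:10, var = runif(10))

        df %>% .[2:7,]

          id     var
        2  2 0.28817
        3  3 0.56672
        4  4 0.96610
        5  5 0.74772
        6  6 0.75091
        7  7 0.05165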
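
    This relies on magrittr's placeholder rule: .[2:7,] is really the call `[`(., 2:7, ), and when the dot already appears as a direct argument of the right-hand call, the pipe does not also insert the left-hand side as the first argument. A small check of that equivalence, assuming dplyr (or magrittr) is loaded for %>%:

    ```r
    library(dplyr)  # re-exports magrittr's %>%

    df <- data.frame(id = 1:10, var = runif(10))

    # The dot stands in for df, so the piped form is just df[2:7, ]
    piped <- df %>% .[2:7, ]
    plain <- df[2:7, ]
    ```

    Both keep the original row names (2 to 7), as base subsetting always does.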
    
  • 2021-02-01 14:29

    Actually, dplyr's slice() function is made for exactly this kind of subsetting:

        library(dplyr)

        df %>% slice(2:7)
    

    (I'm a little late to the party but thought I'd add this for future readers)
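
    A quick sanity check that slice() selects rows purely by position. The check compares column values rather than whole data frames, so it does not depend on how row names are handled in the result.

    ```r
    library(dplyr)

    df <- data.frame(id = 1:10, var = runif(10))

    # slice() keeps rows by position, regardless of the values in any column
    sliced <- df %>% slice(2:7)
    ```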

  • 2021-02-01 14:45

    The row_number() function does not simply return the row number of each element, so it can't be used the way you want:

    • ‘row_number’: equivalent to ‘rank(ties.method = "first")’
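
    To see what that equivalence means in practice, here is a comparison on a small made-up vector with a tie. row_number(x) ranks the values of x, numbering tied values in the order they appear, rather than reporting positions.

    ```r
    library(dplyr)

    x <- c(30, 10, 20, 10)

    # Smallest value gets 1; the two 10s are numbered in order of appearance
    rn <- row_number(x)
    rk <- rank(x, ties.method = "first")
    print(rn)  # 4 1 3 2
    ```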

    You're not actually saying what you want the row_number of. In your case:

        df %>% filter(row_number(id) <= 7, row_number(id) >= 2)
    

    works because id is sorted and so row_number(id) is 1:10. I don't know what row_number() evaluates to in this context, but when called a second time dplyr has run out of things to feed it and you get the equivalent of:

        > row_number()
        Error in rank(x, ties.method = "first") : 
          argument "x" is missing, with no default
    

    That's your error right there.
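
    The caveat "because id is sorted" matters. With a hypothetical unsorted id column, row_number(id) follows the ordering of id rather than the position of each row, so the same kind of filter keeps different rows than positional slicing would:

    ```r
    library(dplyr)

    # Hypothetical data: id is deliberately out of order
    df2 <- data.frame(id = c(3, 1, 2, 5, 4))

    # Keeps the rows whose id ranks 1st or 2nd (id 1 and id 2), in their original order
    by_rank <- df2 %>% filter(row_number(id) <= 2)

    # Keeps the first two rows by position (id 3 and id 1)
    by_position <- df2 %>% slice(1:2)
    ```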

    Anyway, that's not the way to select rows.

    You simply need to subscript df[2:7,], or if you insist on pipes everywhere:

        > df %>% "["(., 2:7, )
          id        var
        2  2 0.52352994
        3  3 0.02994982
        4  4 0.90074801
        5  5 0.68935493
        6  6 0.57012344
        7  7 0.01489950
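
    The "[" form works because df[2:7, ] is ordinary function-call syntax in disguise: [ is itself a function, a quoted or backticked name in call position calls it, and the empty argument after the comma plays the role of the blank column index. A minimal check using base R only:

    ```r
    df <- data.frame(id = 1:10, var = runif(10))

    # Three spellings of the same subsetting call; the blank after the comma means "all columns"
    via_brackets  <- df[2:7, ]
    via_backticks <- `[`(df, 2:7, )
    via_string    <- "["(df, 2:7, )
    ```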
    