Rank based on several variables

日久生厌 2021-01-19 00:15

This is a small example. In my larger dataset, I have multiple years of data, and the number of observations per group (div) is not always equal.

Example data:

2 answers
  • 2021-01-19 00:58

    This is how I'd do it:

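    library(data.table)
    dt = as.data.table(df)  # df: the example data from the question
    
    # rank within each div by pts, then x (both descending); remaining ties keep row order
    dt[order(-pts, -x), rank.init := 1:.N, by = div]
    
    # drop the letter suffix so that "2a" and "2b" fall into the same division "2"
    dt[, div.clean := sub('(\\d+).*', '\\1', div)]
    setorder(dt, div.clean, rank.init)
    
    # after sorting, .I is each row's overall position; rows that share a division
    # and a within-div rank get the average of their positions
    dt[, rank.final := mean(.I), by = .(div.clean, rank.init)]
    setorder(dt, div, rank.final)
    dt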
    #    year id div pts  x rank.init div.clean rank.final
    # 1: 2014  N   1   9 11         1         1        1.0
    # 2: 2014  G   1   9 10         2         1        2.0
    # 3: 2014  J   1   7 12         3         1        3.0
    # 4: 2014  U   1   3  7         4         1        4.0
    # 5: 2014  M  2a   7 12         1         2        5.5
    # 6: 2014  E  2a   7  7         2         2        7.5
    # 7: 2014  S  2a   5  5         3         2        9.5
    # 8: 2014  W  2a   3  4         4         2       11.5
    # 9: 2014  D  2b   7  7         1         2        5.5
    #10: 2014  B  2b   7  6         2         2        7.5
    #11: 2014  L  2b   2  4         3         2        9.5
    #12: 2014  C  2b   1  2         4         2       11.5
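
    For comparison, here is a base R sketch of the same three steps (rank within div, strip the letter suffix, average the overall positions of tied rows) without data.table. The question's df isn't shown, so the data frame below is rebuilt from the 12 rows in the printed output above; treat it as illustrative data, not the original.

```r
# Illustrative data rebuilt from the printed output (the question's df is not shown)
df <- data.frame(
  year = 2014,
  id   = c("N", "G", "J", "U", "M", "E", "S", "W", "D", "B", "L", "C"),
  div  = c("1", "1", "1", "1", "2a", "2a", "2a", "2a", "2b", "2b", "2b", "2b"),
  pts  = c(9, 9, 7, 3, 7, 7, 5, 3, 7, 7, 2, 1),
  x    = c(11, 10, 12, 7, 12, 7, 5, 4, 7, 6, 4, 2),
  stringsAsFactors = FALSE
)

# Step 1: sort by div, then pts and x descending; number the rows within each div
df <- df[order(df$div, -df$pts, -df$x), ]
df$rank.init <- ave(seq_len(nrow(df)), df$div, FUN = seq_along)

# Step 2: keep only the leading digits, so "2a" and "2b" both become "2"
df$div.clean <- sub("(\\d+).*", "\\1", df$div)

# Step 3: sort by division and within-div rank; rows sharing both
# get the mean of their overall positions
df <- df[order(df$div.clean, df$rank.init), ]
df$rank.final <- ave(as.numeric(seq_len(nrow(df))), df$div.clean, df$rank.init,
                     FUN = mean)

df <- df[order(df$div, df$rank.final), ]
df
```

    For these rows the rank.final column matches the one printed above (1 to 4 for division 1, then 5.5, 7.5, 9.5, 11.5 for both 2a and 2b).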
    
  • 2021-01-19 00:58

    @eddi's answer is already very nice. I just wanted to illustrate the same approach using the frank() function from the development version of data.table (v1.9.5), which can compute ranks on vectors, lists, data.frames or data.tables.

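    library(data.table)
    
    # div.clean as in @eddi's answer; setDT() converts df to a data.table in place
    setDT(df)[, div.clean := sub('(\\d+).*', '\\1', div)]
    
    # within each div: rank by pts, then x (descending); "first" breaks ties by row order
    df[, position := frank(.SD, -pts, -x, ties.method="first"), by=div]
    # over all rows: rank by div.clean, then position; tied rows get the average rank
    df[, final := frank(.SD, div.clean, position, ties.method="average")]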
    

    This also retains the original order, if that's of any importance.

    I'll leave the conversion to dplyr to you.
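
    One possible dplyr translation, sketched under the assumption that dplyr is installed (it relies on arrange()'s .by_group argument, so a reasonably recent version). The data frame is rebuilt from the rows printed in @eddi's answer because the question's own df isn't shown, and row_id is a hypothetical helper column.

```r
library(dplyr)

# Illustrative data rebuilt from the printed output (the question's df is not shown)
df <- data.frame(
  year = 2014,
  id   = c("N", "G", "J", "U", "M", "E", "S", "W", "D", "B", "L", "C"),
  div  = c("1", "1", "1", "1", "2a", "2a", "2a", "2a", "2b", "2b", "2b", "2b"),
  pts  = c(9, 9, 7, 3, 7, 7, 5, 3, 7, 7, 2, 1),
  x    = c(11, 10, 12, 7, 12, 7, 5, 4, 7, 6, 4, 2),
  stringsAsFactors = FALSE
)

ranked <- df %>%
  # within each div: order by pts, then x (both descending) and number the rows
  group_by(div) %>%
  arrange(desc(pts), desc(x), .by_group = TRUE) %>%
  mutate(position = row_number()) %>%
  ungroup() %>%
  # keep only the leading digits, so "2a" and "2b" both become "2"
  mutate(div.clean = sub("(\\d+).*", "\\1", div)) %>%
  # overall position after sorting by division, then within-div position
  arrange(div.clean, position) %>%
  mutate(row_id = row_number()) %>%
  # rows sharing a division and a position get the mean of their positions
  group_by(div.clean, position) %>%
  mutate(final = mean(row_id)) %>%
  ungroup() %>%
  select(-row_id) %>%
  arrange(div, final)

ranked
```

    Here final plays the role of rank.final / final in the data.table answers: it averages the overall positions of rows that share a division and a within-div position.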
