R group by and aggregate - return relative rank within groups using plyr

时光说笑 2021-01-16 22:50

UPDATE: I have a data frame 'test' that looks like this:

      session_id  seller_feedback_score
    1          1                 282470
    2          1                 275258
    3          1                 275258
    4          1                 275258
    5


        
2 answers
  •  花落未央
    2021-01-16 23:22

    From data.table 1.9.5 onward, the frank() function (for "fast rank") is exported. Its interface is similar to base::rank, but it implements dense ranking in addition to all the ranking methods base::rank provides, and it works on lists as well as vectors. You can install it by following the instructions here.

    require(data.table) ## 1.9.5+
    setDT(df)[, 
        rank := frank(-seller_feedback_score, ties.method="dense"), 
    by=session_id]
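
    To make the grouped call concrete, here is a minimal sketch on a toy table. The session 1 rows are the four complete rows shown in the question; the session 2 rows and the name `toy` are made up purely for illustration.

    ```r
    library(data.table)  # frank() needs data.table 1.9.5 or later

    # Session 1 rows come from the question; session 2 rows are invented.
    toy <- data.table(
      session_id = c(1, 1, 1, 1, 2, 2, 2),
      seller_feedback_score = c(282470, 275258, 275258, 275258, 100, 300, 200)
    )

    # Negate the score so the highest score gets rank 1 within each session.
    toy[, rank := frank(-seller_feedback_score, ties.method = "dense"),
        by = session_id]

    print(toy)
    ```

    With dense ranking, tied scores share a rank and no ranks are skipped afterwards, so session 1 comes out as 1, 2, 2, 2.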
    

    As @David points out, perhaps what you want is ties.method = "first" or "min"? Not sure...

    setDT(df)[, 
        rank := frank(-seller_feedback_score, ties.method="first"), ## or "min" or "max"
    by=session_id]
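
    If it is unclear which tie-breaking rule fits, it may help to compare them side by side. This is only a sketch on the session 1 scores from the question; the names `x`, `methods` and `res` are illustrative.

    ```r
    library(data.table)

    # Session 1 scores from the question.
    x <- c(282470, 275258, 275258, 275258)

    # One column per ties.method, ranking from highest score to lowest.
    methods <- c("dense", "min", "max", "first")
    res <- sapply(methods, function(m) frank(-x, ties.method = m))

    print(res)
    ```

    Here "dense" and "min" both give 1, 2, 2, 2, "max" gives 1, 4, 4, 4 and "first" gives 1, 2, 3, 4. "dense" and "min" only diverge when another value follows a tie: a fifth, lower score would get rank 3 under "dense" but rank 5 under "min".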
    

    Anyhow, it should be plenty fast. Here's a benchmark against base R:

    require(data.table)
    set.seed(45L)
    val = sample(1e4, 1e7, TRUE)
    system.time(ans1 <- rank(val, ties.method = "min"))
    #    user  system elapsed 
    #  16.771   0.199  17.035 
    system.time(ans2 <- frank(val, ties.method = "min"))
    #    user  system elapsed 
    #   0.532   0.013   0.550 
    identical(ans1, ans2) # [1] TRUE
    
