UPDATE: I have a data frame 'test' that looks like this:
  session_id seller_feedback_score
1          1                282470
2          1                275258
3          1                275258
4          1                275258
5
One option:
library(dplyr)
# Rank scores within each session; the highest score gets rank 1
df %>%
  group_by(session_id) %>%
  mutate(rank = dense_rank(-seller_feedback_score))
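dense_rank is "like min_rank, but with no gaps between ranks". I negated the seller_feedback_score column so that the ranking runs in descending order and the highest score gets rank 1 (something like a max_rank, which doesn't exist in dplyr; dense_rank(desc(seller_feedback_score)) is equivalent).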
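To see why the negation matters, here is a small base R sketch on the four scores visible in the question. It emulates a dense rank with match() on the sorted unique values; dense_rank_base is a hypothetical helper written for illustration, not a dplyr function.

```r
# Feedback scores from the first four rows shown in the question.
scores <- c(282470, 275258, 275258, 275258)

# Hypothetical helper: dense rank = position among the sorted unique values,
# so ties share a rank and no ranks are skipped.
dense_rank_base <- function(v) match(v, sort(unique(v)))

rank_asc  <- dense_rank_base(scores)   # lowest score gets rank 1
rank_desc <- dense_rank_base(-scores)  # negated: highest score gets rank 1

print(rank_asc)   # 2 1 1 1
print(rank_desc)  # 1 2 2 2
```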
If you want ranks with gaps, so that the lowest score in your case gets rank 21, use min_rank instead of dense_rank:
library(dplyr)
df %>%
  group_by(session_id) %>%
  mutate(rank = min_rank(-seller_feedback_score))
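The difference between the two only shows up when a distinct value follows a tie. A minimal base R sketch, assuming base::rank(ties.method = "min") as a stand-in for dplyr's min_rank and match() on sorted unique values as a stand-in for dense_rank; the fifth score (270000) is hypothetical, added only to make the gap visible.

```r
# First four scores come from the question; 270000 is a hypothetical lower
# score added so that the gap after the tie becomes visible.
scores <- c(282470, 275258, 275258, 275258, 270000)

# "min" ranking: tied values share the smallest rank of the tie, and the
# next distinct value skips ahead by the size of the tie.
min_ranks <- rank(-scores, ties.method = "min")

# Dense ranking (no gaps), emulated with match() on the sorted unique values.
dense_ranks <- match(-scores, sort(unique(-scores)))

print(min_ranks)    # 1 2 2 2 5
print(dense_ranks)  # 1 2 2 2 3
```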
From data.table 1.9.5 on, the frank() function (for "fast rank") is exported. Its interface is similar to base::rank, but in addition to all the ranking methods base::rank provides, it implements a dense rank, and it works on lists as well as vectors. You can install it by following the instructions here.
library(data.table)  ## 1.9.5+
# setDT() converts df to a data.table by reference; := adds the rank column in place
setDT(df)[,
          rank := frank(-seller_feedback_score, ties.method = "dense"),
          by = session_id]
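To see what frank() adds over base::rank, here is a base R sketch on the question's four visible scores: the default tie method averages tied ranks, and "dense" is not among the methods base::rank accepts. Reading the allowed methods from formals() is just an illustrative trick; the exact list depends on your R version.

```r
# Feedback scores from the first four rows shown in the question.
scores <- c(282470, 275258, 275258, 275258)

# Default ties.method = "average": the three tied scores occupy ranks 2, 3
# and 4, so each gets (2 + 3 + 4) / 3 = 3.
avg_ranks <- rank(-scores)
print(avg_ranks)  # 1 3 3 3

# The tie-handling methods base::rank accepts in this R version.
base_methods <- eval(formals(base::rank)$ties.method)
print(base_methods)

# "dense" is not one of them, which is the gap frank() fills.
print("dense" %in% base_methods)  # FALSE
```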
As @David points out, perhaps what you want is ties.method = "first" or "min"? Not sure...
setDT(df)[,
          rank := frank(-seller_feedback_score, ties.method = "first"),  ## or "min" or "max"
          by = session_id]
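base::rank uses the same names for these tie methods, so their behaviour can be compared without data.table. A minimal sketch on the question's four visible scores plus one hypothetical lower score (270000), negated so the highest score is ranked 1: "first" breaks ties by row order, "min" gives every tied row the lowest rank of the tie, and "max" gives them the highest.

```r
# First four scores come from the question; 270000 is a hypothetical extra score.
scores <- c(282470, 275258, 275258, 275258, 270000)

# Rank the negated scores under each tie-handling rule; sapply() returns a
# matrix with one column per method.
tie_methods <- c("first", "min", "max")
ranks <- sapply(tie_methods, function(m) rank(-scores, ties.method = m))

# Columns: first = 1 2 3 4 5, min = 1 2 2 2 5, max = 1 4 4 4 5
print(ranks)
```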
Either way, frank() should be plenty fast. Here's a benchmark against base R:
library(data.table)
set.seed(45L)
val <- sample(1e4, 1e7, TRUE)  # 10 million integers drawn from 1:10000

system.time(ans1 <- rank(val, ties.method = "min"))
#   user  system elapsed
# 16.771   0.199  17.035

system.time(ans2 <- frank(val, ties.method = "min"))
#   user  system elapsed
#  0.532   0.013   0.550

identical(ans1, ans2)  # [1] TRUE
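For comparison, the same per-session descending dense rank can be sketched in base R without either package. This is a swapped-in technique (stats::ave() with match() on sorted unique values), not what dplyr or data.table do internally, and it won't match frank() on speed; the second session's scores are hypothetical.

```r
# First four rows are from the question; session 2 is hypothetical illustration data.
df <- data.frame(
  session_id = c(1, 1, 1, 1, 2, 2),
  seller_feedback_score = c(282470, 275258, 275258, 275258, 5000, 9000)
)

# Dense rank within a group: position of each value among the group's
# sorted unique values, so ties share a rank and no ranks are skipped.
dense_rank_base <- function(v) match(v, sort(unique(v)))

# Negate the score so the highest score in each session gets rank 1,
# then apply the dense rank separately within each session_id.
df$rank <- ave(-df$seller_feedback_score, df$session_id, FUN = dense_rank_base)

print(df)
```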