Question
I have a data frame 'test' that looks like this:
session_id seller_feedback_score
1 1 282470
2 1 275258
3 1 275258
4 1 275258
5 1 37831
6 1 282470
7 1 26
8 1 138351
9 1 321350
10 1 841
11 1 138351
12 1 17263
13 1 282470
14 1 396900
15 1 282470
16 1 282470
17 1 321350
18 1 321350
19 1 321350
20 1 0
21 1 1596
22 7 282505
23 7 275283
24 7 275283
25 7 275283
26 7 37834
27 7 282505
28 7 26
29 7 138359
30 7 321360
and some code (using the dplyr package) that should rank 'seller_feedback_score' within each session_id group:
test <- test %>% group_by(session_id) %>%
  mutate(seller_feedback_score_rank = dense_rank(-seller_feedback_score))
However, what actually happens is that R ranks the entire data frame as a whole, ignoring the groups (session_id):
session_id seller_feedback_score seller_feedback_score_rank_2
1 1 282470 5
2 1 275258 7
3 1 275258 7
4 1 275258 7
5 1 37831 11
6 1 282470 5
7 1 26 15
8 1 138351 9
9 1 321350 3
10 1 841 14
11 1 138351 9
12 1 17263 12
13 1 282470 5
14 1 396900 1
15 1 282470 5
16 1 282470 5
17 1 321350 3
18 1 321350 3
19 1 321350 3
20 1 0 16
21 1 1596 13
22 7 282505 4
23 7 275283 6
24 7 275283 6
25 7 275283 6
26 7 37834 10
27 7 282505 4
28 7 26 15
29 7 138359 8
30 7 321360 2
I checked this by counting the unique 'seller_feedback_score_rank' values; not surprisingly, the count equals the highest rank value. I'd appreciate it if someone could reproduce this and help. Thanks.
Link to my original question: R group by and aggregate - return relative rank within groups using plyr
Answer 1:
I had a similar issue. My solution was to sort by the group and by the variable(s) to be ranked, and then use row_number() after group_by.
# Sample dataset: 20 rows alternating between two groups, with random integer values
df <- data.frame(group = rep(c("GROUP 1", "GROUP 2"), 10),
                 value = as.integer(rnorm(20, mean = 1000, sd = 500)))
library(dplyr)
# Show the first 10 rows
print.data.frame(df[1:10, ])
group value
1 GROUP 1 1273
2 GROUP 2 1261
3 GROUP 1 1189
4 GROUP 2 1390
5 GROUP 1 1942
6 GROUP 2 1111
7 GROUP 1 530
8 GROUP 2 893
9 GROUP 1 997
10 GROUP 2 237
# Sort by group, then by value in descending order, and number the rows within each group
sorted <- df %>%
  arrange(group, desc(value)) %>%
  group_by(group) %>%
  mutate(rank = row_number())
print.data.frame(sorted)
group value rank
1 GROUP 1 1942 1
2 GROUP 1 1368 2
3 GROUP 1 1273 3
4 GROUP 1 1249 4
5 GROUP 1 1189 5
6 GROUP 1 997 6
7 GROUP 1 562 7
8 GROUP 1 535 8
9 GROUP 1 530 9
10 GROUP 1 1 10
11 GROUP 2 1472 1
12 GROUP 2 1390 2
13 GROUP 2 1281 3
14 GROUP 2 1261 4
15 GROUP 2 1111 5
16 GROUP 2 893 6
17 GROUP 2 774 7
18 GROUP 2 669 8
19 GROUP 2 631 9
20 GROUP 2 237 10
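Note that row_number() gives tied values different ranks, whereas the question asked for dense_rank(). The following is a minimal sketch that keeps dense_rank() and calls dplyr::mutate with its namespace prefix. It assumes the grouping was ignored because another package's mutate (for example plyr's, if plyr was loaded after dplyr) masked the dplyr version; the thread does not confirm that this was the cause. The sample values are a small hypothetical subset in the shape of the question's data.

```r
library(dplyr)

# Hypothetical sample shaped like the question's data frame
test <- data.frame(
  session_id = c(1, 1, 1, 1, 7, 7, 7),
  seller_feedback_score = c(282470, 275258, 275258, 26, 282505, 275283, 26)
)

ranked <- test %>%
  group_by(session_id) %>%
  # Namespace-qualified call, so a mutate from another attached package cannot be picked up
  dplyr::mutate(seller_feedback_score_rank = dense_rank(desc(seller_feedback_score))) %>%
  ungroup()

print(as.data.frame(ranked))
```

To see which package an unqualified mutate resolves to in a session, environmentName(environment(mutate)) can be checked; if it reports something other than dplyr, the grouped behaviour above should not be expected.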
Answer 2:
Found an answer in: Add a "rank" column to a data frame
data.selected <- transform(data.selected,
  seller_feedback_score_rank = ave(seller_feedback_score, session_id,
    FUN = function(x) rank(-x, ties.method = "first")))
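With ties.method = "first", tied scores receive consecutive, distinct ranks, which differs from the dense ranking the question asked for. The sketch below, in base R only, puts the two side by side on a small hypothetical sample; the match(x, sort(unique(x), decreasing = TRUE)) idiom for dense ranks is my own substitution, not something taken from the linked answer.

```r
# Hypothetical sample shaped like the question's data frame
test <- data.frame(
  session_id = c(1, 1, 1, 1, 7, 7, 7),
  seller_feedback_score = c(282470, 275258, 275258, 26, 282505, 275283, 26)
)

# Per-group rank, highest score first; ties are broken by order of appearance
test$rank_first <- ave(test$seller_feedback_score, test$session_id,
                       FUN = function(x) rank(-x, ties.method = "first"))

# Per-group dense rank: tied scores share a rank and no ranks are skipped
test$rank_dense <- ave(test$seller_feedback_score, test$session_id,
                       FUN = function(x) match(x, sort(unique(x), decreasing = TRUE)))

print(test)
```

In session 1 the two rows with score 275258 get ranks 2 and 3 under "first" but both get rank 2 under the dense variant.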
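Answer 3:
One way to do this is:
# Sort so that rows with the same ID are adjacent
dataset <- dataset %>% arrange(ID, DateTime, Index)
# TRUE where a row has the same ID as the row above it (assumes no ID equals 0)
dataset$Rank <- c(0, dataset$ID)[-(nrow(dataset) + 1)] == dataset$ID
# Running count within each ID; the first row of each group gets 0
dataset <- dataset %>% group_by(ID) %>% mutate(Rank = cumsum(Rank))
Had the same issue!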
Source: https://stackoverflow.com/questions/28018933/r-data-frame-rank-by-groups-group-by-rank-with-package-dplyr