问题
I have the data frame "Category", "ID", "Score(t)", and I want to get "Rank(t)":
Category ID Score.08.2007 Score.09.2007 Rank.08.2007 Rank.09.2007 ...
Orange FSGBR070N3 0.16 ... 5 ...
Orange FSGBR070N3 0.05 ... 7 ...
Orange FSGBR070N3 0.11 6
Orange FS00008L4G 0.28 1
Orange FS00008VLD 0.27 2
Orange FS00008VLD 0.27 2
Orange FS00008VLD 0.27 2
Orange FS00009SQX -2.03 8
Orange FS00009SQX NA
Orange FSUSA0A1KW NA
Orange FSUSA0A1KW NA
Orange FSUSA0A1KX NA
Orange FSUSA0A1KY NA
Orange FS0000B389 NA
Banana FS000092GP 96.25 1
Banana FS000092GP 96.25 1
Banana FS000092GP 96.25 1
Banana FS000092GP 52.33 4
Banana FS0000ATLN 31.73 5
Banana FSUSA0AVMF 1.38 7
Banana FSGBR058O8 1.37 8
Banana FSGBR05845 2.24 6
The ranking is based on descending sorting of the "Score" in each "Category". The additional specification, which I struggle to capture, is that when there are identical scores AND identical ID's, for the following score that has a different value assign a rank equal to the rank from the previous ID plus the number of ID's that shared the same score (The rank output column in the example should make this clear).
NA's should receive no ranking:
na.last = NA
I have started with creating a matrix for ranks, then I would probably need sort(), but I struggle to capture this for the time-series and with the additional specification... couldn't find such specific existing questions either. Help appreciated!
time_series <- c("08.2007","09.2007","10.2007",...)
abs_ranks_mat <- as.data.frame(mat.or.vec(nrow(ID),length(time_series)))
回答1:
A solution uses dplyr
. df
is the example from @trosendal's example. df3
is the final output.
The key is to use min_rank
function to create the rank. mutate_at
allows us to specify which column we do or do not want to conduct ranking. After that, we can change the column names and merge with the original data frame.
library(dplyr)
df <- df %>% mutate(RowID = 1:n())
df2 <- df %>%
group_by(Category) %>%
mutate_at(vars(-ID, -RowID), funs(min_rank(desc(.)))) %>%
ungroup() %>%
select(-Category, -ID) %>%
setNames(., gsub("Score", "Rank", colnames(.)))
df3 <- df %>%
left_join(df2, by = "RowID") %>%
select(-RowID)
回答2:
Your data:
df <- structure(list(Category = c("Orange", "Orange", "Orange", "Orange",
"Orange", "Orange", "Orange", "Orange", "Orange", "Orange", "Orange",
"Orange", "Orange", "Orange", "Banana", "Banana", "Banana", "Banana",
"Banana", "Banana", "Banana", "Banana"), ID = c("FSGBR070N3",
"FSGBR070N3", "FSGBR070N3", "FS00008L4G", "FS00008VLD", "FS00008VLD",
"FS00008VLD", "FS00009SQX", "FS00009SQX", "FSUSA0A1KW", "FSUSA0A1KW",
"FSUSA0A1KX", "FSUSA0A1KY", "FS0000B389", "FS000092GP", "FS000092GP",
"FS000092GP", "FS000092GP", "FS0000ATLN", "FSUSA0AVMF", "FSGBR058O8",
"FSGBR05845"), Score.08.2007 = c(0.16, 0.05, 0.11, 0.28, 0.27,
0.27, 0.27, -2.03, NA, NA, NA, NA, NA, NA, 96.25, 96.25, 96.25,
52.33, 31.73, 1.38, 1.37, 2.24), Score.09.2007 = c(0.16, 0.05,
0.14, 0.22, 0.23, 0.27, 0.27, -2.03, NA, NA, 0.14, NA, 0.56,
NA, 96.25, 93.25, 96.25, 51.33, 31.73, 1.38, 1.37, 2.24)), .Names = c("Category",
"ID", "Score.08.2007", "Score.09.2007"), row.names = c(NA, -22L
), class = "data.frame")
Loop over scores and generate the ranks within each category:
for(i in names(df)[grep("Score", names(df))]) {
df[,paste0("rank", i)] <- do.call("c", lapply(unique(df$Category), function(x){
a <- floor(rank(df[df$Category == x, i], na.last = TRUE))
a[is.na(df[df$Category == x, i])] <- NA
a <- max(a, na.rm = TRUE) - a + 1
return(a)
}))
}
df
来源:https://stackoverflow.com/questions/45692427/specific-group-rankings-in-r