Question
I need to select the top two values for each group (yearmonth) from the following data frame in R. I have already sorted the data by count and yearmonth. How can I achieve that with the following data?
   yearmonth            name count
1     201310           Dovas     5
2     201310         Indulgd     2
3     201310         Justina     1
4     201310          Jolita     1
5     201311 Shahrukh Sheikh     1
6     201311           Dovas    29
7     201311         Justina    13
8     201311            Lina     8
9     201312         sUPERED     7
10    201312     John Hansen     7
11    201312         Lina D.     6
12    201312       joanna1st     5
Answer 1:
Or, using data.table (with mydf from @jazzurro's post), some options are:
library(data.table)
# order by yearmonth, then by count descending; keep the first two rows of each group
setDT(mydf)[order(yearmonth, -count), .SD[1:2], by = yearmonth]
Or
# collect the row numbers (.I) of the top two rows per group, then subset by them
setDT(mydf)[mydf[order(yearmonth, -count), .I[1:2], by = yearmonth]$V1, ]
Or
# sort in place with setorder(), then take the first two rows of each group
setorder(setkey(setDT(mydf), yearmonth), yearmonth, -count)[
  , .SD[1:2], by = yearmonth]
# yearmonth name count
#1: 201310 Dovas 5
#2: 201310 Indulgd 2
#3: 201311 Dovas 29
#4: 201311 Justina 13
#5: 201312 sUPERED 7
#6: 201312 John Hansen 7
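One caveat with `.SD[1:2]` (and `.I[1:2]`): positional indexing assumes every group has at least two rows. As far as I know, data.table, like a base R data frame, returns an all-`NA` row for an out-of-range row index, whereas `head(.SD, 2)` returns only the rows that exist. The sketch below shows the behaviour with a plain data frame; the one-row month `201401` and the name `Hypothetical` are made up for illustration.

```r
# Made-up data: month "201401" has only one row (hypothetical, not from the question)
d <- data.frame(yearmonth = c("201310", "201310", "201310", "201401"),
                name      = c("Dovas", "Indulgd", "Justina", "Hypothetical"),
                count     = c(5, 2, 1, 3),
                stringsAsFactors = FALSE)

one_row <- d[d$yearmonth == "201401", ]

# Asking for rows 1:2 of a one-row data frame pads the result with an all-NA row
padded <- one_row[1:2, ]
nrow(padded)          # 2
any(is.na(padded))    # TRUE

# head() returns at most 2 rows and never pads
safe <- head(one_row, 2)
nrow(safe)            # 1
```

With data.table, the corresponding safe form would be `setDT(mydf)[order(yearmonth, -count), head(.SD, 2), by = yearmonth]`.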
Answer 2:
Here is one way:
library(dplyr)
mydf %>%
  group_by(yearmonth) %>%
  arrange(desc(count)) %>%  # sort by count, largest first
  slice(1:2)                # keep the first two rows of each yearmonth group
# yearmonth name count
#1 201310 Dovas 5
#2 201310 Indulgd 2
#3 201311 Dovas 29
#4 201311 Justina 13
#5 201312 sUPERED 7
#6 201312 John Hansen 7
DATA
mydf <- data.frame(yearmonth = rep(c("201310", "201311", "201312"), each = 4),
                   name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
                            "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
                            "Lina D.", "joanna1st"),
                   count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
                   stringsAsFactors = FALSE)
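For comparison, the same group / sort / take-first-two pipeline can be written in base R without dplyr. This is a minimal split-apply-combine sketch using `split()`, `order()`, `head()` and `do.call(rbind, ...)`; the variable names `groups`, `top2` and `result` are only illustrative.

```r
mydf <- data.frame(yearmonth = rep(c("201310", "201311", "201312"), each = 4),
                   name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
                            "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
                            "Lina D.", "joanna1st"),
                   count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
                   stringsAsFactors = FALSE)

# "group_by": one data frame per yearmonth (split() orders groups by sorted key)
groups <- split(mydf, mydf$yearmonth)

# "arrange(desc(count)) + slice(1:2)" inside each group;
# order() leaves ties in their original row order, so sUPERED stays before John Hansen
top2 <- lapply(groups, function(g) head(g[order(-g$count), ], 2))

# Combine the per-group pieces back into a single data frame
result <- do.call(rbind, top2)
rownames(result) <- NULL
result
```

Using `head()` rather than `[1:2, ]` also means a group with a single row contributes one row instead of an `NA`-padded one.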
Answer 3:
Using base R you could do something like:
# sort by yearmonth and count (both descending); skip if already sorted
df <- df[order(df$yearmonth, df$count, decreasing = TRUE), ]
Then, to get the top two rows in each group:
# ave() numbers the rows 1, 2, 3, ... within each yearmonth; keep rows numbered 1 or 2
df[ave(df$count, df$yearmonth, FUN = seq_along) <= 2, ]
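The `ave(..., FUN = seq_along)` trick works because `ave()` applies `FUN` separately to each group and puts the results back in the original row positions, so on sorted data it yields each row's rank within its group. The sketch below wraps this in a reusable helper; `top_n_by_group` is a hypothetical name, not an existing function. Unlike the snippet above, it sorts `yearmonth` ascending via `order(group, -value)`, which assumes `value` is numeric.

```r
# Hypothetical helper: keep the n rows with the largest `value` in each `group`
top_n_by_group <- function(df, group, value, n = 2) {
  # Sort by group ascending, then by value descending within each group
  df <- df[order(df[[group]], -df[[value]]), ]
  # Within-group running index 1, 2, 3, ... computed by ave()
  rank_in_group <- ave(df[[value]], df[[group]], FUN = seq_along)
  df[rank_in_group <= n, ]
}

mydf <- data.frame(yearmonth = rep(c("201310", "201311", "201312"), each = 4),
                   name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
                            "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
                            "Lina D.", "joanna1st"),
                   count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
                   stringsAsFactors = FALSE)

# The ranks ave() produces on the sorted question data: 1 2 3 4 1 2 3 4 1 2 3 4
sorted <- mydf[order(mydf$yearmonth, -mydf$count), ]
ranks <- ave(sorted$count, sorted$yearmonth, FUN = seq_along)

top2 <- top_n_by_group(mydf, "yearmonth", "count", n = 2)
top2
```

Changing `n` gives the top three, four, and so on; ties are broken by original row order, as in the answers above.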
Source: https://stackoverflow.com/questions/26644994/selecting-top-n-values-within-a-group-in-a-column-using-r