Right now, I have the following data.frame, which was created by original.df %.% group_by(Category) %.% tally() %.% arrange(desc(n)):
DF <- st
This is another approach, assuming that each category (of the top 5 at least) only occurs once:
df %.%
arrange(desc(n)) %.% #you could skip this step since you arranged the input df already according to your question
mutate(Category = ifelse(1:n() > 5, "Other", Category)) %.%
group_by(Category) %.%
summarize(n = sum(n))
# Category n
#1 E 163051
#2 I 49701
#3 K 127133
#4 L 64868
#5 M 106680
#6 Other 217022
Edit:
I just noticed that my output is no longer ordered by decreasing n. After running the code again, I found that the order is kept until after the group_by(Category) step, but once I run summarize, the order is gone (or rather, the result seems to be ordered by Category). Is that the intended behavior?
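The behaviour described in the edit can be reproduced on a small made-up data frame. As far as I can tell, summarise() on grouped data returns one row per group, sorted by the grouping variable, so the earlier ordering has to be restored with a final arrange(). This is only a minimal sketch: the data values and the names toy, top, lumped and result are invented for illustration, Category is assumed to be a character column, and the code uses the %>% pipe that replaced %.% in later dplyr releases.

```r
library(dplyr)

# Toy data (invented values), already sorted by decreasing n
toy <- data.frame(
  Category = c("M", "A", "Z", "B", "K", "C", "D"),
  n        = c(70, 60, 50, 40, 30, 20, 10),
  stringsAsFactors = FALSE
)

top <- 2  # hypothetical: number of categories to keep besides "Other"

# Same idea as the answer above: relabel everything past the top rows, then sum
lumped <- toy %>%
  mutate(Category = ifelse(row_number() > top, "Other", Category)) %>%
  group_by(Category) %>%
  summarise(n = sum(n))
# lumped now has one row per group, sorted by Category rather than by n

# Restore the order: "Other" last, everything else by decreasing n
result <- lumped %>%
  arrange(Category == "Other", desc(n))
```

Sorting on the logical Category == "Other" first puts the lumped row at the bottom even when its total is the largest; dropping that term and using arrange(desc(n)) alone would sort it in with the rest.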
Here are three more ways:
m <- 5 #number of top results to show in final table (excl. "Other")
n <- m+1
#preserves the order (or better: re-establishes it by index)
df <- arrange(df, desc(n)) %.% #this could be skipped if data already ordered
mutate(idx = 1:n(), Category = ifelse(idx > m, "Other", Category)) %.%
group_by(Category) %.%
summarize(n = sum(n), idx = first(idx)) %.%
arrange(idx) %.%
select(-idx)
#doesn't preserve the order (same result as in the first dplyr solution, ordered by Category)
df <- df[order(df$n, decreasing = TRUE), ] #this could be skipped if data already ordered
df[n:nrow(df),1] <- "Other"
df <- aggregate(n ~ Category, data = df, FUN = "sum")
#preserves the order (without extra index)
df <- df[order(df$n, decreasing = TRUE), ] #this could be skipped if data already ordered
df[n:nrow(df),1] <- "Other"
df[n,2] <- sum(df$n[df$Category == "Other"])
df <- df[1:n,]
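
The last base R variant can be checked on a small made-up data frame. The sketch below wraps the same steps in a function so the original data is not overwritten. The helper name lump_tail, its top argument and the toy values are hypothetical, and it assumes that Category is a character column (not a factor) and that the data has more than top rows.

```r
# Hypothetical helper: keep the `top` largest rows and sum the rest into "Other".
# Assumes columns Category (character) and n, and nrow(df) > top.
lump_tail <- function(df, top) {
  df <- df[order(df$n, decreasing = TRUE), ]  # sort by decreasing n
  k <- top + 1                                # row that will hold "Other"
  tail_sum <- sum(df$n[k:nrow(df)])           # total of everything past the top rows
  df <- df[1:k, ]                             # drop the rows that were summed
  df$Category[k] <- "Other"
  df$n[k] <- tail_sum
  rownames(df) <- NULL                        # tidy up row names after subsetting
  df
}

# Toy data (invented values), deliberately unsorted
toy <- data.frame(
  Category = c("a", "b", "c", "d", "e"),
  n        = c(5, 50, 20, 1, 10),
  stringsAsFactors = FALSE
)

res <- lump_tail(toy, top = 2)
print(res)
```

Because the sort result is assigned back inside the function, the input does not need to be ordered beforehand, and the grand total of n is unchanged by the lumping.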