Creating an “other” field

醉酒成梦 2020-12-17 23:27

Right now, I have the following data.frame which was created by original.df %.% group_by(Category) %.% tally() %.% arrange(desc(n)).

DF <- st         


        
3 answers
  •  醉梦人生
    2020-12-18 00:09

    This is another approach, assuming that each category (of the top 5 at least) only occurs once:

    library(dplyr)

    # %>% is used in place of the older %.% operator, which is no longer part of dplyr
    df %>%
      arrange(desc(n)) %>%       # can be skipped: per the question, the input df is already sorted
      mutate(Category = ifelse(row_number() > 5, "Other", as.character(Category))) %>%
      group_by(Category) %>%
      summarize(n = sum(n))
    
    #  Category      n
    #1        E 163051
    #2        I  49701
    #3        K 127133
    #4        L  64868
    #5        M 106680
    #6    Other 217022
    

    Edit:

    I just noticed that my output is no longer ordered by decreasing n. After running the code again, I found that the order is kept up to and including group_by(Category), but once summarize runs the order is gone (or rather, the result seems to be ordered by Category). Is that the intended behavior?
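
    On the ordering question: as far as I can tell, summarize() returns one row per group, sorted by the grouping variable, so an earlier arrange() does not carry through. A minimal sketch of one way around this, assuming Category is a character column: convert it to a factor whose levels follow the desired order before grouping. The data frame below is made up for illustration (it is not the data from the question), and the code uses %>%, dplyr's current pipe, in place of %.%.

    ```r
    library(dplyr)

    # Hypothetical counts, for illustration only
    df <- data.frame(
      Category = c("A", "B", "C", "D", "E", "F", "G"),
      n = c(70, 10, 60, 20, 50, 30, 40),
      stringsAsFactors = FALSE
    )

    m <- 5  # number of top categories to keep

    res <- df %>%
      arrange(desc(n)) %>%
      mutate(
        Category = ifelse(row_number() > m, "Other", Category),
        # levels in order of first appearance: the top m, then "Other"
        Category = factor(Category, levels = unique(Category))
      ) %>%
      group_by(Category) %>%
      summarize(n = sum(n))

    res
    ```

    With this toy data the rows should come out as A, C, E, G, F, Other, because the grouped result follows the factor levels rather than alphabetical order.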

    Here are three more ways:

    m <- 5      # number of top results to show in the final table (excl. "Other")
    n <- m+1    # row at which "Other" starts

    # Each snippet below is an alternative and starts from the original df.
    # %>% is used in place of the older %.% operator.

    # preserves the order (or rather: re-establishes it via an index)
    df <- arrange(df, desc(n)) %>%    # can be skipped if the data is already ordered
      mutate(idx = row_number(), Category = ifelse(idx > m, "Other", as.character(Category))) %>%
      group_by(Category) %>%
      summarize(n = sum(n), idx = first(idx)) %>%
      arrange(idx) %>%
      select(-idx)

    # does not preserve the order (same result as the first dplyr solution, ordered by Category)
    # assumes Category is a character column in position 1
    df <- df[order(df$n, decreasing = TRUE), ]     # can be skipped if the data is already ordered
    df[n:nrow(df), 1] <- "Other"
    df <- aggregate(n ~ Category, data = df, FUN = "sum")

    # preserves the order (without an extra index)
    df <- df[order(df$n, decreasing = TRUE), ]     # can be skipped if the data is already ordered
    df[n:nrow(df), 1] <- "Other"
    df[n, 2] <- sum(df$n[df$Category == "Other"])  # row n becomes the single "Other" row
    df <- df[1:n, ]
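
    The index-free base R variant can also be wrapped in a small helper. This is only a sketch: the function name lump_other, its arguments, and the toy data are hypothetical, and it assumes a character column Category and a numeric column n with each category appearing once.

    ```r
    # Hypothetical helper: keep the m largest categories and sum the rest into "Other".
    lump_other <- function(df, m = 5, other = "Other") {
      df <- df[order(df$n, decreasing = TRUE), c("Category", "n")]
      if (nrow(df) <= m) return(df)        # nothing to lump together
      top  <- df[seq_len(m), ]             # the m largest categories, in order
      rest <- df[-seq_len(m), ]            # everything that becomes "Other"
      out <- rbind(top, data.frame(Category = other, n = sum(rest$n),
                                   stringsAsFactors = FALSE))
      rownames(out) <- NULL
      out
    }

    # Made-up counts, for illustration only
    toy <- data.frame(
      Category = c("A", "B", "C", "D", "E", "F", "G"),
      n = c(70, 10, 60, 20, 50, 30, 40),
      stringsAsFactors = FALSE
    )
    res <- lump_other(toy, m = 5)
    res
    ```

    Because "Other" is appended after the top m rows instead of being aggregated with them, the descending order of the top categories is kept and "Other" always comes last.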
    
