How to add count of unique values by group to R data.frame

死守一世寂寞 2020-11-22 01:59

I wish to count the number of unique values by grouping of a second variable, and then add the count to the existing data.frame as a new column. For example, if the existing

3 Answers
  • 2020-11-22 02:30

    This can also be achieved in a vectorized way, without by-group operations, by combining unique with table or tabulate.

    If df$color is a factor, then

    Either

    table(unique(df)$color)[as.character(df$color)]
    # black black black green green   red   red  blue  blue  blue 
    #    2     2     2     1     1     2     2     3     3     3 
    

    Or

    tabulate(unique(df)$color)[as.integer(df$color)]
    # [1] 2 2 2 1 1 2 2 3 3 3
    

    If df$color is a character vector, then just

    table(unique(df)$color)[df$color]
    

    If df$color is an integer then just

    tabulate(unique(df)$color)[df$color]
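
    For anyone who wants to try this end to end, here is a minimal sketch. The sample data is hypothetical, chosen only so that it reproduces the counts shown above. Note that unique(df) only gives distinct color/type pairs when df has no other columns, so the sketch selects the two columns explicitly.

    ```r
    # Hypothetical sample data, chosen to match the output shown above
    df <- data.frame(
      color = c("black", "black", "black", "green", "green",
                "red", "red", "blue", "blue", "blue"),
      type  = c("chair", "chair", "sofa", "sofa", "sofa",
                "sofa", "plate", "sofa", "plate", "chair"),
      stringsAsFactors = FALSE
    )

    # Keep only the two relevant columns so extra columns cannot inflate the counts
    pairs <- unique(df[c("color", "type")])

    # Count distinct (color, type) pairs per color, then look up each row's color by name
    counts <- table(pairs$color)
    df$count <- as.integer(counts[df$color])
    ```

    as.integer() drops the table names, so df$count ends up as a plain integer column.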
    
  • 2020-11-22 02:31

    Here's a solution with the dplyr package - it has n_distinct() as a wrapper for length(unique()).

    library(dplyr)

    df %>%
      group_by(color) %>%
      mutate(unique_types = n_distinct(type))
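
    A self-contained variant of the same idea, as a sketch: the sample data is hypothetical, and the trailing ungroup() is my addition rather than part of the answer above. Without it the result stays grouped by color, which can surprise later summarise() or mutate() calls.

    ```r
    library(dplyr)

    # Hypothetical sample data with a color column and a type column
    df <- data.frame(
      color = c("black", "black", "black", "green", "green",
                "red", "red", "blue", "blue", "blue"),
      type  = c("chair", "chair", "sofa", "sofa", "sofa",
                "sofa", "plate", "sofa", "plate", "chair"),
      stringsAsFactors = FALSE
    )

    result <- df %>%
      group_by(color) %>%
      mutate(unique_types = n_distinct(type)) %>%  # distinct types within each color
      ungroup()                                    # drop the grouping afterwards
    ```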
    
  • 2020-11-22 02:42

    Using ave (since you ask for it specifically):

    within(df, { count <- ave(type, color, FUN=function(x) length(unique(x)))})
    

    Make sure that type is a character vector and not a factor.
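
    One caveat worth illustrating: ave() returns a vector of the same type as its first argument, so with a character type column the counts come back as strings. A possible workaround, sketched below on hypothetical sample data, is to pass row indices to ave() and look the values up inside FUN. This is a variation on the call above, not the original answer's code.

    ```r
    # Hypothetical sample data
    df <- data.frame(
      color = c("black", "black", "black", "green", "green",
                "red", "red", "blue", "blue", "blue"),
      type  = c("chair", "chair", "sofa", "sofa", "sofa",
                "sofa", "plate", "sofa", "plate", "chair"),
      stringsAsFactors = FALSE
    )

    # Character input gives character output: the counts are strings such as "2"
    count_chr <- ave(df$type, df$color, FUN = function(x) length(unique(x)))

    # Grouping integer row indices instead keeps the result an integer vector,
    # and does not depend on whether type is a character vector or a factor
    df$count <- ave(seq_along(df$type), df$color,
                    FUN = function(i) length(unique(df$type[i])))
    ```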


    Since you also say your data is huge and that speed/performance may therefore be a factor, I'd suggest a data.table solution as well.

    require(data.table)
    setDT(df)[, count := uniqueN(type), by = color] # v1.9.6+
    # if you don't want df to be modified by reference
    ans = as.data.table(df)[, count := uniqueN(type), by = color]
    

    uniqueN was implemented in v1.9.6 and is a faster equivalent of length(unique(.)). It also works on whole data.frames/data.tables.
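
    To make both points concrete, here is a small sketch on hypothetical sample data. It adds the count to a data.table copy so the original data.frame is left alone, and then calls uniqueN on a two-column table, where it counts distinct rows.

    ```r
    library(data.table)

    # Hypothetical sample data
    df <- data.frame(
      color = c("black", "black", "black", "green", "green",
                "red", "red", "blue", "blue", "blue"),
      type  = c("chair", "chair", "sofa", "sofa", "sofa",
                "sofa", "plate", "sofa", "plate", "chair"),
      stringsAsFactors = FALSE
    )

    # Work on a data.table built from df; df itself stays a plain data.frame
    dt <- as.data.table(df)
    dt[, count := uniqueN(type), by = color]

    # On a table, uniqueN counts distinct rows, here distinct (color, type) pairs
    n_pairs <- uniqueN(dt[, .(color, type)])
    ```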


    Other solutions:

    Using plyr:

    require(plyr)
    ddply(df, .(color), mutate, count = length(unique(type)))
    

    Using aggregate:

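    # Name the aggregated column "count" so merge() does not produce type.x / type.y
    agg <- aggregate(type ~ color, data = df, FUN = function(x) length(unique(x)))
    names(agg)[names(agg) == "type"] <- "count"
    merge(df, agg, by = "color", all = TRUE)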
    