Counting unique / distinct values by group in a data frame

终归单人心 2020-11-22 00:12

Let's say I have the following data frame:

> myvec
    name order_no
1    Amy       12
2   Jack       14
3   Jack       16
4   Dave       11
5    Amy       12

11 Answers
  • 2020-11-22 00:24

    This should do the trick:

    library(plyr)
    ddply(myvec, ~name, summarise, number_of_distinct_orders = length(unique(order_no)))
    

    This requires the plyr package.

  • 2020-11-22 00:26

    This also works, but it is less elegant than the plyr solution:

    # Split the data frame by name, then count the unique order numbers in each piece
    x <- sapply(split(myvec, myvec$name), function(d) length(unique(d$order_no)))
    data.frame(names = names(x), number_of_distinct_orders = x, row.names = NULL)
    
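The split-and-count approach can be wrapped in a small reusable function that addresses columns by name. This is only a sketch: the helper name count_distinct_by and its arguments are hypothetical (not from the thread or from any package), and the sample data is copied from the sqldf answer in this thread.

```r
# Sample data: the 14 rows listed in the sqldf answer.
myvec <- data.frame(
  name = c("Amy", "Jack", "Jack", "Dave", "Amy", "Jack", "Tom",
           "Larry", "Tom", "Dave", "Jack", "Tom", "Amy", "Jack"),
  order_no = c(12, 14, 16, 11, 12, 16, 19, 22, 19, 11, 17, 20, 23, 16),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count distinct `value` entries within each `group`.
# Both arguments are column names given as strings.
count_distinct_by <- function(df, group, value) {
  pieces <- split(df[[value]], df[[group]])          # one vector per group
  counts <- vapply(pieces, function(v) length(unique(v)), integer(1))
  out <- data.frame(names(counts), unname(counts), stringsAsFactors = FALSE)
  setNames(out, c(group, "n_distinct"))
}

res <- count_distinct_by(myvec, "name", "order_no")
print(res)
```

vapply is used instead of sapply so that the result is guaranteed to be an integer vector, even when the data frame has no rows.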
  • 2020-11-22 00:27

    This is a simple solution with the function aggregate:

    aggregate(order_no ~ name, myvec, function(x) length(unique(x)))
    
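With the formula interface, aggregate names the result column after the input column (order_no), which can be confusing once that column holds counts. A minimal sketch, assuming the 14-row sample data from the sqldf answer in this thread, that runs the same call and then renames the column:

```r
# Sample data: the 14 rows listed in the sqldf answer.
myvec <- data.frame(
  name = c("Amy", "Jack", "Jack", "Dave", "Amy", "Jack", "Tom",
           "Larry", "Tom", "Dave", "Jack", "Tom", "Amy", "Jack"),
  order_no = c(12, 14, 16, 11, 12, 16, 19, 22, 19, 11, 17, 20, 23, 16)
)

# Count the distinct order numbers for each name.
res <- aggregate(order_no ~ name, data = myvec,
                 FUN = function(x) length(unique(x)))

# The count column is still called "order_no"; give it a descriptive name.
names(res)[names(res) == "order_no"] <- "number_of_distinct_orders"
print(res)
```

On this data the counts agree with the output shown in the sqldf answer (Amy 2, Dave 1, Jack 3, Larry 1, Tom 2).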
  • 2020-11-22 00:28

    In dplyr you may use n_distinct to "count the number of unique values":

    library(dplyr)
    myvec %>%
      group_by(name) %>%
      summarise(number_of_distinct_orders = n_distinct(order_no))
    
  • 2020-11-22 00:28

    Here is a solution with sqldf:

    library("sqldf")
    
    myvec <- read.table(header=TRUE, text=
    "   name order_no
    1    Amy       12
    2   Jack       14
    3   Jack       16
    4   Dave       11
    5    Amy       12
    6   Jack       16
    7    Tom       19
    8  Larry       22
    9    Tom       19
    10  Dave       11
    11  Jack       17
    12   Tom       20
    13   Amy       23
    14  Jack       16")
    sqldf("SELECT name,COUNT(distinct(order_no)) as number_of_distinct_orders FROM myvec GROUP BY name")
    # > sqldf("SELECT name,COUNT(distinct(order_no)) as number_of_distinct_orders FROM myvec GROUP BY name")
    #    name number_of_distinct_orders
    # 1   Amy                         2
    # 2  Dave                         1
    # 3  Jack                         3
    # 4 Larry                         1
    # 5   Tom                         2
    
  • 2020-11-22 00:31

    You can use the built-in R function tapply together with length and unique:

    tapply(myvec$order_no, myvec$name, FUN = function(x) length(unique(x)))
    
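tapply returns a one-dimensional array named by group rather than a data frame. The sketch below, again assuming the sample data from the sqldf answer, converts that result into a data frame. It also illustrates how length(unique(x)) treats a missing order number; the extra NA row is made up purely for illustration.

```r
# Sample data: the 14 rows listed in the sqldf answer.
myvec <- data.frame(
  name = c("Amy", "Jack", "Jack", "Dave", "Amy", "Jack", "Tom",
           "Larry", "Tom", "Dave", "Jack", "Tom", "Amy", "Jack"),
  order_no = c(12, 14, 16, 11, 12, 16, 19, 22, 19, 11, 17, 20, 23, 16)
)

counts <- tapply(myvec$order_no, myvec$name, function(x) length(unique(x)))

# Reshape the named array into a two-column data frame.
res <- data.frame(name = names(counts),
                  number_of_distinct_orders = as.vector(counts))
print(res)

# Illustration only: add a hypothetical row with a missing order number.
with_na <- rbind(myvec, data.frame(name = "Amy", order_no = NA))

# unique() keeps NA as a value of its own, so it is counted.
counted_with_na <- tapply(with_na$order_no, with_na$name,
                          function(x) length(unique(x)))[["Amy"]]

# Dropping NA first counts only the observed order numbers.
counted_without_na <- tapply(with_na$order_no, with_na$name,
                             function(x) length(unique(x[!is.na(x)])))[["Amy"]]
```

Here counted_with_na is 3 and counted_without_na is 2, so whether to filter out NA depends on whether a missing order number should count as a distinct value.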