How to find the number of unique ids corresponding to each date in a data frame

一整个雨季 2021-01-24 07:28

I have a data frame that looks like this:

      date         time              id            datetime    
1 2015-01-02 14:27:22.130 999000000007628 2015-01-02 14         


        
3 Answers
  • 2021-01-24 08:11

    This answer was written in response to a different post, "group by and then count unique observations", which was marked as a duplicate while I was drafting it. It does not address the question used as the duplicate target here, "How to find number of unique ids corresponding to each date in a data frame", which asks about counting unique IDs per date. I'm not sure that second post actually answers the OP's question, which is:

    "I want to create a table with the number of unique id for each combination of group1 and group2."

    The key word here is "combination". My interpretation is that each id has a particular value of group1 and a particular value of group2, so the data of interest is the set of values c(id, group1, group2).

    Here is the data.frame the OP provided:

    # No seed is set, so the counts below will differ from run to run
    df1 <- data.frame(id = sample(letters, 10000, replace = TRUE),
                      group1 = sample(1:2, 10000, replace = TRUE),
                      group2 = sample(100:101, 10000, replace = TRUE))
    

    Using data.table, inspired by this post: https://stackoverflow.com/a/13017723/5220858

    library(data.table)
    DT <- data.table(df1)
    DT[, .N, by = .(group1, group2)]
    
       group1 group2    N
    1:      1    100 2493
    2:      1    101 2455
    3:      2    100 2559
    4:      2    101 2493
    

    N is the number of rows (one per id observation) that have a particular group1 value and a particular group2 value. Adding id to the grouping returns a table of the 104 unique (id, group1, group2) combinations, with N giving the number of rows for each.

    DT[, .N, by = .(id, group1, group2)]
    
         id group1 group2   N
      1:  t      1    100 107
      2:  g      1    101  85
      3:  l      1    101  98
      4:  a      1    100  83
      5:  j      1    101  98
     ---                     
    100:  p      1    101  96
    101:  r      2    101  91
    102:  y      1    101 104
    103:  g      1    100  83
    104:  r      2    100  77
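
Because .N counts rows, it may help to contrast it with a count of distinct ids per combination. The following is a minimal sketch, assuming a small hand-made table (hypothetical data, not the OP's randomly generated df1) and using data.table's uniqueN; the column name n_ids is chosen here only for illustration.

```r
library(data.table)

# Hypothetical sample data: small enough to check the counts by hand
DT <- data.table(
  id     = c("a", "a", "b", "c", "c", "c"),
  group1 = c(1, 1, 1, 2, 2, 2),
  group2 = c(100, 100, 100, 101, 101, 100)
)

# Number of rows in each (group1, group2) combination
rows_per_combo <- DT[, .N, by = .(group1, group2)]

# Number of distinct ids in each (group1, group2) combination
ids_per_combo <- DT[, .(n_ids = uniqueN(id)), by = .(group1, group2)]

print(rows_per_combo)
print(ids_per_combo)
```

In this sample the combination (1, 100) has three rows but only two distinct ids, so the two tables differ.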
    
  • 2021-01-24 08:25

    You can use the uniqueN function from data.table:

    library(data.table)
    setDT(df)[, uniqueN(id), by = date]
    

    Or, in base R (as suggested in a comment by @Richard Scriven):

    aggregate(id ~ date, df, function(x) length(unique(x)))
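
To check that the two approaches agree, here is a small self-contained sketch. The sample data is hypothetical and only mimics the date and id columns from the question; storing id as character is an assumption made here because the example id is too large for R's integer type.

```r
library(data.table)

# Hypothetical sample shaped like the question's data (date and id only)
df <- data.frame(
  date = c("2015-01-02", "2015-01-02", "2015-01-02", "2015-01-03", "2015-01-03"),
  id   = c("999000000007628", "999000000007628", "999000000007629",
           "999000000007628", "999000000007628"),
  stringsAsFactors = FALSE
)

# Base R: the id column of the result holds the distinct-id count per date
base_counts <- aggregate(id ~ date, df, function(x) length(unique(x)))

# data.table: as.data.table() makes a copy, so df itself is left unchanged
# (setDT(df) would instead convert df in place)
dt_counts <- as.data.table(df)[, .(n_ids = uniqueN(id)), by = date]

print(base_counts)
print(dt_counts)
```

Naming the result column (n_ids here) avoids the default V1 column name that an unnamed j expression produces in data.table.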
    
  • 2021-01-24 08:26

    Or we could use n_distinct from dplyr:

    library(dplyr) 
    df %>%
       group_by(date) %>%
       summarise(id=n_distinct(id))
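
A runnable version of the dplyr approach might look like the sketch below. The sample data and the result column name n_ids are assumptions for illustration. It also shows an equivalent formulation with distinct() followed by count(), which gives the same numbers in a column named n.

```r
library(dplyr)

# Hypothetical sample data with repeated ids within a date
df <- data.frame(
  date = c("2015-01-02", "2015-01-02", "2015-01-02", "2015-01-03"),
  id   = c("a", "a", "b", "a")
)

# Distinct ids per date via n_distinct()
counts <- df %>%
  group_by(date) %>%
  summarise(n_ids = n_distinct(id))

# Alternative: drop duplicate (date, id) pairs, then count rows per date
alt_counts <- df %>%
  distinct(date, id) %>%
  count(date)

print(counts)
print(alt_counts)
```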
    