How to find number of unique ids corresponding to each date in a data frame

I have a data frame that looks like this:

      date         time              id            datetime    
1 2015-01-02 14:27:22.130 999000000007628 2015-01-02 14


                      
3 Answers

佛祖请我去吃肉 (2021-01-24 08:11)

This answer responds to the post "group by and then count unique observations", which was marked as a duplicate while I was writing this draft. It does not respond to the duplicate target, "How to find number of unique ids corresponding to each date in a data frame", which asks about finding unique ids. I'm not sure the second post actually answers the OP's question, which is:


  "I want to create a table with the number of unique id for each
  combination of group1 and group2."


The keyword here is 'combination'. My interpretation is that each id has a particular value for group1 and a particular value for group2, so the data of interest is the set of distinct (id, group1, group2) triples.

Here is the data.frame the OP provided:

df1 <- data.frame(id = sample(letters, 10000, replace = TRUE),
                  group1 = sample(1:2, 10000, replace = TRUE),
                  group2 = sample(100:101, 10000, replace = TRUE))


Using data.table inspired by this post -- https://stackoverflow.com/a/13017723/5220858:

library(data.table)
DT <- data.table(df1)
DT[, .N, by = .(group1, group2)]

   group1 group2    N
1:      1    100 2493
2:      1    101 2455
3:      2    100 2559
4:      2    101 2493


N is the number of rows that have a particular group1 value and a particular group2 value; it counts observations, not distinct ids. Adding id to the grouping returns a table of the 104 unique (id, group1, group2) combinations (26 letters x 2 x 2), each with its own row count N.

DT[, .N, by = .(id, group1, group2)]

     id group1 group2   N
  1:  t      1    100 107
  2:  g      1    101  85
  3:  l      1    101  98
  4:  a      1    100  83
  5:  j      1    101  98
 ---                     
100:  p      1    101  96
101:  r      2    101  91
102:  y      1    101 104
103:  g      1    100  83
104:  r      2    100  77
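
To make the difference between row counts and distinct-id counts concrete, here is a minimal sketch on a small hypothetical table. The toy values and the column name n_ids are assumptions for illustration, not the OP's data. It uses uniqueN from data.table alongside .N so the two can be compared per (group1, group2) combination:

```r
library(data.table)

# Hypothetical toy data, small enough to check by hand
toy <- data.table(
  id     = c("a", "a", "b", "c", "c", "c"),
  group1 = c(1, 1, 1, 2, 2, 2),
  group2 = c(100, 100, 100, 101, 101, 100)
)

# Number of rows per (group1, group2) combination
rows_per_combo <- toy[, .N, by = .(group1, group2)]

# Number of distinct ids per (group1, group2) combination
ids_per_combo <- toy[, .(n_ids = uniqueN(id)), by = .(group1, group2)]

print(rows_per_combo)
print(ids_per_combo)
```

For the combination group1 = 1, group2 = 100 there are three rows but only two distinct ids ("a" and "b"), which is the distinction the question turns on.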

                                                                        
                                                        
            
            
              
                
执笔经年 (2021-01-24 08:25)

You can use the uniqueN function from data.table:

library(data.table)
setDT(df)[, uniqueN(id), by = date]


Or, in base R (as per the comment of @Richard Scriven):

aggregate(id ~ date, df, function(x) length(unique(x)))
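
As a quick check of the base R approach, here is a small sketch on hypothetical data. The rows below are made up for illustration and only mimic the date and id columns from the question:

```r
# Hypothetical toy data with repeated ids within a date
toy_df <- data.frame(
  date = c("2015-01-02", "2015-01-02", "2015-01-02", "2015-01-03"),
  id   = c("x", "x", "y", "x")
)

# Count distinct ids within each date
unique_per_date <- aggregate(id ~ date, toy_df, function(x) length(unique(x)))
print(unique_per_date)
```

Here 2015-01-02 has two distinct ids ("x" and "y") and 2015-01-03 has one, so the id column of the result holds 2 and 1.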

                                                                        
                                                        
            
            
              
                
梦谈多话 (2021-01-24 08:26)

Or we could use n_distinct from dplyr:

library(dplyr) 
df %>%
   group_by(date) %>%
   summarise(id=n_distinct(id))

                                                                        
                                                        
            
            
              
                