dplyr: Difference between unique and distinct

名媛妹妹 2021-02-06 18:14

It seems the number of resulting rows differs when using distinct() vs unique(). The data set I am working with is huge. I hope the code is easy enough to follow.
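Since the code in the question is cut off, here is a minimal diagnostic sketch for checking whether grouping could be behind such a discrepancy. It assumes a small hypothetical stand-in data frame `df` in place of the real data. `is_grouped_df()` and `group_vars()` are dplyr functions (`group_vars()` needs dplyr 0.7.0 or later).

```r
library(dplyr)

# Hypothetical stand-in for the question's (truncated) data set
df <- data.frame(g = rep(c("a", "b"), each = 3),
                 v = c(2, 2, 5, 2, 7, 7)) %>%
  group_by(g)

# Is the data frame grouped, and by which column(s)?
grouped    <- is_grouped_df(df)   # TRUE
group_cols <- group_vars(df)      # "g"
```

If the data turns out to be grouped, the grouping columns are a likely reason the two functions return different row counts.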

dt2a &l         

1 Answer
  • 2021-02-06 19:10

    This appears to be a result of the group_by(). Consider this case:

    library(dplyr)
    
    dt <- data.frame(g = rep(c("a", "b"), each = 3),
                     v = c(2, 2, 5, 2, 7, 7))
    
    dt %>% group_by(g) %>% unique()
    # Source: local data frame [4 x 2]
    # Groups: g
    # 
    #   g v
    # 1 a 2
    # 2 a 5
    # 3 b 2
    # 4 b 7
    
    dt %>% group_by(g) %>% distinct()
    # Source: local data frame [2 x 2]
    # Groups: g
    # 
    #   g v
    # 1 a 2
    # 2 b 2
    
    dt %>% group_by(g) %>% distinct(v)
    # Source: local data frame [4 x 2]
    # Groups: g
    # 
    #   g v
    # 1 a 2
    # 2 a 5
    # 3 b 2
    # 4 b 7
    

    When you call distinct() on grouped data without indicating which variables to make distinct, it appears to use only the grouping variable(s).
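A way to avoid relying on that default, sketched here under the assumption that you want one row per unique combination of all columns, is to name the columns explicitly or to drop the grouping before deduplicating. The behaviour of a bare distinct() on grouped data appears to differ between dplyr versions (current dplyr documentation says that omitting the variables means all variables are used), so being explicit keeps the result stable. Both dplyr forms below should return the same four rows as unique().

```r
library(dplyr)

# Same toy data as in the answer above
dt <- data.frame(g = rep(c("a", "b"), each = 3),
                 v = c(2, 2, 5, 2, 7, 7))

# Option 1: name every column that should define uniqueness
res_named <- dt %>% group_by(g) %>% distinct(g, v)

# Option 2: drop the grouping, then deduplicate on all columns
res_ungrouped <- dt %>% group_by(g) %>% ungroup() %>% distinct()

# Base R reference result
res_unique <- unique(dt)

nrow(res_named)      # 4
nrow(res_ungrouped)  # 4
nrow(res_unique)     # 4
```

Naming the columns keeps the grouping on the result, while ungroup() returns a plain tibble; pick whichever suits the next step of the pipeline.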
