dplyr n_distinct with condition

离开以前 2021-02-05 17:39

Using dplyr to summarise a dataset, I want to call n_distinct to count the number of unique occurrences in a column. However, I also want to do another summarise() for all uniqu

4 answers
  • 2021-02-05 18:07

    An alternative is to use the uniqueN function from data.table inside dplyr:

    library(dplyr)
    library(data.table)
    a %>% summarise(count_all = n_distinct(A), count_BisY = uniqueN(A[B == 'Y']))
    

    which gives:

      count_all count_BisY
    1         3          2
    

    You can also do everything with data.table:

    library(data.table)
    setDT(a)[, .(count_all = uniqueN(A), count_BisY = uniqueN(A[B == 'Y']))]
    

    which gives the same result.
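
    One thing to keep in mind: setDT() converts a to a data.table by reference, so a itself changes class. The sketch below uses as.data.table() instead, which works on a copy. The sample data is hypothetical (it is not the asker's original data) and was chosen only to be consistent with the output shown above; dt and res are illustrative names.

    ```r
    library(data.table)

    # Hypothetical sample data, not the asker's original
    a <- data.frame(A = c(1, 1, 2, 3, 3), B = c("Y", "Y", "Y", "N", "N"))

    # as.data.table() returns a copy, so `a` stays a plain data.frame
    dt <- as.data.table(a)

    # Distinct values of A overall, and distinct values of A where B is "Y"
    res <- dt[, .(count_all = uniqueN(A), count_BisY = uniqueN(A[B == "Y"]))]
    print(res)
    ```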

  • 2021-02-05 18:18

    We can also use aggregate() from base R to get the distinct count of A for each value of B:

    aggregate(cbind(count = A) ~ B, a, FUN = function(x) length(unique(x)))
    #   B count
    # 1 N     1
    # 2 Y     2
    

    To match the OP's expected output instead, build the one-row result directly:

    data.frame(count = length(unique(a$A)),
               count_BisY = length(unique(a$A[a$B == "Y"])))
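
    If a named vector is more convenient than a data frame, tapply() is another base R option for the per-group counts. This is only a sketch on hypothetical sample data (not the asker's original), chosen to be consistent with the counts shown above; counts is an illustrative name.

    ```r
    # Hypothetical sample data, not the asker's original
    a <- data.frame(A = c(1, 1, 2, 3, 3), B = c("Y", "Y", "Y", "N", "N"))

    # Split A by the values of B and count the distinct values in each group
    counts <- tapply(a$A, a$B, function(x) length(unique(x)))
    print(counts)
    ```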
    
  • 2021-02-05 18:28

    Filtering the data frame before summarising also works, although the result then contains only the count for B == "Y" and not the overall count:

    a %>%
      filter(B=="Y") %>%
      summarise(count = n_distinct(A))
    
  • 2021-02-05 18:29

    This produces the distinct A counts by each value of B using dplyr.

    library(dplyr)
    a %>%
      group_by(B) %>%
      summarise(count = n_distinct(A))
    

    This produces the result:

    Source: local data frame [2 x 2]
    
           B count
      (fctr) (int)
    1      N     1
    2      Y     2
    

    To produce the desired output from the question using dplyr, you can do the following:

    a %>% summarise(count_all = n_distinct(A), count_BisY = length(unique(A[B == 'Y'])))
    

    This produces the result:

      count_all count_BisY
    1         3          2
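
    Since n_distinct() accepts any vector, the subset can also be passed to it directly, which keeps the whole call in dplyr. One caveat that may be worth checking on real data: if B contains NA, A[B == "Y"] produces NA entries, and n_distinct() counts NA as a distinct value by default. Subsetting with %in% avoids that. The sketch below uses hypothetical sample data (not the asker's original); a_na, res and res_na are illustrative names.

    ```r
    library(dplyr)

    # Hypothetical sample data, not the asker's original
    a <- data.frame(A = c(1, 1, 2, 3, 3), B = c("Y", "Y", "Y", "N", "N"))

    # Pure dplyr version of the summary
    res <- a %>%
      summarise(count_all = n_distinct(A), count_BisY = n_distinct(A[B == "Y"]))

    # Same data plus one row whose B is missing
    a_na <- rbind(a, data.frame(A = 4, B = NA))

    res_na <- a_na %>%
      summarise(
        naive = n_distinct(A[B == "Y"]),   # the NA in B yields an NA element, counted as a value
        safe  = n_distinct(A[B %in% "Y"])  # %in% returns FALSE for NA, so that row is dropped
      )
    print(res)
    print(res_na)
    ```

    Passing na.rm = TRUE to n_distinct() is another way to ignore the stray NA, at the cost of also ignoring genuine missing values in A.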
    