dplyr n_distinct with condition


Using dplyr to summarise a dataset, I want to call n_distinct to count the number of unique occurrences in a column. However, I also want to do another summarise() for all uniqu


4 Answers

有刺的猬

2021-02-05 18:07

An alternative is to use the uniqueN function from data.table inside dplyr:

```
library(dplyr)
library(data.table)
a %>% summarise(count_all = n_distinct(A), count_BisY = uniqueN(A[B == 'Y']))
```

which gives:

```
  count_all count_BisY
1         3          2
```

You can also do everything with data.table:

```
library(data.table)
# setDT() converts a to a data.table by reference (no copy is made)
setDT(a)[, .(count_all = uniqueN(A), count_BisY = uniqueN(A[B == 'Y']))]
```

which gives the same result.


被撕碎了的回忆

2021-02-05 18:18

We can also use aggregate from base R:

```
aggregate(cbind(count = A) ~ B, a, FUN = function(x) length(unique(x)))
#   B count
# 1 N     1
# 2 Y     2
```

Based on the OP's expected output:

```
data.frame(count = length(unique(a$A)),
           count_BisY = length(unique(a$A[a$B == "Y"])))
```


长发绾君心

2021-02-05 18:28
Filtering the data frame before calling summarise() also works, although it returns only the distinct count for the rows where B == "Y":
```
a %>%
  filter(B=="Y") %>%
  summarise(count = n_distinct(A))
```
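
Because filter() drops the other rows, the overall count is no longer available in the same pipeline. A minimal sketch of getting both numbers in one summarise() call is to subset A inside n_distinct() instead of filtering first. The data frame below is a hypothetical stand-in for `a` (the original data is not shown in this thread), chosen only so that it is consistent with the outputs quoted in the other answers.

```r
library(dplyr)

# Hypothetical stand-in for the question's data frame `a`
a <- data.frame(
  A = c(1, 2, 3, 1, 3),
  B = c("Y", "N", "Y", "Y", "Y")
)

# Overall distinct count and conditional distinct count side by side
res <- a %>%
  summarise(
    count_all  = n_distinct(A),            # distinct values of A over all rows
    count_BisY = n_distinct(A[B == "Y"])   # distinct values of A where B is "Y"
  )

print(res)
```

With this stand-in data, count_all is 3 and count_BisY is 2.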

无人及你

2021-02-05 18:29

This produces the distinct A counts by each value of B using dplyr.

```
library(dplyr)
a %>%
  group_by(B) %>%
  summarise(count = n_distinct(A))
```

This produces the result:

```
Source: local data frame [2 x 2]

       B count
  (fctr) (int)
1      N     1
2      Y     2
```

To produce the desired output added above using dplyr, you can do the following:

```
a %>% summarise(count_all = n_distinct(A), count_BisY = length(unique(A[B == 'Y'])))
```

This produces the result:

```
  count_all count_BisY
1         3          2
```
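
One caveat with the `A[B == 'Y']` idiom: if B contains missing values, the comparison yields NA for those rows, and indexing with NA inserts an NA into the subset, which then counts as an extra distinct value. The sketch below uses a hypothetical data frame `b` (not from the question) to show two ways around this, namely the na.rm argument of n_distinct() and wrapping the condition in which().

```r
library(dplyr)

# Hypothetical data: like `a`, but with one missing value in B
b <- data.frame(
  A = c(1, 2, 3, 1, 3),
  B = c("Y", "N", "Y", NA, "Y")
)

res <- b %>%
  summarise(
    with_na    = n_distinct(A[B == "Y"]),                # the NA index adds an NA to the subset
    na_dropped = n_distinct(A[B == "Y"], na.rm = TRUE),  # ignore missing values when counting
    via_which  = n_distinct(A[which(B == "Y")])          # which() skips rows where the test is NA
  )

print(res)
```

Note that na.rm = TRUE also drops genuine NA values in A, whereas which() only discards rows where the condition itself is NA, so the two are not interchangeable when A can be missing.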
