Question
When I first started programming in R, I would often use dplyr's count():

library(tidyverse)
mtcars %>% count(cyl)

Once I started using apply functions, I started running into issues with count(). If I simply added ungroup() to the end of my count() calls, the problems would go away:

mtcars %>% count(cyl) %>% ungroup()

I don't have a specific reproducible example to show, but can somebody explain what the issue likely was, why ungroup() always fixed it, and whether there are any drawbacks to consistently using ungroup() after every count(), or after any group_by()? Of course, I'm assuming I no longer need the data grouped after it's counted or summarized.
Answer 1:
The issues you used to run into were from an old behavior of count().
Up to dplyr 0.5.0, if you did:
mtcars %>%
  count(cyl, wt)
the result would still be grouped by the cyl column. This means, for example, that if you followed it with something like summarize(mean(wt)), you would have gotten one row for each value of cyl when you may have expected one row overall. The issue would be fixed by putting %>% ungroup() after the count.
This behavior was changed in dplyr 0.7.0 (released in June 2017), such that count() now preserves the grouping of its input (meaning mtcars %>% count(wt, cyl) returns an ungrouped table). This is likely why you're no longer able to reproduce the problems, and it means you no longer need to call ungroup() after a count().
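A quick way to see the difference is to check the grouping with group_vars(); the sketch below assumes a current version of dplyr. count() on an ungrouped data frame now leaves no groups behind, while the old behavior is roughly what an explicit group_by() followed by tally() still produces, since tally() drops only the last grouping variable:

library(dplyr)

# count() on an ungrouped data frame returns an ungrouped result
mtcars %>% count(cyl, wt) %>% group_vars()
# character(0)

# group_by() + tally() keeps the result grouped by cyl, which is
# roughly the old count() behavior
mtcars %>% group_by(cyl, wt) %>% tally() %>% group_vars()
# [1] "cyl"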
Note that you may still need to call ungroup() after a group_by() and summarize():
mtcars %>%
  group_by(cyl, wt) %>%
  summarize(n = n())
returns a tibble that is still grouped by cyl:
# A tibble: 30 x 3
# Groups:   cyl [?]
     cyl    wt     n
   <dbl> <dbl> <int>
 1     4  1.51     1
 2     4  1.62     1
 3     4  1.84     1
 4     4  1.94     1
 5     4  2.14     1
 6     4  2.2      1
 7     4  2.32     1
 8     4  2.46     1
 9     4  2.78     1
10     4  3.15     1
# ... with 20 more rows
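If the grouping is no longer needed at that point, adding ungroup() removes it (in dplyr 1.0.0 and later, summarize(..., .groups = "drop") does the same). A minimal sketch, again checking the result with group_vars():

mtcars %>%
  group_by(cyl, wt) %>%
  summarize(n = n()) %>%
  ungroup() %>%       # drop the remaining cyl grouping
  group_vars()
# character(0)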
Source: https://stackoverflow.com/questions/51404252/in-r-dplyr-why-do-i-need-to-ungroup-after-i-count