Question
I have a tidy tibble with a value column identified by 4 ID columns.
> MWA
# A tibble: 16 x 5
# Groups: Dir [2]
VP Con Dir Seg time_seg
<int> <int> <int> <int> <int>
1 10 2 1 1 1810
2 10 2 1 2 260
3 10 2 1 3 540
4 10 2 1 4 1470
5 10 2 1 5 460
6 10 2 1 6 690
7 10 2 1 7 760
8 10 2 1 8 NA
9 10 2 2 1 320
10 10 2 2 2 1110
11 10 2 2 3 450
12 10 2 2 4 600
13 10 2 2 5 1680
14 10 2 2 6 730
15 10 2 2 7 850
16 10 2 2 8 840
The dput output to reproduce it is:
> dput(MWA)
structure(list(VP = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Con = c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Dir = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
Seg = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L), time_seg = c(1810L, 260L, 540L, 1470L, 460L,
690L, 760L, NA, 320L, 1110L, 450L, 600L, 1680L, 730L, 850L,
840L)), row.names = c(NA, -16L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "Dir", drop = TRUE, indices = list(
0:7, 8:15), group_sizes = c(8L, 8L), biggest_group_size = 8L, labels = structure(list(
Dir = 1:2), row.names = c(NA, -2L), class = "data.frame", vars = "Dir", drop = TRUE))
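The dput output above appears to use the grouped_df attribute layout of an older dplyr release (vars, indices, labels), so it may not load cleanly under current versions. As a minimal sketch, assuming only that dplyr is installed (it re-exports tibble()), the same 16 rows can be rebuilt from the printed values:

```r
library(dplyr)

# Rebuild the example data; values are copied from the printed tibble above.
MWA <- tibble(
  VP  = rep(10L, 16),
  Con = rep(2L, 16),
  Dir = rep(1:2, each = 8),
  Seg = rep(1:8, times = 2),
  time_seg = c(1810L, 260L, 540L, 1470L, 460L, 690L, 760L, NA,
               320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 840L)
) %>%
  group_by(Dir)  # same grouping as shown in the printout
```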
They stem from a larger data set, where they have been grouped by VP, Con and finally Dir.
As you can see, in tibble row 8 there is an NA.
I now want to exclude the whole Dir group (so rows 1 through 8), based on the condition that this one value is missing, using dplyr.
Using filter with is.na or complete.cases only removes the row with the NA, not the complete group (which is one "case" in this dataset).
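The row-wise behaviour described above can be reproduced on a small toy tibble. The name toy and its values are made up for illustration and are not part of the data above:

```r
library(dplyr)

# Hypothetical toy data: group 1 has one missing value, group 2 is complete.
toy <- tibble(
  Dir = c(1L, 1L, 2L, 2L),
  time_seg = c(100L, NA, 200L, 300L)
) %>%
  group_by(Dir)

# is.na() is evaluated per row, so only the NA row is dropped
# and the remaining row of group 1 survives.
row_wise <- toy %>% filter(!is.na(time_seg))
row_wise$Dir
# [1] 1 2 2
```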
Answer 1:
Using all() inside filter() evaluates the condition once per group, so you can skip the mutate step.
library(dplyr)
MWA %>%
  group_by(Dir) %>%
  filter(all(!is.na(time_seg)))  # keep a group only if no value is missing
# A tibble: 8 x 5
# Groups: Dir [1]
VP Con Dir Seg time_seg
<int> <int> <int> <int> <int>
1 10 2 2 1 320
2 10 2 2 2 1110
3 10 2 2 3 450
4 10 2 2 4 600
5 10 2 2 5 1680
6 10 2 2 6 730
7 10 2 2 7 850
8 10 2 2 8 840
Answer 2:
You can first check whether there is any missing value in the specific column and then exclude the whole group.
library(dplyr)
MWA %>%
  group_by(VP, Con, Dir) %>%
  mutate(any_na = any(is.na(time_seg))) %>%
  filter(!any_na)
# A tibble: 8 x 6
# Groups: VP, Con, Dir [1]
# VP Con Dir Seg time_seg any_na
# <int> <int> <int> <int> <int> <lgl>
# 1 10 2 2 1 320 FALSE
# 2 10 2 2 2 1110 FALSE
# 3 10 2 2 3 450 FALSE
# 4 10 2 2 4 600 FALSE
# 5 10 2 2 5 1680 FALSE
# 6 10 2 2 6 730 FALSE
# 7 10 2 2 7 850 FALSE
# 8 10 2 2 8 840 FALSE
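Grouping by all three ID columns starts to matter once the data holds more than one VP or Con, because the same Dir value then occurs in several groups. A small sketch with made-up data (the tibble big and its values are hypothetical, not from the question) illustrates the difference:

```r
library(dplyr)

# Hypothetical data: two participants (VP), each with Dir 1 and Dir 2.
# Only the VP 10 / Dir 1 group contains a missing value.
big <- tibble(
  VP  = rep(c(10L, 11L), each = 4),
  Con = 2L,
  Dir = rep(c(1L, 1L, 2L, 2L), times = 2),
  time_seg = c(100L, NA, 200L, 300L, 400L, 500L, 600L, 700L)
)

# Grouping by Dir alone also drops VP 11 / Dir 1, although it is complete.
by_dir <- big %>%
  group_by(Dir) %>%
  filter(!anyNA(time_seg))

# Grouping by all ID columns drops only the VP 10 / Dir 1 group.
by_all <- big %>%
  group_by(VP, Con, Dir) %>%
  filter(!anyNA(time_seg))

nrow(by_dir)  # 4
nrow(by_all)  # 6
```

On the single-VP, single-Con example from the question both groupings give the same result, so the distinction only shows up on the larger data set.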
Answer 3:
Base R also provides anyNA(), which can be used directly inside filter():
library(dplyr)
MWA %>%
  group_by(Dir) %>%
  filter(!anyNA(time_seg))
# A tibble: 8 x 5
# Groups: Dir [1]
# VP Con Dir Seg time_seg
# <int> <int> <int> <int> <int>
#1 10 2 2 1 320
#2 10 2 2 2 1110
#3 10 2 2 3 450
#4 10 2 2 4 600
#5 10 2 2 5 1680
#6 10 2 2 6 730
#7 10 2 2 7 850
#8 10 2 2 8 840
Source: https://stackoverflow.com/questions/52038100/exclude-groups-with-nas-in-tidy-dataset