Exclude groups with NAs in tidy dataset

Question

I have a tidy tibble with a value column identified by 4 ID columns.

> MWA
# A tibble: 16 x 5
# Groups:   Dir [2]
      VP   Con   Dir   Seg time_seg
   <int> <int> <int> <int>    <int>
 1    10     2     1     1     1810
 2    10     2     1     2      260
 3    10     2     1     3      540
 4    10     2     1     4     1470
 5    10     2     1     5      460
 6    10     2     1     6      690
 7    10     2     1     7      760
 8    10     2     1     8       NA
 9    10     2     2     1      320
10    10     2     2     2     1110
11    10     2     2     3      450
12    10     2     2     4      600
13    10     2     2     5     1680
14    10     2     2     6      730
15    10     2     2     7      850
16    10     2     2     8      840

The dput to reproduce is

> dput(MWA)
structure(list(VP = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Con = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Dir = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    Seg = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L), time_seg = c(1810L, 260L, 540L, 1470L, 460L, 
    690L, 760L, NA, 320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 
    840L)), row.names = c(NA, -16L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = "Dir", drop = TRUE, indices = list(
    0:7, 8:15), group_sizes = c(8L, 8L), biggest_group_size = 8L, labels = structure(list(
    Dir = 1:2), row.names = c(NA, -2L), class = "data.frame", vars = "Dir", drop = TRUE))

They stem from a larger data set, where they have been grouped by VP, Con and finally Dir.

As you can see, there is an NA in row 8 of the tibble.

I now want to use dplyr to exclude the whole Dir group (rows 1 through 8) because this one value is missing.

Using filter() with is.na() or complete.cases() only removes the row with the NA, not the complete group (which is one "case" in this dataset).

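Answer 1:

all() collapses the condition to a single TRUE or FALSE per group, so filter() keeps or drops each group as a whole. A separate mutate() step (as in Answer 2) is not needed.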

MWA %>% 
  group_by(Dir) %>% 
  filter(all(!is.na(time_seg)))

# A tibble: 8 x 5
# Groups:   Dir [1]
     VP   Con   Dir   Seg time_seg
  <int> <int> <int> <int>    <int>
1    10     2     2     1      320
2    10     2     2     2     1110
3    10     2     2     3      450
4    10     2     2     4      600
5    10     2     2     5     1680
6    10     2     2     6      730
7    10     2     2     7      850
8    10     2     2     8      840
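
To see why this works, it may help to compare the row-wise and the group-wise condition side by side. The following is only a minimal sketch on a made-up tibble (`toy` and its values are hypothetical, not the original MWA data):

```r
library(dplyr)

# Hypothetical toy data: Dir 1 has one missing value, Dir 2 has none.
toy <- tibble(
  Dir      = c(1L, 1L, 1L, 2L, 2L, 2L),
  time_seg = c(1810L, NA, 540L, 320L, 1110L, 450L)
)

# One logical per group: the value that filter() recycles within each group.
group_flags <- toy %>%
  group_by(Dir) %>%
  summarise(complete = all(!is.na(time_seg)))

# Row-wise condition: only the NA row is dropped, the rest of Dir 1 stays.
row_wise <- toy %>%
  group_by(Dir) %>%
  filter(!is.na(time_seg))

# Group-wise condition: every row of Dir 1 is dropped.
group_wise <- toy %>%
  group_by(Dir) %>%
  filter(all(!is.na(time_seg)))
```

In this toy data, `row_wise` still contains two rows of Dir 1, while `group_wise` contains only the rows of Dir 2.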

Answer 2:

You can first check whether there is any missing value in the specific column and then exclude the whole group.

library(dplyr)

MWA %>% 
  group_by(VP, Con, Dir) %>% 
  mutate(any_na = any(is.na(time_seg))) %>% 
  filter(!any_na)

# A tibble: 8 x 6
# Groups:   VP, Con, Dir [1]
#     VP   Con   Dir   Seg time_seg any_na
#   <int> <int> <int> <int>    <int> <lgl> 
# 1    10     2     2     1      320 FALSE 
# 2    10     2     2     2     1110 FALSE 
# 3    10     2     2     3      450 FALSE 
# 4    10     2     2     4      600 FALSE 
# 5    10     2     2     5     1680 FALSE 
# 6    10     2     2     6      730 FALSE 
# 7    10     2     2     7      850 FALSE 
# 8    10     2     2     8      840 FALSE
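
Note that the helper column any_na stays in the result. If that is not wanted, one option is to drop it again with select() after filtering. A minimal sketch, assuming a small made-up tibble (`toy` is hypothetical and only stands in for MWA):

```r
library(dplyr)

# Hypothetical toy data with the same ID columns as MWA (minus Seg).
toy <- tibble(
  VP       = 10L,
  Con      = 2L,
  Dir      = c(1L, 1L, 2L, 2L),
  time_seg = c(1810L, NA, 320L, 1110L)
)

cleaned <- toy %>%
  group_by(VP, Con, Dir) %>%
  mutate(any_na = any(is.na(time_seg))) %>%  # flag groups containing an NA
  filter(!any_na) %>%                        # keep only complete groups
  select(-any_na)                            # remove the helper flag again
```

The grouping by VP, Con and Dir is kept; call ungroup() at the end if later steps should not be grouped.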

Answer 3:

Base R provides anyNA(), which is a shortcut for any(is.na(x)):

library(dplyr)
MWA %>%
    group_by(Dir) %>%
    filter(!anyNA(time_seg))
# A tibble: 8 x 5
# Groups:   Dir [1]
#     VP   Con   Dir   Seg time_seg
#  <int> <int> <int> <int>    <int>
#1    10     2     2     1      320
#2    10     2     2     2     1110
#3    10     2     2     3      450
#4    10     2     2     4      600
#5    10     2     2     5     1680
#6    10     2     2     6      730
#7    10     2     2     7      850
#8    10     2     2     8      840
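
The conditions used across the three answers should be interchangeable for this task. A minimal sketch to check that on a made-up tibble (`toy` is hypothetical, not the original data):

```r
library(dplyr)

# Hypothetical toy data: Dir 1 has a missing value, Dir 2 does not.
toy <- tibble(
  Dir      = c(1L, 1L, 2L, 2L),
  time_seg = c(1810L, NA, 320L, 1110L)
)

grouped <- group_by(toy, Dir)

# Three ways to express "this group has no missing time_seg".
via_all   <- filter(grouped, all(!is.na(time_seg)))
via_any   <- filter(grouped, !any(is.na(time_seg)))
via_anyNA <- filter(grouped, !anyNA(time_seg))
```

All three results keep only the rows of Dir 2 here, so the choice is mostly a matter of readability.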

Source: https://stackoverflow.com/questions/52038100/exclude-groups-with-nas-in-tidy-dataset

Tags

dplyr

tibble