Exclude groups with NAs in tidy dataset

假装没事ソ, submitted 2019-12-11 05:59:43

Question


I have a tidy tibble with a value column identified by 4 ID columns.

 > MWA
# A tibble: 16 x 5
# Groups:   Dir [2]
      VP   Con   Dir   Seg time_seg
   <int> <int> <int> <int>    <int>
 1    10     2     1     1     1810
 2    10     2     1     2      260
 3    10     2     1     3      540
 4    10     2     1     4     1470
 5    10     2     1     5      460
 6    10     2     1     6      690
 7    10     2     1     7      760
 8    10     2     1     8       NA
 9    10     2     2     1      320
10    10     2     2     2     1110
11    10     2     2     3      450
12    10     2     2     4      600
13    10     2     2     5     1680
14    10     2     2     6      730
15    10     2     2     7      850
16    10     2     2     8      840

The dput output to reproduce it is:

> dput(MWA)
structure(list(VP = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Con = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Dir = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    Seg = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L), time_seg = c(1810L, 260L, 540L, 1470L, 460L, 
    690L, 760L, NA, 320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 
    840L)), row.names = c(NA, -16L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = "Dir", drop = TRUE, indices = list(
    0:7, 8:15), group_sizes = c(8L, 8L), biggest_group_size = 8L, labels = structure(list(
    Dir = 1:2), row.names = c(NA, -2L), class = "data.frame", vars = "Dir", drop = TRUE))
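
The dput above stores the grouping in the attributes vars, indices and labels, which is the layout of an older dplyr version, so newer dplyr releases may not accept it as a valid grouped_df. As a fallback, here is a minimal sketch that rebuilds the same 16 rows from scratch. It assumes dplyr is installed and that grouping by Dir alone matches the original object.

```r
library(dplyr)

# Rebuild the example data by hand; the values are copied from the dput above.
MWA <- tibble(
  VP       = 10L,                     # recycled to all 16 rows
  Con      = 2L,                      # recycled to all 16 rows
  Dir      = rep(1:2, each = 8),
  Seg      = rep(1:8, times = 2),
  time_seg = c(1810L, 260L, 540L, 1470L, 460L, 690L, 760L, NA,
               320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 840L)
) %>%
  group_by(Dir)                       # same grouping as the printed tibble

MWA
```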

These rows come from a larger data set, where they were grouped by VP, Con and finally Dir.

As you can see, there is an NA in row 8 of the tibble.

Using dplyr, I now want to exclude the whole Dir group (rows 1 through 8) because this one value is missing.

Using filter() with is.na() or complete.cases() removes only the row containing the NA, not the complete group (which is one "case" in this dataset).
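
To make the problem concrete, the sketch below applies the row-wise filter to a hand-built copy of the data (the tibble() call is an assumption standing in for the dput above). Only the single NA row is dropped, so group Dir == 1 survives with its remaining rows.

```r
library(dplyr)

# Hand-built copy of the example data (values taken from the question).
MWA <- tibble(
  VP = 10L, Con = 2L,
  Dir = rep(1:2, each = 8),
  Seg = rep(1:8, times = 2),
  time_seg = c(1810L, 260L, 540L, 1470L, 460L, 690L, 760L, NA,
               320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 840L)
)

# is.na() is evaluated per row, so only the row holding the NA is removed.
row_wise <- MWA %>%
  group_by(Dir) %>%
  filter(!is.na(time_seg))

# Rows left per group: Dir 1 keeps 7 rows, Dir 2 keeps 8.
count(row_wise, Dir)
```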


Answer 1:


all() is evaluated once per group, so filter() keeps or drops the whole group and no intermediate mutate() step is needed.

MWA %>% 
  group_by(Dir) %>% 
  filter(all(!is.na(time_seg)))

# A tibble: 8 x 5
# Groups:   Dir [1]
     VP   Con   Dir   Seg time_seg
  <int> <int> <int> <int>    <int>
1    10     2     2     1      320
2    10     2     2     2     1110
3    10     2     2     3      450
4    10     2     2     4      600
5    10     2     2     5     1680
6    10     2     2     6      730
7    10     2     2     7      850
8    10     2     2     8      840



Answer 2:


You can first check, within each group, whether the column contains any missing value, and then exclude every group where it does.

library(dplyr)

MWA %>% 
  group_by(VP, Con, Dir) %>% 
  mutate(any_na = any(is.na(time_seg))) %>% 
  filter(!any_na)

# A tibble: 8 x 6
# Groups:   VP, Con, Dir [1]
#     VP   Con   Dir   Seg time_seg any_na
#   <int> <int> <int> <int>    <int> <lgl> 
# 1    10     2     2     1      320 FALSE 
# 2    10     2     2     2     1110 FALSE 
# 3    10     2     2     3      450 FALSE 
# 4    10     2     2     4      600 FALSE 
# 5    10     2     2     5     1680 FALSE 
# 6    10     2     2     6      730 FALSE 
# 7    10     2     2     7      850 FALSE 
# 8    10     2     2     8      840 FALSE 
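
The result above still carries the helper column any_na and the grouping. If neither is wanted downstream, one possible follow-up is to ungroup and drop the column. This is a sketch on a hand-built copy of the data, and the name result is only illustrative.

```r
library(dplyr)

# Hand-built copy of the example data (values taken from the question).
MWA <- tibble(
  VP = 10L, Con = 2L,
  Dir = rep(1:2, each = 8),
  Seg = rep(1:8, times = 2),
  time_seg = c(1810L, 260L, 540L, 1470L, 460L, 690L, 760L, NA,
               320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 840L)
)

result <- MWA %>%
  group_by(VP, Con, Dir) %>%
  mutate(any_na = any(is.na(time_seg))) %>%  # TRUE for every row of a group with an NA
  filter(!any_na) %>%                        # drop those groups entirely
  ungroup() %>%                              # remove the grouping
  select(-any_na)                            # remove the helper column

result
```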



Answer 3:


Base R provides anyNA(), a shortcut for any(is.na(x)):

library(dplyr)
MWA %>%
    group_by(Dir) %>%
    filter(!anyNA(time_seg))
# A tibble: 8 x 5
# Groups:   Dir [1]
#     VP   Con   Dir   Seg time_seg
#  <int> <int> <int> <int>    <int>
#1    10     2     2     1      320
#2    10     2     2     2     1110
#3    10     2     2     3      450
#4    10     2     2     4      600
#5    10     2     2     5     1680
#6    10     2     2     6      730
#7    10     2     2     7      850
#8    10     2     2     8      840
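
The three answers express the same per-group condition in different ways. The sketch below, run on a hand-built copy of the data, checks that all(!is.na(x)), !any(is.na(x)) and !anyNA(x) select the same rows. The object names are illustrative.

```r
library(dplyr)

# Hand-built copy of the example data (values taken from the question).
MWA <- tibble(
  VP = 10L, Con = 2L,
  Dir = rep(1:2, each = 8),
  Seg = rep(1:8, times = 2),
  time_seg = c(1810L, 260L, 540L, 1470L, 460L, 690L, 760L, NA,
               320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 840L)
)

by_all   <- MWA %>% group_by(Dir) %>% filter(all(!is.na(time_seg)))
by_any   <- MWA %>% group_by(Dir) %>% filter(!any(is.na(time_seg)))
by_anyNA <- MWA %>% group_by(Dir) %>% filter(!anyNA(time_seg))

# Compare as plain data frames; stopifnot() raises an error if they differ.
stopifnot(
  isTRUE(all.equal(as.data.frame(by_all), as.data.frame(by_any))),
  isTRUE(all.equal(as.data.frame(by_all), as.data.frame(by_anyNA)))
)
```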


Source: https://stackoverflow.com/questions/52038100/exclude-groups-with-nas-in-tidy-dataset
