Find missing month after grouping with dplyr

丶灬走出姿态 posted on 2019-11-29 15:22:22

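This can be done with tidyr::complete. Group by the two ID columns, then ask for every month from 1 to 12 within each group; months that were absent are added as rows with NA in the value columns: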

library(dplyr)
library(tidyr)

dat %>% 
    group_by(ID_1, ID_2) %>%
    complete(month = 1:12)

Tail of the result (month 8 was missing for this group and now appears with NA):

Source: local data frame [6 x 5]
Groups: ID_1, ID_2 [1]

   ID_1  ID_2 month   st1   st2
  <int> <int> <int> <dbl> <dbl>
1     1     2     7   0.5   0.2
2     1     2     8    NA    NA
3     1     2     9   1.1   1.7
4     1     2    10   2.6   0.8
5     1     2    11   1.8   1.3
6     1     2    12   2.1   2.2
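
To actually list the months that were missing, one option is to filter the completed data for the rows that complete() added. The following is a minimal sketch on a made-up data frame (toy, with only three months and one value column, is an assumption for illustration and not the asker's data). It also assumes the original data has no NA of its own in st1, since such rows would be flagged as well:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: group (1, 1) lacks month 3, group (1, 2) lacks month 2
toy <- data.frame(
  ID_1  = c(1L, 1L, 1L, 1L),
  ID_2  = c(1L, 1L, 2L, 2L),
  month = c(1L, 2L, 1L, 3L),
  st1   = c(0.5, 0.7, 1.1, 2.6)
)

# Fill in every month 1..3 within each (ID_1, ID_2) group
filled <- toy %>%
  group_by(ID_1, ID_2) %>%
  complete(month = 1:3) %>%
  ungroup()

# Rows added by complete() carry NA in the value column
missing_months <- filter(filled, is.na(st1))
missing_months
```

With the real data, month = 1:3 would become month = 1:12 as in the answer above.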

In base R, use expand.grid to build a reference table with every combination of ID_1, ID_2 and month, then merge it with the data:

# make reference with all needed rows
ref <- expand.grid(unique(df1$ID_1),
                   unique(df1$ID_2),
                   1:12)
colnames(ref) <- colnames(df1)[1:3]

# then merge with all = TRUE to keep every row from both tables
res <- merge(df1, ref, all = TRUE)

# to check output, show only month = 8
res[ res$month == 8, ]
#    ID_1 ID_2 month st1 st2
# 8     1    1     8 0.7 0.9
# 20    1    2     8  NA  NA
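
One thing to watch with expand.grid: it crosses every ID_1 value with every ID_2 value, so it also creates ID pairs that never occur together in the data. If only the observed pairs should be filled, a possible variant is to take the unique pairs first and cross them with the months via merge(..., by = NULL), which gives a Cartesian product. This is a sketch on made-up data (toy, pairs and the two-month range are assumptions for illustration):

```r
# Hypothetical toy data: only the pairs (1, 1) and (2, 2) occur
toy <- data.frame(
  ID_1  = c(1L, 1L, 2L),
  ID_2  = c(1L, 1L, 2L),
  month = c(1L, 2L, 1L),
  st1   = c(0.5, 0.7, 1.1)
)

# Observed (ID_1, ID_2) pairs only
pairs <- unique(toy[, c("ID_1", "ID_2")])

# Cross join the pairs with the wanted months (by = NULL gives all combinations)
ref <- merge(pairs, data.frame(month = 1:2), by = NULL)

# Full outer merge on the shared columns ID_1, ID_2, month
res <- merge(toy, ref, all = TRUE)

# For comparison: expand.grid also creates the unobserved pairs (1, 2) and (2, 1)
n_grid <- nrow(expand.grid(unique(toy$ID_1), unique(toy$ID_2), 1:2))
```

Here res has 4 rows (one of them added with NA), while the expand.grid reference would have 8.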

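If you go with tidyr, the complete function does this. Wrap ID_1 and ID_2 in nesting() so that only the ID pairs that actually occur in the data are used as groups. Note that a bare month expands only over the month values present somewhere in the data; write month = 1:12 to force all twelve: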

library(tidyr)
df1 = df %>% complete(nesting(ID_1, ID_2), month)

tail(df1)    
# Source: local data frame [6 x 5]

#    ID_1  ID_2 month   st1   st2
#   <int> <int> <int> <dbl> <dbl>
# 1     1     2     7   0.5   0.2
# 2     1     2     8    NA    NA
# 3     1     2     9   1.1   1.7
# 4     1     2    10   2.6   0.8
# 5     1     2    11   1.8   1.3
# 6     1     2    12   2.1   2.2
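
If the goal is only to find which months are missing, rather than to fill them in, a possible approach is to build the full set of combinations with tidyr::expand and keep the ones that have no match in the data with dplyr::anti_join. Unlike checking for NA after complete(), this does not depend on the value columns. A minimal sketch on made-up data (toy and the three-month range are assumptions for illustration):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: group (1, 1) lacks month 3, group (1, 2) lacks month 2
toy <- data.frame(
  ID_1  = c(1L, 1L, 1L, 1L),
  ID_2  = c(1L, 1L, 2L, 2L),
  month = c(1L, 2L, 1L, 3L),
  st1   = c(0.5, NA, 1.1, 2.6)   # an NA in the data itself is not a missing month
)

# All months for each observed (ID_1, ID_2) pair, minus the rows that exist
missing_combos <- toy %>%
  expand(nesting(ID_1, ID_2), month = 1:3) %>%
  anti_join(toy, by = c("ID_1", "ID_2", "month"))

missing_combos
```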