count NA's appearing in between non-missing values

心不动则不痛 posted on 2019-12-07 15:36:06

Question


I have a data frame with 5 time columns (first year, second year, etc.), and in some rows there are NA's in between non-missing values. Sample below:

df = structure(list(FirstYStage = c(NA, 3.2, 3.1, NA, NA, 2, 1, 3.2, 
3.1, 1, 2, 5, 2, NA, NA, NA, NA, 2, 3.1, 1), SecondYStage = c(NA, 
3.1, 3.1, NA, NA, 2, 1, 4, 3.1, 1, NA, 5, 3.1, 3.2, 2, 3.1, NA, 
2, 3.1, 1), ThirdYStage = c(NA, NA, 3.1, NA, NA, 3.2, 1, 4, NA, 
1, NA, NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1), FourthYStage = c(NA, 
NA, 3.1, NA, NA, NA, 1, 4, NA, 1, NA, NA, NA, 4, 2, NA, NA, NA, 
2, 1), FifthYStage = c(NA, NA, 2, NA, NA, NA, 1, 5, NA, NA, NA, 
NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -20L))

I would like to count, using dplyr, the number of rows that have missing values in between non-missing values. Rows 13, 14, and 16 are examples of this.

How can I achieve this? I have a feeling this has to do with pasting the entire row together and looking to the left and to the right of the NA, but it is not clear to me how to proceed.


Answer 1:


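# (NA)+ matches the literal text "NA" one or more times; [NA]+ would match any run of the letters N and A
sum(grepl("[[:digit:]]+(NA)+[[:digit:]]", apply(df, 1, paste, collapse = "")))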

[1] 3

In short:

  1. concatenate the values of each row into a single string (NA becomes the text "NA")

  2. check which rows match the pattern "number-NA-number" (in this case, rows 13, 14 and 16)

  3. sum the rows which evaluate to TRUE
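
The three steps can be traced on a small example. This is only a sketch: the data frame toy and its columns y1 to y3 are made up for illustration, and the pattern is written with (NA)+ so that it matches the literal text "NA" one or more times.

```r
# Hypothetical three-row data frame, for illustration only.
toy <- data.frame(
  y1 = c(2, NA, 2),
  y2 = c(NA, 3.2, NA),
  y3 = c(3.2, NA, NA)
)

# Step 1: concatenate each row into one string; NA turns into the text "NA".
row_strings <- apply(toy, 1, paste, collapse = "")
# row_strings is now "2NA3.2", "NA3.2NA", "2NANA"

# Step 2: flag the rows where "NA" sits directly between two digits.
has_gap <- grepl("[[:digit:]]+(NA)+[[:digit:]]", row_strings)
# Only the first row has a digit on both sides of the NA.

# Step 3: TRUE counts as 1, so sum() gives the number of flagged rows.
n_gap <- sum(has_gap)
```

Rows whose NA's are only leading or trailing (the second and third rows here) are not flagged, because the pattern needs a digit on both sides of the NA.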

Based on @Sotos's comment, this is even better, since it avoids calling apply with MARGIN = 1:

# paste0 is vectorised over the columns, so this builds one string per row
sum(grepl("[[:digit:]]+(NA)+[[:digit:]]", do.call(paste0, df)))
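
Both versions depend on how the numbers are printed: the pattern expects a digit immediately after the NA, so a negative value such as -2 following an NA would be missed. If that matters for your data, a possible alternative is to work on is.na() directly. The sketch below is not part of the original answer; the helper names has_interior_na and count_interior_na_rows and the data frame toy are hypothetical.

```r
# TRUE when x has at least one NA strictly between its first and
# last non-missing values (hypothetical helper, for illustration).
has_interior_na <- function(x) {
  observed <- which(!is.na(x))
  if (length(observed) < 2) return(FALSE)  # no "in between" possible
  anyNA(x[min(observed):max(observed)])
}

# Counts the rows of a data frame that contain such an interior NA.
count_interior_na_rows <- function(data) {
  sum(apply(as.matrix(data), 1, has_interior_na))
}

# Hypothetical data: rows 1, 2 and 4 have an NA between observed values.
toy <- data.frame(
  y1 = c(2, NA, 2, -1),
  y2 = c(3.1, 3.2, NA, NA),
  y3 = c(NA, NA, NA, -2),
  y4 = c(3.2, 4, NA, NA)
)

n_logical <- count_interior_na_rows(toy)

# The string-based count on the same data, for comparison; it does not
# see row 4 because "-" rather than a digit follows the NA.
n_regex <- sum(grepl("[[:digit:]]+(NA)+[[:digit:]]", do.call(paste0, toy)))
```

On the sample df from the question both approaches should agree, since all values there are positive.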


Source: https://stackoverflow.com/questions/56490797/count-nas-appearing-in-between-non-missing-values
