Question
I have a data frame with five time columns (first year, second year, etc.), and some rows have NAs in between non-missing values. Sample data:
df = structure(list(FirstYStage = c(NA, 3.2, 3.1, NA, NA, 2, 1, 3.2,
3.1, 1, 2, 5, 2, NA, NA, NA, NA, 2, 3.1, 1), SecondYStage = c(NA,
3.1, 3.1, NA, NA, 2, 1, 4, 3.1, 1, NA, 5, 3.1, 3.2, 2, 3.1, NA,
2, 3.1, 1), ThirdYStage = c(NA, NA, 3.1, NA, NA, 3.2, 1, 4, NA,
1, NA, NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1), FourthYStage = c(NA,
NA, 3.1, NA, NA, NA, 1, 4, NA, 1, NA, NA, NA, 4, 2, NA, NA, NA,
2, 1), FifthYStage = c(NA, NA, 2, NA, NA, NA, 1, 5, NA, NA, NA,
NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -20L))
I would like to count, using dplyr, the number of rows that have missing values in between non-missing values. Rows 13, 14, and 16 are examples of this.
How can I achieve this? I suspect it involves pasting the entire row together and looking to the left and right of each NA, but I'm not clear how to proceed.
Answer 1:
sum(grepl("[[:digit:]]+[NA]+[[:digit:]]", apply(df, 1, paste, collapse = "")))
[1] 3
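In short:
1. Concatenate the values of each row into a single string (missing values become the text "NA").
2. Check which rows return TRUE for a match to the pattern "number, NA, number" (in this case, rows 13, 14 and 16). The character class [NA]+ matches one or more N or A characters, so runs of consecutive NAs such as "NANA" are matched too.
3. Sum the rows that evaluate to TRUE (each TRUE counts as 1).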
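To make the three steps concrete, here is a small sketch run on five rows copied from the question's sample data (rows 2, 13, 14, 15 and 16), printing the intermediate strings. The names df_small, row_strings, has_gap and n_gap_rows are only for illustration.

```r
# Rows 2, 13, 14, 15 and 16 of the question's sample data
# (only rows 13, 14 and 16 have an NA between two non-missing values).
df_small <- data.frame(
  FirstYStage  = c(3.2, 2,   NA,  NA, NA),
  SecondYStage = c(3.1, 3.1, 3.2, 2,  3.1),
  ThirdYStage  = c(NA,  3.2, NA,  2,  3.2),
  FourthYStage = c(NA,  NA,  4,   2,  NA),
  FifthYStage  = c(NA,  3.2, NA,  2,  3.2)
)

# Step 1: concatenate each row into one string; paste() turns NA into "NA".
row_strings <- apply(df_small, 1, paste, collapse = "")
print(row_strings)
# "3.23.1NANANA" "23.13.2NA3.2" "NA3.2NA4NA" "NA2222" "NA3.13.2NA3.2"

# Step 2: a digit, then one or more N/A characters, then another digit.
has_gap <- grepl("[[:digit:]]+[NA]+[[:digit:]]", row_strings)
print(has_gap)
# FALSE TRUE TRUE FALSE TRUE

# Step 3: TRUE counts as 1 and FALSE as 0, so sum() gives the row count.
n_gap_rows <- sum(has_gap)
print(n_gap_rows)
```

Row 2 ("3.23.1NANANA") has NAs only at the end and row 15 ("NA2222") only at the start, so neither has a digit on both sides of an NA.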
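Based on @Sotos's comment, this is even better, since it avoids calling apply with MARGIN = 1 (which first converts the data frame to a matrix). do.call(paste0, df) pastes the columns together element-wise, giving one string per row: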
sum(grepl("[[:digit:]]+[NA]+[[:digit:]]", do.call(paste0, df)))
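One caveat: the regex works on how each number prints. A negative value such as -1 prints with a leading "-", which the pattern does not allow between "NA" and the next digit, so a row like -1, NA, -2 would not be counted. As an alternative (a different technique from the answer's regex, and plain base R rather than dplyr), the sketch below works on the NA pattern directly: for each row it finds the first and last non-missing positions and checks whether any NA lies between them. The helper has_interior_na and the extra negative-valued row are hypothetical, added only for illustration.

```r
# Rows 2, 13, 14 and 16 of the question's sample data, plus one
# hypothetical row (-1, NA, -2, NA, NA) that is NOT in the question.
df_check <- data.frame(
  FirstYStage  = c(3.2, 2,   NA,  NA,  -1),
  SecondYStage = c(3.1, 3.1, 3.2, 3.1, NA),
  ThirdYStage  = c(NA,  3.2, NA,  3.2, -2),
  FourthYStage = c(NA,  NA,  4,   NA,  NA),
  FifthYStage  = c(NA,  3.2, NA,  3.2, NA)
)

# Hypothetical helper: TRUE when `present` has a FALSE strictly between
# its first and last TRUE, i.e. an NA between two non-missing values.
has_interior_na <- function(present) {
  idx <- which(present)
  # && short-circuits, so min()/max() never see an empty index vector
  length(idx) >= 2 && !all(present[min(idx):max(idx)])
}

# !is.na() on a data frame returns a logical matrix; apply() walks its rows.
gap_by_position <- apply(!is.na(df_check), 1, has_interior_na)
n_by_position <- sum(gap_by_position)

# The answer's regex, applied to the same rows, misses the negative row.
n_by_regex <- sum(grepl("[[:digit:]]+[NA]+[[:digit:]]", do.call(paste0, df_check)))

print(c(position = n_by_position, regex = n_by_regex))
```

On the question's own data (all values positive) both approaches count the same three rows; the positional check simply does not depend on how the values are formatted as text.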
Source: https://stackoverflow.com/questions/56490797/count-nas-appearing-in-between-non-missing-values