I have a problem and would appreciate a hand. I tried to come up with a solution, but I have no idea how to work it out.
Please use this to recreate my
Try
df1[, colSums(!is.na(df1)) >= 7]
# A1 A3
#1 87 NA
#2 67 38
#3 80 10
#4 36 41
#5 71 NA
#6 6 66
#7 26 NA
#8 15 7
#9 14 29
#10 46 NA
#11 19 70
#12 93 23
#13 5 46
#14 94 55
Step by step:
What you need to do first is to find out which values of your data are not missing.
!is.na(df1)
This returns a logical matrix
# A1 A2 A3
# [1,] TRUE TRUE FALSE
# [2,] TRUE FALSE TRUE
# [3,] TRUE TRUE TRUE
# [4,] TRUE TRUE TRUE
# [5,] TRUE TRUE FALSE
# [6,] TRUE TRUE TRUE
# [7,] TRUE TRUE FALSE
# [8,] TRUE FALSE TRUE
# [9,] TRUE FALSE TRUE
#[10,] TRUE FALSE FALSE
#[11,] TRUE FALSE TRUE
#[12,] TRUE FALSE TRUE
#[13,] TRUE FALSE TRUE
#[14,] TRUE FALSE TRUE
Use colSums to find out how many observations per column are not missing
colSums(!is.na(df1))
#A1 A2 A3
#14 6 10
Apply your condition, "at least 7 non-missing observations per column"
colSums(!is.na(df1)) >= 7
# A1 A2 A3
# TRUE FALSE TRUE
Finally, you need to use this vector to subset your data
df1[, colSums(!is.na(df1)) >= 7]
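Since the original data is not shown here, the steps above can be sketched on a small made-up data frame. The name `toy`, its values, and the threshold of 3 are assumptions chosen only for illustration; they are not the `df1` from the question.

```r
# Hypothetical toy data, NOT the original df1
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),      # 5 non-missing values
  y = c(NA, NA, 3, NA, NA),  # 1 non-missing value
  z = c(1, NA, 3, 4, NA)     # 3 non-missing values
)

not_missing <- !is.na(toy)           # logical matrix, TRUE where a value is present
obs_per_col <- colSums(not_missing)  # TRUE counts as 1, so this counts present values
keep <- obs_per_col >= 3             # assumed threshold for this toy example

# drop = FALSE keeps the result a data frame even if only one column qualifies
result <- toy[, keep, drop = FALSE]
names(result)
```

With this toy data, columns `x` and `z` meet the threshold and `y` is dropped.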
Turn this into a function if you need it regularly
almost_complete_cols <- function(data, min_obs) {
  data[, colSums(!is.na(data)) >= min_obs, drop = FALSE]
}
almost_complete_cols(df1, 7)
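The `drop = FALSE` in the function matters when only one column passes the condition. A minimal sketch, again on made-up data (`toy` and the threshold of 2 are assumptions for illustration only):

```r
# Hypothetical toy data where only column a has at least 2 non-missing values
toy <- data.frame(a = c(1, 2, 3), b = c(NA, NA, 3))
keep <- colSums(!is.na(toy)) >= 2

as_vector <- toy[, keep]                # single column collapses to a plain vector
as_frame  <- toy[, keep, drop = FALSE]  # stays a one-column data frame

is.data.frame(as_vector)
is.data.frame(as_frame)
```

Without `drop = FALSE`, code that later expects a data frame (for example, calling `names()` or `ncol()` on the result) may behave unexpectedly when a single column is returned.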