Identifying rows in data.frame with only NA values in R

孤独总比滥情好 2021-02-20 05:31

I have a data.frame with 15,000 observations of 34 ordinal and NA variables. I am performing clustering for a market segmentation study and need the ro

2 answers
  •  一个人的身影
    2021-02-20 05:41

     # rows whose number of NAs equals the number of columns
     which(rowSums(is.na(Store2))==ncol(Store2))
     #3 4 
     #3 4 
    

    Or

     # combine the per-column NA indicators with & across all columns
     which(Reduce(`&`,as.data.frame(is.na(Store2))))
     #[1] 3 4
    

    Or

     # rows with no non-NA values
     which(!rowSums(!is.na(Store2)))
     #3 4 
     #3 4 
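All three one-liners count missing values per row and return the rows in which every column is NA. As a minimal sketch, the third form can be wrapped in a small helper; the name all_na_rows and the toy data frame df are illustrative only, not part of the original answer. The final check confirms that the three expressions agree on that data (the Reduce() version returns unnamed indices, hence the unname()).

```r
# Hypothetical helper (illustrative name): indices of rows where every column is NA
all_na_rows <- function(df) {
  # is.na() on a data.frame gives a logical matrix; rowSums() counts the TRUEs
  which(rowSums(!is.na(df)) == 0)
}

# Toy data (illustrative only): rows 2 and 4 are entirely NA
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", NA, NA, NA),
  stringsAsFactors = FALSE
)

idx1 <- which(rowSums(is.na(df)) == ncol(df))
idx2 <- which(Reduce(`&`, as.data.frame(is.na(df))))
idx3 <- which(!rowSums(!is.na(df)))

print(all_na_rows(df))
# Reduce() works column by column, so its result carries no row names
stopifnot(identical(unname(idx1), idx2), identical(idx1, idx3))
```

Counting non-NA values (rather than NAs) has the advantage of not needing ncol(), so the same expression works unchanged on any subset of columns.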
    

    data

     Store2 <- structure(list(
       Age = c(NA, "45-54", NA, NA, "45-54"),
       Gender = c("Male", "Female", NA, NA, "Female"),
       HouseholdIncome = c(NA, NA, NA, NA, "75k-100k"),
       MaritalStatus = c(NA, NA, NA, NA, "Married"),
       PresenceofChildren = c(NA, NA, NA, NA, "Yes"),
       HomeOwnerStatus = c(NA, NA, NA, NA, "Own"),
       HomeMarketValue = c(NA, NA, NA, NA, "150k-200k")),
       class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
    

    Update

    To drop the rows that are entirely NA:

      Store2[!!rowSums(!is.na(Store2)),]
      #    Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus
      #1  <NA>   Male            <NA>          <NA>               <NA>            <NA>
      #2 45-54 Female            <NA>          <NA>               <NA>            <NA>
      #5 45-54 Female        75k-100k       Married                Yes             Own
      #  HomeMarketValue
      #1            <NA>
      #2            <NA>
      #5       150k-200k
    
    • is.na(Store2) returns a logical matrix that is TRUE wherever an element is NA
    • ! negates that matrix, i.e. TRUE becomes FALSE and vice versa
    • rowSums() of the negated matrix counts the non-NA elements in each row, since TRUE is counted as 1

          rowSums(!is.na(Store2))
          #   1 2 3 4 5 
          #   1 2 0 0 7  # rows 3 and 4 have 0 non-NA values
      
    • Negating the counts with ! turns 0 into TRUE and any nonzero count into FALSE:

          !rowSums(!is.na(Store2))
          # 1     2     3     4     5 
          #FALSE FALSE  TRUE  TRUE FALSE 
      
    • We want to drop the rows that are all NA, i.e. that have 0 non-NA values, so we negate once more; now TRUE marks the rows to keep:

          !!rowSums(!is.na(Store2))
          #1     2     3     4     5 
          #TRUE  TRUE FALSE FALSE  TRUE 
      
    • Subset Store2 using this logical vector as the row index.
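The steps in the bullets can be written out with named intermediate vectors. This is a sketch on a shortened copy of the example data (only the first three columns of Store2, rebuilt with data.frame()); comparing the count with > 0 is an equivalent, arguably more readable, alternative to !!.

```r
# Shortened copy of the example data: first three columns of Store2
Store2 <- data.frame(
  Age = c(NA, "45-54", NA, NA, "45-54"),
  Gender = c("Male", "Female", NA, NA, "Female"),
  HouseholdIncome = c(NA, NA, NA, NA, "75k-100k"),
  stringsAsFactors = FALSE
)

not_na <- !is.na(Store2)        # logical matrix: TRUE where a value is present
n_present <- rowSums(not_na)    # number of non-NA values in each row
keep <- n_present > 0           # same result as !!n_present

kept <- Store2[keep, ]          # rows 1, 2 and 5 remain
print(n_present)
print(kept)
```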

    Update2

    If you have two row-number vectors, one stored separately before deleting the all-NA rows and one after, %in% shows which of the original row numbers were kept:

       RowNo1 <- 1:6
       RowNo2 <- c(1,2,5,6)
       RowNo1 %in% RowNo2
       #[1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE
       RowNo1[RowNo1 %in% RowNo2]
       #[1] 1 2 5 6
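If what you need is the row numbers that were removed rather than the ones kept, base R's setdiff() (or a negated %in%) gives them directly. A minimal sketch with the same two vectors:

```r
RowNo1 <- 1:6
RowNo2 <- c(1, 2, 5, 6)

# Row numbers present before, but missing after, the NA rows were deleted
removed <- setdiff(RowNo1, RowNo2)
# Equivalent formulation: %in% binds tighter than !, so this is !(RowNo1 %in% RowNo2)
removed_alt <- RowNo1[!RowNo1 %in% RowNo2]
print(removed)
```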
    

    Update3

    Based on your new requests, here is another attempt:

        Store2 <- structure(list(
          RowNo = 1:5,
          Age = c(NA, "45-54", NA, NA, "45-54"),
          Gender = c("Male", "Female", NA, NA, "Female"),
          HouseholdIncome = c(NA, NA, NA, NA, "75k-100k"),
          MaritalStatus = c(NA, NA, NA, NA, "Married"),
          PresenceofChildren = c(NA, NA, NA, NA, "Yes")),
          class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
    

    First step

    Save RowNo as a separate vector (I am not sure why you need this):

      Store2new1 <- Store2$RowNo
    

    Second step

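    Delete the rows of Store2 in which every value is NA and store the result as Store2df. Store2[,-1] leaves out the RowNo column, which is never NA, so that it is not counted as data: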

       Store2df <- Store2[!!rowSums(!is.na(Store2[,-1])),]  # the result already keeps the RowNo column
    
       Store2df
       #  RowNo   Age Gender HouseholdIncome MaritalStatus PresenceofChildren
       #1     1  <NA>   Male            <NA>          <NA>               <NA>
       #2     2 45-54 Female            <NA>          <NA>               <NA>
       #5     5 45-54 Female        75k-100k       Married                Yes
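Store2[,-1] relies on RowNo being the first column. A sketch that excludes it by name instead, on a shortened copy of the Update3 data (value_cols and has_data are illustrative names, not from the original answer):

```r
# Shortened copy of the Update3 data: RowNo plus two value columns
Store2 <- data.frame(
  RowNo = 1:5,
  Age = c(NA, "45-54", NA, NA, "45-54"),
  Gender = c("Male", "Female", NA, NA, "Female"),
  stringsAsFactors = FALSE
)

# Every column except the identifier, wherever RowNo happens to sit
value_cols <- setdiff(names(Store2), "RowNo")
# drop = FALSE keeps a data.frame even if only one value column is left
has_data <- rowSums(!is.na(Store2[, value_cols, drop = FALSE])) > 0
Store2df <- Store2[has_data, ]
print(Store2df)
```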
    

    Third step

    Remove from the Store2new1 vector the same rows that were removed from Store2df:

       Store2new2 <- Store2new1[Store2new1 %in% Store2df$RowNo]
       Store2new2
       #[1] 1 2 5
    

    Fourth step

    I don't think the third or fourth step is really needed unless you want to delete more rows, which is not clear from the post.
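The third step is redundant because the kept row numbers are already stored in the RowNo column of Store2df. A quick check of that, sketched on a shortened copy of the Update3 data:

```r
# Shortened copy of the Update3 data
Store2 <- data.frame(
  RowNo = 1:5,
  Age = c(NA, "45-54", NA, NA, "45-54"),
  Gender = c("Male", "Female", NA, NA, "Female"),
  stringsAsFactors = FALSE
)

Store2new1 <- Store2$RowNo                                 # first step
Store2df <- Store2[!!rowSums(!is.na(Store2[, -1])), ]      # second step
Store2new2 <- Store2new1[Store2new1 %in% Store2df$RowNo]   # third step

# The third step just reproduces the RowNo column Store2df already carries
stopifnot(identical(Store2new2, Store2df$RowNo))
print(Store2new2)
```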
