Identifying rows in data.frame with only NA values in R


孤独总比滥情好 2021-02-20 05:31

I have a data.frame with 15,000 observations of 34 ordinal and NA variables. I am performing clustering for a market segmentation study and need the ro

2 answers

一个人的身影 (original poster)

2021-02-20 05:41

 which(rowSums(is.na(Store2))==ncol(Store2))
 #3 4 
 #3 4

 which(Reduce(`&`,as.data.frame(is.na(Store2))))
 #[1] 3 4

 which(!rowSums(!is.na(Store2)))  
 #3 4 
 #3 4

Data

 Store2 <- structure(list(Age = c(NA, "45-54", NA, NA, "45-54"), Gender = c("Male", 
 "Female", NA, NA, "Female"), HouseholdIncome = c(NA, NA, NA, 
  NA, "75k-100k"), MaritalStatus = c(NA, NA, NA, NA, "Married"), 
PresenceofChildren = c(NA, NA, NA, NA, "Yes"), HomeOwnerStatus = c(NA, 
NA, NA, NA, "Own"), HomeMarketValue = c(NA, NA, NA, NA, "150k-200k"
)), .Names = c("Age", "Gender", "HouseholdIncome", "MaritalStatus", 
"PresenceofChildren", "HomeOwnerStatus", "HomeMarketValue"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
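The three one-liners above all test the same condition: a row is entirely missing when its NA count equals the number of columns. As a minimal sketch, that check can be wrapped in a small function. The name `all_na_rows` and the two-column `toy` data frame are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer):
# returns the positions of rows in which every value is NA.
all_na_rows <- function(df) {
  # is.na() on a data.frame gives a logical matrix; a row is all-NA
  # when its count of NA cells equals the number of columns.
  which(unname(rowSums(is.na(df)) == ncol(df)))
}

# Small made-up data set, modelled on the first two columns of Store2
toy <- data.frame(
  Age    = c(NA, "45-54", NA, NA, "45-54"),
  Gender = c("Male", "Female", NA, NA, "Female"),
  stringsAsFactors = FALSE
)

na_idx <- all_na_rows(toy)
print(na_idx)
# [1] 3 4
```

`unname()` is used only so the result is a plain integer vector; without it, `which()` would also carry the row names when the data frame has non-default ones.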

Update

To drop the rows with all NAs

  Store2[!!rowSums(!is.na(Store2)),]
  #    Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus
  #1  <NA>   Male            <NA>          <NA>               <NA>            <NA>
  #2 45-54 Female            <NA>          <NA>               <NA>            <NA>
  #5 45-54 Female        75k-100k       Married                Yes             Own
  #  HomeMarketValue
  #1            <NA>
  #2            <NA>
  #5       150k-200k

is.na(Store2) returns a logical matrix that is TRUE wherever an element is NA.
! negates that matrix, so TRUE becomes FALSE and vice versa.

rowSums of the negated matrix counts the non-NA elements in each row:

    rowSums(!is.na(Store2))
    #   1 2 3 4 5 
    #   1 2 0 0 7  # 3rd and 4th row have `0 non NA` values

Negating these counts with ! turns 0 into TRUE and any non-zero count into FALSE:

    !rowSums(!is.na(Store2))
    # 1     2     3     4     5 
    #FALSE FALSE  TRUE  TRUE FALSE

We want to drop the rows that are all NA, i.e. those with 0 non-NA values, so we negate once more with ! to get TRUE for the rows to keep:

    !!rowSums(!is.na(Store2))
    #1     2     3     4     5 
    #TRUE  TRUE FALSE FALSE  TRUE

Subsetting Store2 with this logical vector keeps only the rows that have at least one non-NA value.
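The double negation is compact but easy to misread. A sketch of an equivalent, more explicit form compares the non-NA count against zero. The `toy` data frame below is invented for illustration, and `keep_explicit` is just a suggested variable name.

```r
# Made-up three-row example; the last row is entirely NA
toy <- data.frame(
  Age    = c(NA, "45-54", NA),
  Gender = c("Male", "Female", NA),
  stringsAsFactors = FALSE
)

# Form used in the answer: !! coerces the counts to logical
keep_double_bang <- !!rowSums(!is.na(toy))

# Explicit form: keep rows with at least one non-NA value
keep_explicit <- rowSums(!is.na(toy)) > 0

# Both give the same logical index
stopifnot(identical(keep_double_bang, keep_explicit))

# drop = FALSE guards against the result collapsing to a vector
# if the data frame ever has a single column
cleaned <- toy[keep_explicit, , drop = FALSE]
print(nrow(cleaned))
# [1] 2
```

Either form works as a row index; the explicit comparison may simply be easier to read back later.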

Update2

If you have two sets of row numbers, i.e. the one you stored separately before deleting the NA rows and the one left after deleting them, %in% shows which of the original numbers survived:

   RowNo1 <- 1:6
   RowNo2 <- c(1,2,5,6)
   RowNo1 %in% RowNo2
   #[1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE
   RowNo1[RowNo1 %in% RowNo2]
   #[1] 1 2 5 6
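If the row numbers that were removed are of interest rather than the ones that remain, base R's `setdiff()` gives them directly. This is a small sketch using the same two vectors; the name `dropped` is an assumption for illustration.

```r
RowNo1 <- 1:6            # row numbers before deleting the NA rows
RowNo2 <- c(1, 2, 5, 6)  # row numbers left after deleting them

# Elements of RowNo1 that do not appear in RowNo2
dropped <- setdiff(RowNo1, RowNo2)
print(dropped)
# [1] 3 4

# The same result via negated %in%
stopifnot(all(dropped == RowNo1[!(RowNo1 %in% RowNo2)]))
```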

Update3

With your new requests, let me try it again:

    Store2 <- structure(list(RowNo = 1:5, Age = c(NA, "45-54", NA, NA, "45-54"
    ), Gender = c("Male", "Female", NA, NA, "Female"), HouseholdIncome = c(NA, 
    NA, NA, NA, "75k-100k"), MaritalStatus = c(NA, NA, NA, NA, "Married"
   ), PresenceofChildren = c(NA, NA, NA, NA, "Yes")), .Names = c("RowNo", 
   "Age", "Gender", "HouseholdIncome", "MaritalStatus", "PresenceofChildren"
   ), class = "data.frame", row.names = c("1", "2", "3", "4", "5"
   ))

First step

Save RowNo as a separate vector (I am not sure why you need this)

  Store2new1 <- Store2$RowNo

Second step

Delete rows with all NA values in Store2 data.frame and store it as Store2df

   Store2df <- Store2[!!rowSums(!is.na(Store2[,-1])),] #Here you already get the new dataset with `RowNo` column

   Store2df
   #  RowNo   Age Gender HouseholdIncome MaritalStatus PresenceofChildren
   #1     1  <NA>   Male            <NA>          <NA>               <NA>
   #2     2 45-54 Female            <NA>          <NA>               <NA>
   #5     5 45-54 Female        75k-100k       Married                Yes

Third step

Keep only those entries of the Store2new1 vector whose row numbers are still present in the Store2df data.frame

   Store2new2 <- Store2new1[Store2new1 %in% Store2df$RowNo]
   Store2new1[Store2new1 %in% Store2df$RowNo]
   #[1] 1 2 5

Fourth step

I don't really think the fourth step or third is required unless you want to delete more rows, which is not clear from the post.
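To illustrate why the separate vector may be unnecessary: after subsetting, the surviving row numbers are already available from the RowNo column, and, when the data frame uses default sequential row names, from the row names as well. The sketch below uses a made-up `toy` data frame and assumed variable names; it relies on the row names being the default 1..n, which will not hold for every data set.

```r
# Made-up data with an explicit RowNo column, modelled on Store2
toy <- data.frame(
  RowNo  = 1:5,
  Age    = c(NA, "45-54", NA, NA, "45-54"),
  Gender = c("Male", "Female", NA, NA, "Female"),
  stringsAsFactors = FALSE
)

# Ignore RowNo (column 1) when checking for all-NA rows
keep <- rowSums(!is.na(toy[, -1, drop = FALSE])) > 0
toy_clean <- toy[keep, ]

# Surviving row numbers, read from the column ...
from_column <- toy_clean$RowNo

# ... or from the preserved row names (assumes default row names)
from_rownames <- as.integer(rownames(toy_clean))

print(from_column)
# [1] 1 2 5
```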
