I have a data.frame with 15,000 observations of 34 ordinal variables that contain NA values. I am performing clustering for a market segmentation study and need the ro
which(rowSums(is.na(Store2))==ncol(Store2))
#3 4
#3 4
Or
which(Reduce(`&`,as.data.frame(is.na(Store2))))
#[1] 3 4
Or
which(!rowSums(!is.na(Store2)))
#3 4
#3 4
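The rowSums(is.na(...)) == ncol(...) test can be wrapped in a small reusable helper. This is a minimal sketch: the function name all_na_rows and the example_df data are hypothetical, not from any package or from the question.

```r
# Hypothetical helper: integer indices of the rows in which every column is NA
all_na_rows <- function(df) {
  # is.na() on a data frame returns a logical matrix; rowSums() counts the TRUEs
  n_missing <- rowSums(is.na(df))
  # a row is all-NA when the number of missing values equals the number of columns
  unname(which(n_missing == ncol(df)))
}

# Illustrative data: rows 2 and 3 are entirely NA, row 4 is only partly NA
example_df <- data.frame(
  x = c(1, NA, NA, 4),
  y = c("a", NA, NA, NA)
)
print(all_na_rows(example_df))
```

unname() drops the row-name labels that which() otherwise prints above the indices, which is why the outputs above show "3 4" twice (once as names, once as values).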
Sample data:
Store2 <- structure(list(Age = c(NA, "45-54", NA, NA, "45-54"), Gender = c("Male",
"Female", NA, NA, "Female"), HouseholdIncome = c(NA, NA, NA,
NA, "75k-100k"), MaritalStatus = c(NA, NA, NA, NA, "Married"),
PresenceofChildren = c(NA, NA, NA, NA, "Yes"), HomeOwnerStatus = c(NA,
NA, NA, NA, "Own"), HomeMarketValue = c(NA, NA, NA, NA, "150k-200k"
)), .Names = c("Age", "Gender", "HouseholdIncome", "MaritalStatus",
"PresenceofChildren", "HomeOwnerStatus", "HomeMarketValue"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
To drop the rows that are all NAs:
Store2[!!rowSums(!is.na(Store2)),]
# Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus
#1 <NA> Male <NA> <NA> <NA> <NA>
#2 45-54 Female <NA> <NA> <NA> <NA>
#5 45-54 Female 75k-100k Married Yes Own
#HomeMarketValue
#1 <NA>
#2 <NA>
#5 150k-200k
is.na(Store2) gives a logical matrix that is TRUE for the elements that are missing (NA).
! negates that logical matrix, i.e. TRUE becomes FALSE and vice versa.
rowSums of the negated matrix gives the number of elements that are not NA in each row:
rowSums(!is.na(Store2))
# 1 2 3 4 5
# 1 2 0 0 7 # the 3rd and 4th rows have 0 non-NA values
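To see what each of these steps produces, here is the same chain on a tiny made-up data frame (df is illustrative only, not the question's data). is.na() returns a logical matrix, and rowSums() counts each TRUE as 1.

```r
# Illustrative data: row 2 is entirely NA, rows 1 and 3 have one value each
df <- data.frame(a = c(1, NA, NA), b = c(NA, NA, 3))

not_missing <- !is.na(df)       # logical matrix, TRUE where a value is present
print(not_missing)

counts <- rowSums(not_missing)  # number of non-NA values in each row
print(counts)
```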
Negating the above with ! gives:
!rowSums(!is.na(Store2))
# 1 2 3 4 5
#FALSE FALSE TRUE TRUE FALSE
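The negation works on the numeric counts because ! first coerces a number to logical (0 becomes FALSE, any non-zero value becomes TRUE) and then flips it. A quick check, reusing the counts printed above as a plain vector:

```r
counts <- c(1, 2, 0, 0, 7)  # same values as rowSums(!is.na(Store2)) above

once  <- !counts   # TRUE only where the count is 0, i.e. the all-NA rows
twice <- !!counts  # TRUE where the count is non-zero, i.e. the rows to keep

print(once)
print(twice)
```

So !! is simply a compact way of writing counts != 0.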
We want to drop the rows that are all NAs, i.e. the rows with 0 non-NA values, so we apply ! again:
!!rowSums(!is.na(Store2))
#1 2 3 4 5
#TRUE TRUE FALSE FALSE TRUE
Finally, subset Store2 with this logical index, which is exactly Store2[!!rowSums(!is.na(Store2)),] shown above.
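If !! reads too cryptically, comparing the counts with > 0 gives the same logical index. A sketch using a shortened copy of the first Store2 sample data (only its first three columns, recreated so the block runs on its own):

```r
# Shortened copy of the Store2 sample data (first three columns only)
Store2 <- data.frame(
  Age = c(NA, "45-54", NA, NA, "45-54"),
  Gender = c("Male", "Female", NA, NA, "Female"),
  HouseholdIncome = c(NA, NA, NA, NA, "75k-100k")
)

keep_bang <- !!rowSums(!is.na(Store2))    # idiom used above
keep_gt   <- rowSums(!is.na(Store2)) > 0  # explicit comparison, same result

# drop = FALSE keeps the result a data frame even if the input has a single column
Store2_clean <- Store2[keep_gt, , drop = FALSE]
print(Store2_clean)
```

The explicit comparison costs nothing extra and makes the intent ("keep rows with at least one value") visible in the code.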
If you have two RowNo vectors, i.e. one stored separately before deleting the NA rows and a second one taken after deleting them, %in% tells you which of the original row numbers are still present:
RowNo1 <- 1:6
RowNo2 <- c(1,2,5,6)
RowNo1 %in% RowNo2
#[1] TRUE TRUE FALSE FALSE TRUE TRUE
RowNo1[RowNo1 %in% RowNo2]
#[1] 1 2 5 6
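If what you need is the row numbers that were removed, setdiff() gives the complement of the %in% result. The values below are the same RowNo1/RowNo2 as above:

```r
RowNo1 <- 1:6
RowNo2 <- c(1, 2, 5, 6)

kept    <- RowNo1[RowNo1 %in% RowNo2]  # row numbers that survived
removed <- setdiff(RowNo1, RowNo2)     # row numbers that were deleted
# equivalent: RowNo1[!RowNo1 %in% RowNo2] (%in% binds tighter than !)

print(kept)
print(removed)
```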
With your new requirements, let me try again:
Store2 <- structure(list(RowNo = 1:5, Age = c(NA, "45-54", NA, NA, "45-54"
), Gender = c("Male", "Female", NA, NA, "Female"), HouseholdIncome = c(NA,
NA, NA, NA, "75k-100k"), MaritalStatus = c(NA, NA, NA, NA, "Married"
), PresenceofChildren = c(NA, NA, NA, NA, "Yes")), .Names = c("RowNo",
"Age", "Gender", "HouseholdIncome", "MaritalStatus", "PresenceofChildren"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5"
))
Save RowNo as a separate vector (I am not sure why you need this):
Store2new1 <- Store2$RowNo
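Delete the rows of the Store2 data.frame in which all values are NA and store the result as Store2df. RowNo is excluded from the check with Store2[,-1], because it is never NA and would otherwise make every row count as non-empty:
Store2df <- Store2[!!rowSums(!is.na(Store2[,-1])),] # the result still includes the RowNo column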
Store2df
#RowNo Age Gender HouseholdIncome MaritalStatus PresenceofChildren
#1 1 <NA> Male <NA> <NA> <NA>
#2 2 45-54 Female <NA> <NA> <NA>
#5 5 45-54 Female 75k-100k Married Yes
Delete the same rows from the Store2new1 vector as were deleted from the Store2df data.frame:
Store2new2 <- Store2new1[Store2new1 %in% Store2df$RowNo]
Store2new2
#[1] 1 2 5
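A separate RowNo vector may not be needed at all: subsetting a data frame keeps the original row names (the 1, 2, 5 shown in the Store2df output), so the original row numbers can be recovered afterwards. A sketch on a shortened copy of the second Store2 (RowNo plus two data columns, recreated so the block runs on its own); kept_from_names is an illustrative name:

```r
# Shortened copy of the second Store2 sample data
Store2 <- data.frame(
  RowNo = 1:5,
  Age = c(NA, "45-54", NA, NA, "45-54"),
  Gender = c("Male", "Female", NA, NA, "Female")
)

# drop rows whose data columns (everything except RowNo) are all NA
Store2df <- Store2[rowSums(!is.na(Store2[, -1])) > 0, ]

# original row numbers, read back from the surviving row names
kept_from_names <- as.integer(rownames(Store2df))
print(kept_from_names)
```

This relies on the row names not having been reset in between (for example by rownames(Store2df) <- NULL).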
I don't think the third or fourth step is really needed unless you want to delete more rows, which is not clear from the post.
Using the Store2 sample data posted in @akrun's answer:
which(apply(Store2, 1, function(x) all(is.na(x))))
#3 4
#3 4
Or, similar to akrun's answer:
which(rowSums(!is.na(Store2))==0)
#3 4
#3 4
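The apply() version converts the data frame to a character matrix and calls an R function once per row, while rowSums() works on the whole logical matrix in one vectorized call, so the latter is usually the better fit for data the size of the question's (15,000 rows by 34 columns). A sketch that builds entirely synthetic data of that size, forces a few known rows to be all NA, and checks that both approaches agree (sim, ordinal_values and all_na_idx are illustrative names):

```r
set.seed(1)  # reproducible synthetic data

n_rows <- 15000
n_cols <- 34
ordinal_values <- c("low", "mid", "high")

# random ordinal-like values with roughly 30% missing
sim <- as.data.frame(
  matrix(sample(c(ordinal_values, NA), n_rows * n_cols, replace = TRUE,
                prob = c(0.7 / 3, 0.7 / 3, 0.7 / 3, 0.3)),
         nrow = n_rows, ncol = n_cols),
  stringsAsFactors = FALSE
)

# force a few known rows to be entirely NA
all_na_idx <- c(10, 500, 14999)
sim[all_na_idx, ] <- NA

via_apply   <- unname(which(apply(sim, 1, function(x) all(is.na(x)))))
via_rowsums <- unname(which(rowSums(!is.na(sim)) == 0))

print(via_rowsums)
```

With 30% missingness, the chance of any other row being all NA by accident is about 0.3^34, so in practice only the forced rows are found.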