I have a data.frame with 15,000 observations of 34 ordinal and NA variables. I am performing clustering for a market segmentation study and need the ro
which(rowSums(is.na(Store2))==ncol(Store2))
#3 4
#3 4
Or
which(Reduce(`&`,as.data.frame(is.na(Store2))))
#[1] 3 4
Or
which(!rowSums(!is.na(Store2)))
#3 4
#3 4
Store2 <- structure(list(Age = c(NA, "45-54", NA, NA, "45-54"), Gender = c("Male",
"Female", NA, NA, "Female"), HouseholdIncome = c(NA, NA, NA,
NA, "75k-100k"), MaritalStatus = c(NA, NA, NA, NA, "Married"),
PresenceofChildren = c(NA, NA, NA, NA, "Yes"), HomeOwnerStatus = c(NA,
NA, NA, NA, "Own"), HomeMarketValue = c(NA, NA, NA, NA, "150k-200k"
)), .Names = c("Age", "Gender", "HouseholdIncome", "MaritalStatus",
"PresenceofChildren", "HomeOwnerStatus", "HomeMarketValue"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
To drop the rows with all NAs:
Store2[!!rowSums(!is.na(Store2)),]
#    Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus
#1  <NA>   Male            <NA>          <NA>               <NA>            <NA>
#2 45-54 Female            <NA>          <NA>               <NA>            <NA>
#5 45-54 Female        75k-100k       Married                Yes             Own
#  HomeMarketValue
#1            <NA>
#2            <NA>
#5       150k-200k
is.na(Store2) gives a logical matrix that is TRUE wherever an element is missing (NA). ! negates that logical index, i.e. TRUE becomes FALSE and vice versa. rowSums of the negated matrix then gives the number of elements that are not NA in each row:
rowSums(!is.na(Store2))
# 1 2 3 4 5
# 1 2 0 0 7 # 3rd and 4th row have `0 non NA` values
Negating the above with ! turns 0 into TRUE and any non-zero count into FALSE:
!rowSums(!is.na(Store2))
# 1 2 3 4 5
#FALSE FALSE TRUE TRUE FALSE
We want to drop the rows that are all NAs, i.e. the rows with 0 non-NA values, so we apply ! again to flip the index:
!!rowSums(!is.na(Store2))
#1 2 3 4 5
#TRUE TRUE FALSE FALSE TRUE
Subsetting Store2 with this logical index keeps only rows 1, 2 and 5.
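The double negation can be hard to read, so here is a minimal sketch of the same filter written as an explicit comparison. The data.frame df and the names keep and df_kept are made up for illustration and are not the poster's data.

```r
# Hypothetical toy data: row 3 is entirely NA
df <- data.frame(a = c(NA, "x", NA),
                 b = c("y", NA, NA),
                 stringsAsFactors = FALSE)

# Count the non-NA cells in each row and keep rows that have at least one
keep <- rowSums(!is.na(df)) > 0
df_kept <- df[keep, , drop = FALSE]

# The explicit comparison selects the same rows as the !! form
same <- identical(df_kept, df[!!rowSums(!is.na(df)), , drop = FALSE])
```

Both forms rely on the same idea: a row count of 0 non-NA values marks a row to drop. The comparison with > 0 just states that condition directly instead of relying on the coercion of numbers to logicals.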
If you have two RowNo vectors, i.e. the one you stored separately before deleting the NA rows and the one left after deleting them, you can match the first against the second with %in%:
RowNo1 <- 1:6
RowNo2 <- c(1,2,5,6)
RowNo1 %in% RowNo2
#[1] TRUE TRUE FALSE FALSE TRUE TRUE
RowNo1[RowNo1 %in% RowNo2]
#[1] 1 2 5 6
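If the row numbers that were removed are also of interest, the same two vectors can be compared the other way round. This is a small sketch that reuses the RowNo1 and RowNo2 values from above; the names dropped and dropped2 are introduced here only for illustration.

```r
RowNo1 <- 1:6
RowNo2 <- c(1, 2, 5, 6)

# Row numbers present before the deletion but missing afterwards
dropped <- setdiff(RowNo1, RowNo2)

# Equivalent formulation using a negated %in%
dropped2 <- RowNo1[!RowNo1 %in% RowNo2]
```

With these inputs both dropped and dropped2 contain 3 and 4.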
With your new requests, let me try it again:
Store2 <- structure(list(RowNo = 1:5, Age = c(NA, "45-54", NA, NA, "45-54"
), Gender = c("Male", "Female", NA, NA, "Female"), HouseholdIncome = c(NA,
NA, NA, NA, "75k-100k"), MaritalStatus = c(NA, NA, NA, NA, "Married"
), PresenceofChildren = c(NA, NA, NA, NA, "Yes")), .Names = c("RowNo",
"Age", "Gender", "HouseholdIncome", "MaritalStatus", "PresenceofChildren"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5"
))
Saving RowNo as a separate vector (I am not sure why you need this):
Store2new1 <- Store2$RowNo
Delete the rows with all NA values in the Store2 data.frame (ignoring the RowNo column) and store the result as Store2df:
Store2df <- Store2[!!rowSums(!is.na(Store2[,-1])),] #Here you already get the new dataset with `RowNo` column
Store2df
#  RowNo   Age Gender HouseholdIncome MaritalStatus PresenceofChildren
#1     1  <NA>   Male            <NA>          <NA>               <NA>
#2     2 45-54 Female            <NA>          <NA>               <NA>
#5     5 45-54 Female        75k-100k       Married                Yes
Delete the same rows from the Store2new1 vector that were dropped from the Store2df data.frame:
Store2new2 <- Store2new1[Store2new1 %in% Store2df$RowNo]
Store2new2
#[1] 1 2 5
I don't think the third or fourth step is really required unless you want to delete more rows, which is not clear from the post.
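As a possible shortcut, the filtered data.frame already carries the original row numbers in its row names, so they can be read back without a separate vector. The sketch below uses a cut-down, hypothetical version of Store2 with only three columns, and it assumes the row names are the default "1", "2", ... sequence as in the example data.

```r
# Cut-down, hypothetical version of Store2 (three columns only)
Store2 <- data.frame(RowNo  = 1:5,
                     Age    = c(NA, "45-54", NA, NA, "45-54"),
                     Gender = c("Male", "Female", NA, NA, "Female"),
                     stringsAsFactors = FALSE)

# Drop rows that are all NA, ignoring the RowNo column
Store2df <- Store2[rowSums(!is.na(Store2[, -1])) > 0, ]

# Subsetting keeps the original row names, so they give the surviving rows
kept_rows <- as.integer(rownames(Store2df))
```

Here kept_rows matches Store2df$RowNo. This only works while the row names have not been reset or replaced with other labels.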