I have a lot of rows and columns in a very large matrix (184 x 4000, type double), and I want to remove all 0's. The values in the matrix are usually greater than 0 but the
Try this for removing the rows that contain only zeros.
x[!apply(x == 0, 1, all), , drop = FALSE]
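As a quick check of the line above, here is a minimal sketch on a small made-up matrix (the name m and its values are assumptions for illustration, not the asker's data):

```r
# Hypothetical 4 x 3 example matrix; rows 2 and 4 contain only zeros.
m <- matrix(c(1, 0, 2,
              0, 0, 0,
              0, 3, 0,
              0, 0, 0),
            nrow = 4, byrow = TRUE)

# TRUE for rows in which every entry equals 0.
all_zero <- apply(m == 0, 1, all)

# Keep the other rows; drop = FALSE keeps the result a matrix
# even if only one row survives.
kept <- m[!all_zero, , drop = FALSE]
print(kept)
```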
I finally have the answer. The reason why
x<- x[which(rowSums(x) > 0),]
only returned 3 rows out of 184 was that it keeps only those rows whose sum is greater than 0 and that do not contain an NA: rowSums() returns NA for any row with an NA, and which() drops those. All but 3 of my rows had a few NAs that I just wasn't aware of. Simply taking out the NAs did not work, because that didn't solve the rowSums problem. I needed the function to treat my NAs as zeros, so that the rows that did contain NAs (all but 3) would also be summed up and not just taken out of the matrix. So I turned all NAs into zeros by using
x[is.na(x)] <- 0
and THEN applying the function to sum up all rows and remove the ones that add up to 0. And it worked! Thanks to everyone for your input, especially @arkun!
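The behaviour described above can be reproduced on a small made-up matrix (the name m and its values are illustrative assumptions). The sketch also shows rowSums(..., na.rm = TRUE) as a possible alternative that leaves the NAs in the data; that is a swapped-in variation, not what the answer above did.

```r
# Hypothetical example: row 1 has an NA, row 2 is all zeros,
# row 3 is zeros plus an NA, row 4 has no NA and a positive sum.
m <- matrix(c(1, NA, 2,
              0,  0, 0,
              0, NA, 0,
              4,  5, 6),
            nrow = 4, byrow = TRUE)

row_totals <- rowSums(m)          # NA for every row that contains an NA
survivors  <- which(row_totals > 0)  # which() drops the NA comparisons

# Approach from the answer: treat NA as 0, then filter on the row sum.
m0 <- m
m0[is.na(m0)] <- 0
filtered <- m0[which(rowSums(m0) > 0), , drop = FALSE]

# Alternative: keep the NAs in the data and ignore them only when summing.
filtered_na <- m[rowSums(m, na.rm = TRUE) > 0, , drop = FALSE]
```

Note that filtering on the row sum assumes the matrix has no negative values, since positive and negative entries could cancel to 0.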
This worked for me, a slight change to @Richard Scriven's answer:
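remove_zeros <- function(x)
{
  # Drop rows in which every entry is 0; drop = FALSE keeps the
  # result a matrix even when only one row remains.
  x <- x[!apply(x == 0, 1, all), , drop = FALSE]
  return(x)
}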
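Whether drop = FALSE is used matters when only one row survives the filter, because R otherwise simplifies a one-row result to a plain vector. A minimal sketch, using a made-up matrix and the hypothetical helper name remove_zero_rows:

```r
# Hypothetical helper: same filter, with the result always kept as a matrix.
remove_zero_rows <- function(x) {
  x[!apply(x == 0, 1, all), , drop = FALSE]
}

# Made-up example in which only one row is not all zeros.
m <- matrix(c(0, 0,
              5, 0,
              0, 0),
            nrow = 3, byrow = TRUE)

without_drop <- m[!apply(m == 0, 1, all), ]  # simplified to a numeric vector
with_drop    <- remove_zero_rows(m)          # still a 1 x 2 matrix

is.null(dim(without_drop))
dim(with_drop)
```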
You can drop rows which only contain 0s like this (and you could replace 0 with any other number if you wanted to drop rows with only that number):
x <- x[rowSums(x == 0) != ncol(x),]
Explanation:
x == 0
creates a matrix of logical values (TRUE/FALSE) and
rowSums(x == 0)
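sums them up row by row (TRUE == 1, FALSE == 0), giving the number of zeros in each row. A row is kept only if that count differs from ncol(x), the number of columns; a row where the two are equal consists entirely of zeros.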
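To see the intermediate values, here is a minimal sketch on a small made-up matrix (m and its values are assumptions for illustration):

```r
# Hypothetical 3 x 3 matrix; only row 2 consists entirely of zeros.
m <- matrix(c(1, 0, 0,
              0, 0, 0,
              0, 0, 7),
            nrow = 3, byrow = TRUE)

is_zero    <- m == 0                  # logical matrix, TRUE where the entry is 0
zero_count <- rowSums(is_zero)        # number of zeros in each row
keep       <- zero_count != ncol(m)   # FALSE only for all-zero rows

result <- m[keep, , drop = FALSE]
```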
You could try:
x[!rowSums(!x)==ncol(x),] #could be shortened to
x[!!rowSums(abs(x)),] #Inspired from @Richard Scriven
x <- structure(list(V1 = c(2, 0, 2, 2, 2, 3, 2, 0, 0, 3), V2 = c(2,
0, 0, 2, 3, 1, 0, 0, 0, 0), V3 = c(3, 0, 1, 3, 3, 2, 0, 3, 0,
1), V4 = c(3, 0, 2, 3, 2, 2, 2, 1, 2, 1), V5 = c(0, 0, 0, 0,
1, 2, 2, 2, 1, 3)), .Names = c("V1", "V2", "V3", "V4", "V5"), row.names = c(NA,
-10L), class = "data.frame")
!x
creates a logical index of TRUE and FALSE, where TRUE marks the elements that are 0.
rowSums(!x)
is the row-wise sum of those TRUEs.
==ncol(x)
checks whether the sum is equal to the number of columns. In the above example it is 5, which means all entries in that row are 0.
The leading ! negates the result, because we want to filter out these rows, and x is then subset using this logical index.
Suppose you have NAs in your dataset and you want to remove rows with all 0s or those with only 0s and NAs, for example:
x <- structure(list(V1 = c(2, 0, 2, 2, 2, 3, 2, 0, 0, 3), V2 = c(2,
NA, 0, 2, 3, 1, 0, 0, 0, 0), V3 = c(3, 0, 1, 3, 3, 2, 0, 3, 0,
1), V4 = c(3, 0, 2, 3, 2, 2, NA, 1, 2, 1), V5 = c(0, 0, 0, 0,
1, 2, 2, 2, 1, 3)), .Names = c("V1", "V2", "V3", "V4", "V5"), row.names = c(NA,
-10L), class = "data.frame")
x[!(rowSums(!is.na(x) & !x)+rowSums(is.na(x)))==ncol(x),]
The idea is to first sum the NAs row-wise: rowSums(is.na(x))
Then take the row-wise sum of all the elements that are not NA and are 0: rowSums(!is.na(x) & !x)
Add the two together. If the total equals the number of columns, delete that row.
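The two row-wise sums can also be merged into one, because an entry counts toward the total exactly when it is NA or 0. A minimal sketch on a made-up data frame (df and its values are illustrative assumptions, and the single-rowSums form is a variation, not the code from the answer):

```r
# Hypothetical data frame: row 2 is zeros plus an NA, row 3 is all zeros.
df <- data.frame(V1 = c(2, 0, 0, 1),
                 V2 = c(NA, NA, 0, 0),
                 V3 = c(0, 0, 0, NA))

# Form used in the answer: zeros that are not NA, plus the NAs.
n_zero <- rowSums(!is.na(df) & df == 0)
n_na   <- rowSums(is.na(df))
drop_a <- (n_zero + n_na) == ncol(df)

# Single pass: count entries that are NA or 0.
drop_b <- rowSums(is.na(df) | df == 0) == ncol(df)

# Rows flagged by neither test are kept.
kept <- df[!drop_b, , drop = FALSE]
```

Both forms flag the same rows here; the single-pass version relies on TRUE | NA evaluating to TRUE in R.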