Question
I have a very large matrix (184 x 4000, type double), and I want to remove all rows that contain only 0's. The values in the matrix are usually greater than 0, but there are some rows of 0.0000. I tried to remove the rows with zeros using this:
x <- x[which(rowSums(x) > 0),]
but what I am left with is a mere 3 rows out of 184. And I know for a fact that the deleted 181 rows were not all 0 rows. Does anyone have a clue why this is happening and how I can fix it? I used this same code on a different matrix with the same structure (184 rows, 4000 columns) and it worked. What am I missing?
Answer 1:
You can drop rows which only contain 0s like this (and you could replace 0 with any other number if you wanted to drop rows with only that number):
x <- x[rowSums(x == 0) != ncol(x),]
Explanation:
- x == 0 creates a matrix of logical values (TRUE/FALSE), and rowSums(x == 0) sums them up (TRUE == 1, FALSE == 0).
- Then you check if the sum of each row is not equal to the number of columns of your matrix (which are counted by ncol(x)).
- If that is the case (which means not all entries are 0s), the row will be kept because it evaluates to TRUE. All other rows evaluate to FALSE and will be dropped.
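As a small illustration of the steps above, here is a sketch on a made-up 3 x 3 matrix. The name m and its values are assumptions for the example, not data from the question.

```r
# Hypothetical toy matrix: row 2 is all zeros
m <- matrix(c(1, 0, 2,
              0, 0, 0,
              0, 3, 0), nrow = 3, byrow = TRUE)

zero_counts <- rowSums(m == 0)   # number of zeros in each row
keep <- zero_counts != ncol(m)   # TRUE where at least one entry is non-zero

# drop = FALSE keeps the result a matrix even if a single row survives
m_clean <- m[keep, , drop = FALSE]
print(m_clean)
```

Only the second row is removed here; rows that merely contain some zeros are kept.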
Answer 2:
Try this for removing the rows that contain only zeros.
x[!apply(x == 0, 1, all), , drop = FALSE]
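The drop = FALSE argument matters when only one row survives the filter. A minimal sketch, assuming a made-up matrix m in which just one row is non-zero:

```r
# Hypothetical toy matrix: only row 2 has a non-zero entry
m <- matrix(c(0, 0,
              5, 0,
              0, 0), nrow = 3, byrow = TRUE)

keep <- !apply(m == 0, 1, all)   # TRUE for rows that are not all zero

a <- m[keep, ]                   # collapses to a plain numeric vector
b <- m[keep, , drop = FALSE]     # stays a 1 x 2 matrix

print(is.matrix(a))
print(is.matrix(b))
```

Without drop = FALSE, later code that expects a matrix (for example a call to nrow) may fail on the collapsed vector.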
Answer 3:
You could try:
x[!rowSums(!x)==ncol(x),] # could be shortened to
x[!!rowSums(abs(x)),] # inspired by @Richard Scriven
data
x <- structure(list(V1 = c(2, 0, 2, 2, 2, 3, 2, 0, 0, 3), V2 = c(2,
0, 0, 2, 3, 1, 0, 0, 0, 0), V3 = c(3, 0, 1, 3, 3, 2, 0, 3, 0,
1), V4 = c(3, 0, 2, 3, 2, 2, 2, 1, 2, 1), V5 = c(0, 0, 0, 0,
1, 2, 2, 2, 1, 3)), .Names = c("V1", "V2", "V3", "V4", "V5"), row.names = c(NA,
-10L), class = "data.frame")
- !x creates a logical index of TRUE and FALSE, where TRUE marks the elements that are 0
- rowSums(!x) is the rowwise sum of those TRUEs
- ==ncol(x) checks whether the sum is equal to the number of columns. In the above example it is 5. That means all entries are 0!
- Negate again because we want to filter out these rows
- Subset x using this logical index
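To see why the shortened form uses abs(), here is a sketch on a made-up matrix m (an assumption for illustration) whose third row contains values that cancel out. Both forms from this answer agree, while a plain rowSums(m) != 0 test would wrongly discard that row.

```r
# Hypothetical toy matrix: row 2 is all zeros, row 3 sums to 0 but is not all zeros
m <- matrix(c( 2, 0, 3,
               0, 0, 0,
              -1, 1, 0), nrow = 3, byrow = TRUE)

keep_a <- !(rowSums(!m) == ncol(m))   # count zeros per row, compare with column count
keep_b <- !!rowSums(abs(m))           # non-zero sum of absolute values means "keep"
naive  <- rowSums(m) != 0             # misclassifies row 3 because -1 + 1 + 0 == 0

print(keep_a)
print(keep_b)
print(naive)
```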
Update
Suppose you have NAs in your dataset and you want to remove rows that contain only 0's, or only 0's and NAs, e.g.
x <- structure(list(V1 = c(2, 0, 2, 2, 2, 3, 2, 0, 0, 3), V2 = c(2,
NA, 0, 2, 3, 1, 0, 0, 0, 0), V3 = c(3, 0, 1, 3, 3, 2, 0, 3, 0,
1), V4 = c(3, 0, 2, 3, 2, 2, NA, 1, 2, 1), V5 = c(0, 0, 0, 0,
1, 2, 2, 2, 1, 3)), .Names = c("V1", "V2", "V3", "V4", "V5"), row.names = c(NA,
-10L), class = "data.frame")
x[!(rowSums(!is.na(x) & !x)+rowSums(is.na(x)))==ncol(x),]
- The idea is to first sum the NAs rowwise: rowSums(is.na(x))
- Then take the rowwise sum of all the elements that are not NAs and are 0's: rowSums(!is.na(x) & !x)
- Take the sum of the above two. If that number matches the number of columns, delete that row
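The two counts can be inspected separately. The following sketch uses a made-up matrix m with NAs (names and values are assumptions for the example):

```r
# Hypothetical toy matrix with missing values
m <- matrix(c( 2, NA, 1,
               0, NA, 0,
               0,  0, 0,
              NA,  3, 0), nrow = 4, byrow = TRUE)

n_na   <- rowSums(is.na(m))         # NAs per row
n_zero <- rowSums(!is.na(m) & !m)   # true zeros per row (NA entries count as FALSE)

# A row is dropped when every entry is either 0 or NA
keep <- (n_na + n_zero) != ncol(m)

print(m[keep, , drop = FALSE])
```

Rows 2 and 3 are removed: row 2 holds only zeros and an NA, row 3 holds only zeros. Rows 1 and 4 keep their NAs because they also contain non-zero values.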
Answer 4:
I finally have the answer. The reason why
x <- x[which(rowSums(x) > 0),]
only returned 3 rows out of 184 is that rowSums() returns NA for any row that contains an NA, so the comparison rowSums(x) > 0 is NA for that row and which() drops it. All but 3 of my rows had a few NAs that I wasn't aware of. Simply taking out the NAs did not work, because that didn't solve the rowSums problem. I needed the NAs to be treated as zeros, so that the rows that did contain NAs (all but 3) would still be summed up and not just taken out of the matrix. So I turned all NAs into zeros by using
x[is.na(x)] <- 0
and THEN applying the function to sum up all rows and remove the ones that add up to 0. And it worked! Thanks to everyone for your input. Especially @arkun!
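The behaviour described in this answer can be reproduced on a small made-up matrix. The sketch below also shows one possible alternative that is not part of the original answer: passing na.rm = TRUE to rowSums(), which ignores NAs while summing and leaves them in place rather than overwriting them with 0.

```r
# Hypothetical toy matrix: row 1 contains an NA, row 2 is all zeros
m <- matrix(c(1, NA, 2,
              0,  0, 0,
              3,  4, 5), nrow = 3, byrow = TRUE)

print(rowSums(m))   # row 1 sums to NA because of the missing value

# NA > 0 is NA, and which() silently skips NA, so row 1 is lost
naive_idx <- which(rowSums(m) > 0)

# Assumed alternative: ignore NAs while summing instead of overwriting them
fixed_idx <- which(rowSums(m, na.rm = TRUE) > 0)

print(m[fixed_idx, , drop = FALSE])
```

Which approach fits better depends on whether the NAs should remain visible in the cleaned matrix.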
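Answer 5:
This worked for me, a slight change to @Richard Scriven's answer:
remove_zeros <- function(x) {
  # Keep only the rows in which not every entry equals 0;
  # drop = FALSE preserves the matrix shape if one row remains
  x[!apply(x == 0, 1, all), , drop = FALSE]
}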
Source: https://stackoverflow.com/questions/25183844/r-i-want-to-go-through-rows-of-a-big-matrix-and-remove-all-zeros