I need to remove rows that contain a zero value in R. For NA values, on the other hand, I can use na.omit() or complete.cases() to delete the affected rows.
I would do the following.
Set the zero to NA.
data[data==0] <- NA
data
Delete the rows associated with NA.
data2<-data[complete.cases(data),]
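A minimal sketch of the two steps above on a made-up data frame; the column values and the name `tmp` are illustrative assumptions, not part of the original question. Working on a copy keeps the zeros in `data` intact, and note that rows which already contained an NA are dropped as well.

```r
# Hypothetical example data: row 2 has a zero, row 4 already has an NA
data <- data.frame(id = 1:4, x = c(5, 0, 3, NA), y = c(1, 2, 3, 4))

# Work on a copy so the zeros in `data` are not overwritten
tmp <- data
tmp[tmp == 0] <- NA

# Keep only the rows without any NA (former zeros and pre-existing NAs)
data2 <- data[complete.cases(tmp), ]
data2
```

If the pre-existing NAs should survive, it is probably safer to test for zeros directly rather than going through NA.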
Using tidyverse/dplyr, you can also remove rows with any zero value in a subset of variables:
# variables starting with Mac must be non-zero
filter_at(df, vars(starts_with("Mac")), all_vars((.) != 0))
# variables x, y, and z must be non-zero
filter_at(df, vars(x, y, z), all_vars((.) != 0))
# all numeric variables must be non-zero
filter_if(df, is.numeric, all_vars((.) != 0))
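In dplyr 1.0.4 and later, the scoped verbs filter_at and filter_if are superseded, and the same filters can be written with if_all(). The following is a sketch assuming such a version is installed; the data frame and the names `mac_nonzero` and `num_nonzero` are made up for illustration.

```r
library(dplyr)

# Hypothetical example data: row "b" has a zero in Mac1
df <- data.frame(id = c("a", "b", "c"), Mac1 = c(1, 0, 3), Mac2 = c(4, 5, 6))

# Columns starting with "Mac" must all be non-zero
mac_nonzero <- filter(df, if_all(starts_with("Mac"), ~ .x != 0))

# All numeric columns must be non-zero
num_nonzero <- filter(df, if_all(where(is.numeric), ~ .x != 0))

mac_nonzero
```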
I would probably go with Joran's suggestion of replacing 0's with NAs and then using the built-in functions you mentioned. If you can't or don't want to do that, one approach is to use any() to find rows that contain 0's and subset those out:
set.seed(42)
#Fake data
x <- data.frame(a = sample(0:2, 5, TRUE), b = sample(0:2, 5, TRUE))
> x
  a b
1 2 1
2 2 2
3 0 0
4 2 1
5 1 2
#Subset out any rows with a 0 in them
#Note the negation with ! around the apply function
x[!(apply(x, 1, function(y) any(y == 0))),]
  a b
1 2 1
2 2 2
4 2 1
5 1 2
To implement Joran's method, something like this should get you started:
x[x==0] <- NA
In base R, we can select the columns we want to test using grep, compare the data with 0, and use rowSums to select the rows in which all values are non-zero.
cols <- grep("^Mac", names(df))
df[rowSums(df[cols] != 0) == length(cols), ]
#          DateTime Mac1 Mac2 Mac3 Mac4
#2 2011-04-02 06:05   21   21   21   21
#3 2011-04-02 06:10   22   22   22   22
#5 2011-04-02 06:20   24   24   24   24
The same output can be obtained with inverted logic, keeping the rows that contain no zeros:
df[rowSums(df[cols] == 0) == 0, ]
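One thing to watch with either rowSums form is missing data. The sketch below uses a made-up data frame (`df_demo` and `keep` are illustrative names) to show how an NA in a tested column propagates into the row index, and how na.rm = TRUE avoids that if rows with NA should be kept.

```r
# Hypothetical example data: row 2 has a zero, row 3 has an NA
df_demo <- data.frame(id = 1:4, Mac1 = c(1, 0, 3, 4), Mac2 = c(5, 6, NA, 8))
cols <- grep("^Mac", names(df_demo))

# The NA propagates: row 3 yields an NA index and appears as an all-NA row
df_demo[rowSums(df_demo[cols] == 0) == 0, ]

# With na.rm = TRUE the NA is ignored, so row 3 is kept (it has no zeros)
keep <- rowSums(df_demo[cols] == 0, na.rm = TRUE) == 0
df_demo[keep, ]
```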
In dplyr, we can use filter_at to test specific columns and all_vars to select rows where all the values are not equal to 0.
library(dplyr)
df %>% filter_at(vars(starts_with("Mac")), all_vars(. != 0))
data
df <- structure(list(DateTime = structure(1:6, .Label = c("2011-04-02 06:00",
"2011-04-02 06:05", "2011-04-02 06:10", "2011-04-02 06:15", "2011-04-02 06:20",
"2011-04-02 06:25"), class = "factor"), Mac1 = c(20L, 21L, 22L,
23L, 24L, 0L), Mac2 = c(0L, 21L, 22L, 23L, 24L, 25L), Mac3 = c(20L,
21L, 22L, 0L, 24L, 25L), Mac4 = c(20L, 21L, 22L, 23L, 24L, 0L
)), class = "data.frame", row.names = c(NA, -6L))
Well, you could swap your 0's for NA and then use one of those solutions, but for the sake of variety, you could notice that a number only has a finite logarithm if it is greater than 0, so the rowSums of the log will only be finite if there are no zeros in a row.
dfr[is.finite(rowSums(log(dfr[-1]))),]
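A small sketch of the logarithm trick on made-up data (the values and the name `finite_rows` are assumptions for illustration). It also shows a side effect: a row with a negative value is dropped too, since log() of a negative number is NaN, so the trick is only equivalent to "no zeros" when the data are non-negative.

```r
# Hypothetical example data; the first column is a non-numeric label,
# which is why it is excluded with dfr[-1]
dfr <- data.frame(
  label = c("r1", "r2", "r3", "r4"),
  a = c(1, 0, 3, -2),
  b = c(4, 5, 6, 7)
)

# log(0) is -Inf and log(-2) is NaN (with a warning), so the row sum is
# finite only when every value in the row is strictly positive
finite_rows <- is.finite(rowSums(suppressWarnings(log(dfr[-1]))))
dfr[finite_rows, ]
```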
There are a few different ways of doing this. I prefer using apply, since it's easily extensible:
##Generate some data
dd = data.frame(a = 1:4, b= 1:0, c=0:3)
##Go through each row and determine if a value is zero
row_sub = apply(dd, 1, function(row) all(row !=0 ))
##Subset as usual
dd[row_sub,]
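As one possible extension of the apply approach, the check can be wrapped in a small function that restricts the test to chosen columns. The helper name `drop_zero_rows` and its `cols` argument are hypothetical, and the sketch assumes the selected columns are numeric.

```r
# Hypothetical helper: drop rows that have a zero in any of the chosen columns
drop_zero_rows <- function(d, cols = names(d)) {
  # as.matrix() makes the row-wise apply explicit; assumes numeric columns
  keep <- apply(as.matrix(d[cols]), 1, function(row) all(row != 0))
  d[keep, , drop = FALSE]
}

dd <- data.frame(a = 1:4, b = 1:0, c = 0:3)

# Check every column
drop_zero_rows(dd)

# Check only columns a and b
drop_zero_rows(dd, cols = c("a", "b"))
```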