Remove rows from dataframe that contains only 0 or just a single 0

青春壹個敷衍的年華 提交于 2019-12-06 00:42:06

问题


I am trying to create a function in R that will allow me to filter my data set based on whether a row contains a single column with a zero in it. Furthermore, some times I only want to remove rows that is zero in all columns.

Also, and this is where it gets fun; not all columns contains numbers and the number of columns can vary.

I have tried to paste some of my data here with the results I want to obtain.

unfiltered:
    ID  GeneName    DU145small  DU145total  PC3small    PC3total
    1   MIR22HG     33221.5     1224.55     2156.43     573.315
    2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
    3   MIR612      0           0           530.068     0
    4   MIR218-1    0           0           1166.88     701.253
    5   MIR181B2    70723.2     3958.01     6209.85     1399.34
    6   MIR218-2    0           0           0           0
    7   MIR10B      787.516     330.556     0           20336.4
    8   MIR3176     0           0           0           0

any rows with containing a zero removed:
    ID  GeneName    DU145small  DU145total  PC3small    PC3total
    1   MIR22HG     33221.5     1224.55     2156.43     573.315
    2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
    5   MIR181B2    70723.2     3958.01     6209.85     1399.34

only rows that is all zero is filtered away:
    ID  GeneName    DU145small  DU145total  PC3small    PC3total
    1   MIR22HG     33221.5     1224.55     2156.43     573.315
    2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
    3   MIR612      0           0           530.068     0
    4   MIR218-1    0           0           1166.88     701.253
    5   MIR181B2    70723.2     3958.01     6209.85     1399.34
    7   MIR10B      787.516     330.556     0           20336.4

I did find a way of removing any rows that had at least 1 zero in it, but it was "cheating" by exchanging all zeros with NA and then using complete.cases to filter.

Also, by doing that it remove all rows where the GeneName had a zero in it (as for MIR10B).

I can solve it by using for loops, but I have been told that loops in R is very ineffective and would therefore like to avoid that solution.

EDIT: While Xin Yin's solution works perfectly well and kept the data in a data frame, the answer by David Arenburg is supposedly more efficient and should be used.


回答1:


Using data.table (assuming df is your data set)

library(data.table)
setDT(df)[, .SD[!all(.SD[, -1, with = F] == 0)], by = GeneName]

#    GeneName ID DU145small DU145total  PC3small  PC3total
# 1:  MIR22HG  1  33221.500   1224.550  2156.430   573.315
# 2: MIRLET7E  2  87566.100   7737.990 25039.300 16415.600
# 3:   MIR612  3      0.000      0.000   530.068     0.000
# 4: MIR218-1  4      0.000      0.000  1166.880   701.253
# 5: MIR181B2  5  70723.200   3958.010  6209.850  1399.340
# 6:   MIR10B  7    787.516    330.556     0.000 20336.400

Or if you only want to remove rows with any zeroes

setDT(df)[, .SD[!any(.SD[, -1, with = F] == 0)], by = GeneName]

#    GeneName ID DU145small DU145total PC3small  PC3total
# 1:  MIR22HG  1    33221.5    1224.55  2156.43   573.315
# 2: MIRLET7E  2    87566.1    7737.99 25039.30 16415.600
# 3: MIR181B2  5    70723.2    3958.01  6209.85  1399.340



回答2:


Remove rows with any zero:

df[!rowSums(df[-c(1:2)] == 0) >= 1,]

Remove rows with all zeros:

df[!!rowSums(abs(df[-c(1:2)])),]

Inspired by this question




回答3:


Using rowSums over subset of columns, try this:

#dummy data
df <- read.table(text="
ID  GeneName    DU145small  DU145total  PC3small    PC3total
1   MIR22HG     33221.5     1224.55     2156.43     573.315
2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
3   MIR612      0           0           530.068     0
4   MIR218-1    0           0           1166.88     701.253
5   MIR181B2    70723.2     3958.01     6209.85     1399.34
6   MIR218-2    0           0           0           0
7   MIR10B      787.516     330.556     0           20336.4
8   MIR3176     0           0           0           0",
                 header=TRUE)
#remove any zero
df[ !rowSums(df[,colnames(df)[(3:ncol(df))]]==0)>=1, ]

#remove all zero
df[ !rowSums(df[,colnames(df)[(3:ncol(df))]]==0)==ncol(df)-2, ]



回答4:


This would work

> (unfiltered <- read.table(text="
+    ID  GeneName    DU145small  DU145total  PC3small    PC3total
+     1   MIR22HG     33221.5     1224.55     2156.43     573.315
+     2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
+     3   MIR612      0           0           530.068     0
+     4   MIR218-1    0           0           1166.88     701.253
+     5   MIR181B2    70723.2     3958.01     6209.85     1399.34
+     6   MIR218-2    0           0           0           0
+     7   MIR10B      787.516     330.556     0           20336.4
+     8   MIR3176     0           0           0           0
+ ", header=T))
  ID GeneName DU145small DU145total  PC3small  PC3total
1  1  MIR22HG  33221.500   1224.550  2156.430   573.315
2  2 MIRLET7E  87566.100   7737.990 25039.300 16415.600
3  3   MIR612      0.000      0.000   530.068     0.000
4  4 MIR218-1      0.000      0.000  1166.880   701.253
5  5 MIR181B2  70723.200   3958.010  6209.850  1399.340
6  6 MIR218-2      0.000      0.000     0.000     0.000
7  7   MIR10B    787.516    330.556     0.000 20336.400
8  8  MIR3176      0.000      0.000     0.000     0.000
> 
> (any.zero <- unfiltered[!apply(unfiltered[, -c(1,2)], 1, function(row) any(row == 0)), ])
  ID GeneName DU145small DU145total PC3small  PC3total
1  1  MIR22HG    33221.5    1224.55  2156.43   573.315
2  2 MIRLET7E    87566.1    7737.99 25039.30 16415.600
5  5 MIR181B2    70723.2    3958.01  6209.85  1399.340
> (all.zero <- unfiltered[!apply(unfiltered[, -c(1,2)], 1, function(row) all(row == 0)), ])
  ID GeneName DU145small DU145total  PC3small  PC3total
1  1  MIR22HG  33221.500   1224.550  2156.430   573.315
2  2 MIRLET7E  87566.100   7737.990 25039.300 16415.600
3  3   MIR612      0.000      0.000   530.068     0.000
4  4 MIR218-1      0.000      0.000  1166.880   701.253
5  5 MIR181B2  70723.200   3958.010  6209.850  1399.340
7  7   MIR10B    787.516    330.556     0.000 20336.400


来源:https://stackoverflow.com/questions/25203813/remove-rows-from-dataframe-that-contains-only-0-or-just-a-single-0

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!