Question
I have the following tibble:
library(tidyverse)
df <- tibble::tribble(
~gene, ~colB, ~colC,
"a", 1, 2,
"b", 2, 3,
"c", 3, 4,
"d", 1, 1
)
df
#> # A tibble: 4 x 3
#> gene colB colC
#> <chr> <dbl> <dbl>
#> 1 a 1 2
#> 2 b 2 3
#> 3 c 3 4
#> 4 d 1 1
What I want to do is filter every column after the gene column for values greater than or equal to 2 (>= 2), resulting in this:
gene, colB, colC
a NA 2
b 2 3
c 3 4
How can I achieve that?
The actual number of columns after gene is more than just 2.
Answer 1:
One solution: convert from wide to long format, so you can filter on just one column, then convert back to wide at the end if required. Note that this will drop genes where no values meet the condition.
library(tidyverse)
df %>%
gather(name, value, -gene) %>%
filter(value >= 2) %>%
spread(name, value)
# A tibble: 3 x 3
gene colB colC
* <chr> <dbl> <dbl>
1 a NA 2
2 b 2 3
3 c 3 4
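The same long-then-wide idea can be sketched with pivot_longer() and pivot_wider(), which newer tidyr versions offer in place of gather() and spread(). This assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later (for any_of()); the name result_long is only illustrative. Unlike spread(), pivot_wider() orders the new columns by first appearance, so the final select() restores the original column order.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~gene, ~colB, ~colC,
  "a", 1, 2,
  "b", 2, 3,
  "c", 3, 4,
  "d", 1, 1
)

result_long <- df %>%
  # stack every column except gene into name/value pairs
  pivot_longer(-gene, names_to = "name", values_to = "value") %>%
  # keep only the cells that meet the threshold
  filter(value >= 2) %>%
  # widen again; cells that were filtered away come back as NA
  pivot_wider(names_from = name, values_from = value) %>%
  # restore the original column order (assumption: df is still in scope)
  select(any_of(names(df)))

result_long
```

As with gather()/spread(), a gene with no qualifying values (here "d") disappears from the result.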
Answer 2:
The forthcoming dplyr 0.6 (install from GitHub now, if you like) has filter_at, which can be used to keep any rows that have a value greater than or equal to 2. The remaining values below 2 can then be set to NA through mutate_at. replace() is the right tool for that step; na_if() is not, because it only blanks values equal to its second argument. So:
df %>%
filter_at(vars(-gene), any_vars(. >= 2)) %>%
mutate_at(vars(-gene), funs(replace(., . < 2, NA)))
#> # A tibble: 3 x 3
#> gene colB colC
#> <chr> <dbl> <dbl>
#> 1 a NA 2
#> 2 b 2 3
#> 3 c 3 4
or similarly, setting the values to NA first and then dropping rows that are entirely NA:
df %>%
mutate_at(vars(-gene), funs(replace(., . < 2, NA))) %>%
filter_at(vars(-gene), any_vars(!is.na(.)))
which can be translated for use with dplyr 0.5:
df %>%
mutate_at(vars(-gene), funs(replace(., . < 2, NA))) %>%
filter(rowSums(is.na(.)) < (ncol(.) - 1))
All return the same thing.
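For readers on a recent dplyr, the same two steps can be sketched with if_any() and across(), which newer releases offer in place of the scoped filter_at/mutate_at verbs. This assumes dplyr 1.0.4 or later (the release that added if_any()); the name result_across is only illustrative.

```r
library(dplyr)

df <- tibble::tribble(
  ~gene, ~colB, ~colC,
  "a", 1, 2,
  "b", 2, 3,
  "c", 3, 4,
  "d", 1, 1
)

result_across <- df %>%
  # keep rows where at least one non-gene column reaches the threshold
  filter(if_any(-gene, ~ .x >= 2)) %>%
  # set the remaining cells that fall below the threshold to NA
  mutate(across(-gene, ~ replace(.x, .x < 2, NA)))

result_across
```

Because the selection is written as -gene, this works unchanged however many value columns follow gene.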
Answer 3:
We can use data.table
library(data.table)
setDT(df)[df[, Reduce(`|`, lapply(.SD, `>=`, 2)), .SDcols = colB:colC]
][, (2:3) := lapply(.SD, function(x) replace(x, x < 2, NA)), .SDcols = colB:colC][]
# gene colB colC
#1: a NA 2
#2: b 2 3
#3: c 3 4
Or with melt/dcast
dcast(melt(setDT(df), id.vars = 'gene')[value >= 2], gene ~ variable, value.var = 'value')
# gene colB colC
#1: a NA 2
#2: b 2 3
#3: c 3 4
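Since the real data has more than two value columns, the first data.table approach can be sketched without hard-coding colB:colC or the positions 2:3. This is a minimal variant under the assumption that every column other than gene should be filtered; the names dt, value_cols, keep and result_dt are illustrative only.

```r
library(data.table)

dt <- data.table(
  gene = c("a", "b", "c", "d"),
  colB = c(1, 2, 3, 1),
  colC = c(2, 3, 4, 1)
)

# every column except gene is treated as a value column (assumption)
value_cols <- setdiff(names(dt), "gene")

# TRUE for rows where any value column reaches the threshold
keep <- dt[, Reduce(`|`, lapply(.SD, `>=`, 2)), .SDcols = value_cols]

# subsetting by a logical vector returns a new data.table, so dt stays intact
result_dt <- dt[keep]

# set cells below the threshold to NA, by reference on the subset
result_dt[, (value_cols) := lapply(.SD, function(x) replace(x, x < 2, NA)),
          .SDcols = value_cols]

result_dt[]
```

Building dt directly rather than calling setDT(df) keeps the original tibble from being converted in place.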
Source: https://stackoverflow.com/questions/44233337/how-to-filter-rows-for-every-column-independently-using-dplyr