Filtering with multiple conditions on many columns using dplyr

I've searched on SO trying to find a solution to no avail. So here it is. I have a data frame with many columns, some of which are numerical and should be non-negative. I w

7 answers

后悔当初

2021-01-02 02:29

Here is my ugly solution. Suggestions/criticisms welcome

```
df %>%
  # Select the columns we want
  select(matches("_num$")) %>%
  # Convert every column to logical if >= 0
  lapply(">=", 0) %>%
  # Reduce all the sublists with AND
  Reduce(f = "&", .) %>%
  # Convert the single logical vector into a numeric
  # index since slice can't deal with logical.
  # Can simply write `{df[.,]}` here instead,
  # which is probably faster than which + slice
  # Edit: This is not true. which + slice is faster than `[` in this case
  which %>%
  slice(.data = df)
```
Output:
```
  id  sth1 tg1_num sth2 tg2_num others
1  1  dave       2   ca      35    new
2  4 leroy       0   az      25    old
3  5 jerry       4   mi      55    old
```

爱一瞬间的悲伤

2021-01-02 02:31

This will give you a vector of the indices of the rows that contain a value less than 0 in any of the target columns:
```
desired_rows <- sapply(target_columns, function(x) which(df[,x]<0), simplify=TRUE)
desired_rows <- as.vector(unique(unlist(desired_rows)))
```
Then, to get a data frame without those rows:
```
setdiff(df, df[desired_rows,])
```
Output:
```
  id  sth1 tg1_num sth2 tg2_num others
1  1  dave       2   ca      35    new
2  4 leroy       0   az      25    old
3  5 jerry       4   mi      55    old
```
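
A variation on the same idea, as a minimal sketch: instead of `setdiff`, the collected row indices can be turned into a logical mask, which keeps duplicate rows and the original row names. The data frame below is hypothetical (the question's data is not shown here): ids 1, 4 and 5 mirror the output above, while ids 2 and 3 are invented, and `target_columns` is assumed to hold the names of the `_num` columns.

```r
# Hypothetical data: rows with id 2 and 3 are invented for illustration
df <- data.frame(
  id      = 1:5,
  sth1    = c("dave", "tom", "jane", "leroy", "jerry"),
  tg1_num = c(2, 3, -1, 0, 4),
  sth2    = c("ca", "tx", "ny", "az", "mi"),
  tg2_num = c(35, -3, 20, 25, 55),
  others  = c("new", "new", "old", "old", "old")
)
target_columns <- c("tg1_num", "tg2_num")  # assumed definition

# Row indices that hold a negative value in any target column
bad_rows <- unique(unlist(lapply(target_columns, function(x) which(df[[x]] < 0))))

# Logical mask: TRUE for rows that are not in bad_rows.
# Unlike df[-bad_rows, ], this also behaves when bad_rows is empty.
res <- df[!(seq_len(nrow(df)) %in% bad_rows), ]
res
```

With this sample data the rows with id 1, 4 and 5 remain.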

心在旅途

2021-01-02 02:43

I wanted to see if this was possible using standard evaluation with dplyr's filter_. It turns out it can be done with the help of interp from lazyeval, following the example code on this page. Essentially, you create a list of interp conditions, one per target column, and pass that list to the .dots argument of filter_.
```
library(lazyeval)

dots <- lapply(target_columns, function(cols){
    interp(~y >= 0, .values = list(y = as.name(cols)))
})

filter_(df, .dots = dots)
```
Output:
```
  id  sth1 tg1_num sth2 tg2_num others
1  1  dave       2   ca      35    new
2  4 leroy       0   az      25    old
3  5 jerry       4   mi      55    old
```

Update

Starting with dplyr 0.7, this can be done directly with filter_at and all_vars (no lazyeval needed).
```
df %>%
  filter_at(vars(target_columns), all_vars(. >= 0))
```
Output:
```
  id  sth1 tg1_num sth2 tg2_num others
1  1  dave       2   ca      35    new
2  4 leroy       0   az      25    old
3  5 jerry       4   mi      55    old
```
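
For readers on a newer dplyr: the scoped verbs such as `filter_at` were superseded in dplyr 1.0.0, and `if_all()` (available from dplyr 1.0.4) expresses the same "every target column passes" condition inside a plain `filter()`. The following is only a sketch under assumptions: the data frame is hypothetical (ids 2 and 3 are invented, the other rows mirror the output above), and `target_columns` is assumed to be a character vector of column names.

```r
library(dplyr)

# Hypothetical data: rows with id 2 and 3 are invented for illustration
df <- data.frame(
  id      = 1:5,
  sth1    = c("dave", "tom", "jane", "leroy", "jerry"),
  tg1_num = c(2, 3, -1, 0, 4),
  sth2    = c("ca", "tx", "ny", "az", "mi"),
  tg2_num = c(35, -3, 20, 25, 55),
  others  = c("new", "new", "old", "old", "old")
)
target_columns <- c("tg1_num", "tg2_num")  # assumed definition

# Keep rows where every target column is non-negative
res <- df %>%
  filter(if_all(all_of(target_columns), ~ .x >= 0))
res
```

`all_of()` makes it explicit that `target_columns` is an external character vector; `if_all(matches("_num$"), ~ .x >= 0)` would select the columns by name pattern instead.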

囚心锁ツ

2021-01-02 02:43

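Using base R to get your result:
```
cond <- df[, grepl("_num$", colnames(df))] >= 0
df[apply(cond, 1, function(x) {prod(x) == 1}), ]
```
Output:
```
  id  sth1 tg1_num sth2 tg2_num others
1  1  dave       2   ca      35    new
4  4 leroy       0   az      25    old
5  5 jerry       4   mi      55    old
```
Edit: this assumes you have multiple columns matching "_num". It won't work if you have just one _num column, because selecting a single column this way returns a plain vector rather than a data frame, and apply needs something with rows and columns.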
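
One way around the single-column limitation, sketched here rather than taken from the answer, is to pass `drop = FALSE` so the selection stays a one-column data frame. The data frame `df1` is hypothetical and its values are invented for illustration; `all` is used in place of `prod(x) == 1`, which is equivalent for logical input.

```r
# Hypothetical data with a single "_num" column (values invented)
df1 <- data.frame(
  id      = 1:4,
  sth1    = c("dave", "tom", "leroy", "jerry"),
  tg1_num = c(2, -1, 0, 4)
)

# drop = FALSE keeps the selection a one-column data frame,
# so the comparison still yields a logical matrix that apply() can walk row by row
cond <- df1[, grepl("_num$", colnames(df1)), drop = FALSE] >= 0

# Keep rows where every "_num" cell passed the check
res1 <- df1[apply(cond, 1, all), ]
res1
```

The same call works unchanged when several `_num` columns are present.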

挽巷

2021-01-02 02:44
First we create an index of all numeric columns. Then we keep only the rows in which every numeric column is greater than or equal to zero. This way there is no need to check the column names, and the id column does not get in the way because it is always positive.
```
nums <- sapply(df, is.numeric)
df[apply(df[, nums], MARGIN = 1, function(x) all(x >= 0)), ]
```
Output:
```
  id  sth1 tg1_num sth2 tg2_num others
1  1  dave       2   ca      35    new
4  4 leroy       0   az      25    old
5  5 jerry       4   mi      55    old
```
时光取名叫无心

2021-01-02 02:45
Here's a possible vectorized solution
```
ind <- grep("_num$", colnames(df))
df[!rowSums(df[ind] < 0),]
#   id  sth1 tg1_num sth2 tg2_num others
# 1  1  dave       2   ca      35    new
# 4  4 leroy       0   az      25    old
# 5  5 jerry       4   mi      55    old
```
The idea here is to create a logical matrix using the < function (it is a generic function with a data.frame method, so the comparison is applied to every cell of the selected columns at once). Then we use rowSums to count, for each row, how many cells matched the condition (> 0: at least one negative value, 0: none). Then we use the ! function to convert those counts to a logical vector: 0 becomes TRUE, while anything > 0 becomes FALSE. Finally, we subset the rows according to that vector.
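
To make the intermediate steps visible, here is the same one-liner split into named stages. This is a sketch on hypothetical data (ids 2 and 3 are invented, the other rows mirror the output above); the variable names `neg`, `n_neg` and `keep` are introduced only for illustration.

```r
# Hypothetical data: rows with id 2 and 3 are invented for illustration
df <- data.frame(
  id      = 1:5,
  sth1    = c("dave", "tom", "jane", "leroy", "jerry"),
  tg1_num = c(2, 3, -1, 0, 4),
  sth2    = c("ca", "tx", "ny", "az", "mi"),
  tg2_num = c(35, -3, 20, 25, 55),
  others  = c("new", "new", "old", "old", "old")
)

ind   <- grep("_num$", colnames(df))  # positions of the "_num" columns
neg   <- df[ind] < 0                  # logical matrix: TRUE where a cell is negative
n_neg <- rowSums(neg)                 # negative cells per row: 0 1 1 0 0
keep  <- !n_neg                       # 0 -> TRUE, non-zero -> FALSE

res <- df[keep, ]
res
```

Rows 2 and 3 each contain one negative value, so their counts are non-zero and `!` turns them into FALSE.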