I've searched on SO trying to find a solution, to no avail, so here it is. I have a data frame with many columns, some of which are numerical and should be non-negative. I want to filter out the rows in which any of these numerical columns has a negative value.
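For reference, here is a hypothetical data frame consistent with the outputs shown in the answers below (the negative values in the dropped rows are assumptions, since the original sample data was cut off):

```r
# Hypothetical reconstruction of the example data: rows 2 and 3 contain
# a negative value in one of the *_num columns and should be filtered
# out; rows 1, 4 and 5 should be kept.
df <- data.frame(
  id      = 1:5,
  sth1    = c("dave", "john", "mike", "leroy", "jerry"),
  tg1_num = c(2, -4, 3, 0, 4),
  sth2    = c("ca", "ny", "tx", "az", "mi"),
  tg2_num = c(35, 15, -10, 25, 55),
  others  = c("new", "old", "new", "old", "old"),
  stringsAsFactors = FALSE
)

# The columns to check, selected by the "_num" suffix
target_columns <- grep("_num$", names(df), value = TRUE)
```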
Here is my ugly solution. Suggestions/criticisms welcome
df %>%
# Select the columns we want
select(matches("_num$")) %>%
# Compare every column against 0, giving a list of logical vectors
lapply(">=", 0) %>%
# Combine the logical vectors element-wise with AND
Reduce(f = "&", .) %>%
# Convert the single logical vector into a numeric
# index, since slice can't deal with logical.
# Can simply write `{df[.,]}` here instead,
# which is probably faster than which + slice
# Edit: This is not true. which + slice is faster than `[` in this case
which %>%
slice(.data = df)
id sth1 tg1_num sth2 tg2_num others
1 1 dave 2 ca 35 new
2 4 leroy 0 az 25 old
3 5 jerry 4 mi 55 old
This will give you the indices of the rows that have a value less than 0 in any of the target columns (i.e., the rows to drop):
desired_rows <- sapply(target_columns, function(x) which(df[,x]<0), simplify=TRUE)
desired_rows <- as.vector(unique(unlist(desired_rows)))
Then to get a df of your desired rows:
setdiff(df, df[desired_rows,])
id sth1 tg1_num sth2 tg2_num others
1 1 dave 2 ca 35 new
2 4 leroy 0 az 25 old
3 5 jerry 4 mi 55 old
I wanted to see if this was possible using standard evaluation with dplyr's filter_. It turns out it can be done with the help of interp from lazyeval, following the example code on this page. Essentially, you have to create a list of the interp conditions, which you then pass to the .dots argument of filter_.
library(lazyeval)
dots <- lapply(target_columns, function(cols){
interp(~y >= 0, .values = list(y = as.name(cols)))
})
filter_(df, .dots = dots)
id sth1 tg1_num sth2 tg2_num others
1 1 dave 2 ca 35 new
2 4 leroy 0 az 25 old
3 5 jerry 4 mi 55 old
Update
Starting with dplyr 0.7, this can be done directly with filter_at and all_vars (no lazyeval needed).
df %>%
filter_at(vars(target_columns), all_vars(. >= 0))
id sth1 tg1_num sth2 tg2_num others
1 1 dave 2 ca 35 new
2 4 leroy 0 az 25 old
3 5 jerry 4 mi 55 old
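As a further update (not part of the original answer): filter_at has since been superseded; from dplyr 1.0.4 the same filter can be written with if_all. A self-contained sketch with placeholder data:

```r
library(dplyr)

# Placeholder data; the real df comes from the question
df <- data.frame(id      = 1:3,
                 tg1_num = c(2, -4, 0),
                 tg2_num = c(35, 15, 25))
target_columns <- c("tg1_num", "tg2_num")

# if_all() keeps the rows where the predicate holds for every column
result <- df %>% filter(if_all(all_of(target_columns), ~ .x >= 0))
result  # rows with id 1 and 3
```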
Using base R to get your result
cond <- df[, grepl("_num$", colnames(df))] >= 0
df[apply(cond, 1, function(x) {prod(x) == 1}), ]
id sth1 tg1_num sth2 tg2_num others
1 1 dave 2 ca 35 new
4 4 leroy 0 az 25 old
5 5 jerry 4 mi 55 old
Edit: this assumes you have multiple columns ending in "_num". It won't work with just one such column, because the subset in the first line then drops to a plain vector and apply() fails.
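A sketch of the single-column fix: adding drop = FALSE keeps the one-column subset as a data frame, so the comparison still yields a matrix and apply() keeps working (placeholder data below, and all(x) replaces the equivalent prod(x) == 1):

```r
# Placeholder data with a single *_num column
df1 <- data.frame(id = 1:3, tg1_num = c(2, -4, 0), others = c("a", "b", "c"))

# Without drop = FALSE, df1[, grepl("_num$", colnames(df1))] collapses
# to a plain vector and apply() errors out; drop = FALSE preserves the
# data-frame structure, so the comparison returns a matrix.
cond <- df1[, grepl("_num$", colnames(df1)), drop = FALSE] >= 0
df1[apply(cond, 1, all), ]  # rows with id 1 and 3
```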
First we create an index of all numeric columns. Then we keep only the rows in which every numeric column is greater than or equal to zero. This way there is no need to check the column names, assuming the id column is always positive.
nums <- sapply(df, is.numeric)
df[apply(df[, nums], MARGIN = 1, function(x) all(x >= 0)), ]
Output:
id sth1 tg1_num sth2 tg2_num others
1 1 dave 2 ca 35 new
4 4 leroy 0 az 25 old
5 5 jerry 4 mi 55 old
Here's a possible vectorized solution
ind <- grep("_num$", colnames(df))
df[!rowSums(df[ind] < 0),]
# id sth1 tg1_num sth2 tg2_num others
# 1 1 dave 2 ca 35 new
# 4 4 leroy 0 az 25 old
# 5 5 jerry 4 mi 55 old
The idea here is to create a logical matrix using the < function (it is a generic function with a data.frame method, which means it returns a data-frame-like structure back). Then, we use rowSums to count how many conditions matched in each row (> 0 means at least one negative value was found; 0 means none). The ! then converts those counts into a logical vector: 0 becomes TRUE, while anything greater than 0 becomes FALSE. Finally, we subset according to that vector, keeping only the rows with no negative values.
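The same logic traced on a tiny toy data frame (column names are placeholders):

```r
toy <- data.frame(a = c(1, -2, 3), b = c(4, 5, -6))

toy < 0                     # logical matrix: TRUE where a value is negative
rowSums(toy < 0)            # negatives per row: 0, 1, 1
keep <- !rowSums(toy < 0)   # TRUE only where the count is 0
toy[keep, ]                 # keeps only the first row
```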