filter one dataframe via conditions in another

Posted by 生来就可爱ヽ(ⅴ<●) on 2021-02-19 08:55:24

Question


I want to recursively filter a dataframe, d, by an arbitrary number of conditions (each represented as a row in another dataframe, z).

I begin with a dataframe d:

d <- data.frame(x = 1:10, y = letters[1:10])

The second dataframe, z, has columns x1 and x2, which are the lower and upper limits used to filter d$x. z may grow to an arbitrary number of rows.

z <- data.frame(x1 = c(1,3,8), x2 = c(1,4,10))

I want to return all rows of d for which d$x < z$x1[i] or d$x > z$x2[i] for every i in 1:nrow(z), that is, the rows whose x falls outside every range in z.

So for this toy example, exclude everything in 1:1, 3:4, and 8:10, inclusive, which leaves:

   x  y
2  2  b 
5  5  e
6  6  f
7  7  g

Answer 1:


We can create the sequence of values between each x1 and x2 pair, then use anti_join to select the rows of d whose x does not appear in any of those sequences.

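library(tidyverse)

# Expand each (x1, x2) pair into the full sequence of x values to drop
remove <- z %>%
  mutate(x = map2(x1, x2, seq)) %>%
  unnest(x) %>%
  select(x)

# Keep only the rows of d whose x does not appear in remove
anti_join(d, remove, by = "x")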

#  x y
#1 2 b
#2 5 e
#3 6 f
#4 7 g
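
For comparison, the same sequence-expansion idea can be sketched in base R without loading any packages. This is an illustrative variant, not part of the original answer; the names excluded and kept are introduced here. Like the tidyverse version, it assumes the values in d$x are whole numbers, because seq steps by 1 between x1 and x2.

```r
d <- data.frame(x = 1:10, y = letters[1:10])
z <- data.frame(x1 = c(1, 3, 8), x2 = c(1, 4, 10))

# Expand every [x1, x2] range into its members: 1, 3, 4, 8, 9, 10
excluded <- unlist(Map(seq, z$x1, z$x2))

# Keep the rows of d whose x is not among the excluded values
kept <- d[!d$x %in% excluded, ]
kept
```

Base R subsetting keeps the original row names (2, 5, 6, 7), whereas anti_join renumbers the rows from 1.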



Answer 2:


We can use a non-equi join with data.table. Note that setDT converts d to a data.table by reference.

library(data.table)
# Row numbers of d that fall inside any [x1, x2] range in z
i1 <- setDT(d)[z, .I, on = .(x >= x1, x <= x2), by = .EACHI]$I
i1
#[1]  1  3  4  8  9 10
d[i1]
#    x y
#1:  1 a
#2:  3 c
#3:  4 d
#4:  8 h
#5:  9 i
#6: 10 j
d[!i1]
#   x y
#1: 2 b
#2: 5 e
#3: 6 f
#4: 7 g
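
A possibly shorter data.table variant, offered as a sketch rather than as this answer's method: inrange() tests each x against all (lower, upper) pairs at once. It assumes a data.table version that exports inrange(); dt and kept are names introduced here for illustration.

```r
library(data.table)

d <- data.frame(x = 1:10, y = letters[1:10])
z <- data.frame(x1 = c(1, 3, 8), x2 = c(1, 4, 10))

# as.data.table() makes a copy, so the original data.frame d is left unchanged
# (setDT(d) would convert d in place instead)
dt <- as.data.table(d)

# inrange() is TRUE where x falls inside any [x1, x2] interval, bounds included
kept <- dt[!inrange(x, z$x1, z$x2)]
kept
```

Working on a copy costs memory for large tables, so the in-place setDT call may still be preferable when d is big.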

Or using fuzzyjoin

library(fuzzyjoin)
library(dplyr)
fuzzy_inner_join(d, z, by = c('x' = 'x1', 'x' = 'x2'),
        match_fun = list(`>=`, `<=`)) %>% 
     select(names(d))
# A tibble: 6 x 2
#      x y    
#  <int> <fct>
#1     1 a    
#2     3 c    
#3     4 d    
#4     8 h    
#5     9 i    
#6    10 j    

Or, to get the rows of d that fall outside every range in z:

fuzzy_anti_join(d, z, by = c('x' = 'x1', 'x' = 'x2'),
        match_fun = list(`>=`, `<=`)) %>% 
     select(names(d))
# A tibble: 4 x 2
#      x y    
#  <int> <fct>
#1     2 b    
#2     5 e    
#3     6 f    
#4     7 g    
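
If d$x may hold non-integer values, expanding ranges with seq would miss them, so comparing against the bounds directly is safer. A minimal base R sketch of that check follows; in_any_range and kept are illustrative names, not from the answers above.

```r
d <- data.frame(x = 1:10, y = letters[1:10])
z <- data.frame(x1 = c(1, 3, 8), x2 = c(1, 4, 10))

# TRUE for each row of d whose x lies inside at least one [x1, x2] range
in_any_range <- vapply(
  d$x,
  function(v) any(v >= z$x1 & v <= z$x2),
  logical(1)
)

# Drop those rows; what remains falls outside every range in z
kept <- d[!in_any_range, ]
kept
```

This does nrow(d) * nrow(z) comparisons, which is fine for small inputs; for large tables the join-based approaches above are likely to scale better.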


Source: https://stackoverflow.com/questions/61050352/filter-one-dataframe-via-conditions-in-another
