“Set Difference” between two vectors with duplicate values

前端未结


I have 3 vectors

x <- c(1,3,5,7,3,8)
y <- c(3,5,7)
z <- c(3,3,8)

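I want to find the elements of x that are not in y (and, separately, those that are not in z), counting duplicates: each element of the second vector should remove only one matching element of x.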

3 answers

無奈伤痛

2020-12-06 21:10

Here's an attempt using make.unique to account for duplicates:

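dupdiff <- function(x, y) {
  # make.unique suffixes repeats (3, 3.1, ...), so each copy matches at most once
  idx <- match(
    make.unique(as.character(y)),
    make.unique(as.character(x)),
    nomatch = 0
  )
  # Logical indexing avoids x[-0], which returns an empty vector when nothing matches
  x[!seq_along(x) %in% idx]
}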

Testing:

dupdiff(x,y)
#[1] 1 3 8
dupdiff(x,z)
#[1] 1 5 7
dupdiff(x, c(5, 7))
#[1] 1 3 3 8
dupdiff(x, c(5, 7, 9))
#[1] 1 3 3 8
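
To see why this works, it may help to look at the keys that make.unique produces. The following is only an illustrative sketch reusing x and z from the question; keys_x, keys_z and matched are names introduced here and are not part of the answer's function.

```r
x <- c(1, 3, 5, 7, 3, 8)
z <- c(3, 3, 8)

# Repeated values get a numeric suffix, so every occurrence has its own key
keys_x <- make.unique(as.character(x))
keys_z <- make.unique(as.character(z))
keys_x   # "1" "3" "5" "7" "3.1" "8"
keys_z   # "3" "3.1" "8"

# Each key of z points at exactly one position of x
matched <- match(keys_z, keys_x)
matched  # 2 5 6

# Dropping those positions leaves the duplicate-aware difference
x[-matched]  # 1 5 7
```

Because the second 3 in z becomes "3.1", it can only cancel the second 3 in x, which is what makes the difference respect multiplicity.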


天命终不由人

2020-12-06 21:12

There may be better ways to do this, but here is one option:

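get_diff_vectors <- function(x, y) {
  count_x <- table(x)
  count_y <- table(y)
  # Only values present in both vectors can be subtracted (values only in y are ignored)
  common <- intersect(names(count_x), names(count_y))
  # pmax() floors at zero when y holds more copies of a value than x
  count_x[common] <- pmax(as.vector(count_x[common]) - as.vector(count_y[common]), 0)
  as.numeric(rep(names(count_x), count_x))
}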

get_diff_vectors(x, y)
#[1] 1 3 8
get_diff_vectors(x, z)
#[1] 1 5 7
get_diff_vectors(x, c(5, 7))
#[1] 1 3 3 8

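We count how often each value occurs in x and in y using table, find the values that occur in both, and subtract the counts of y from the counts of x. Finally, we rebuild the remaining vector using rep. Note that the result comes back sorted by value rather than in the original order of x.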
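
The counting steps can be traced by hand on x and z from the question. This is only a sketch of the idea described above; pos, remaining and result are illustrative names, not part of the answer's function.

```r
x <- c(1, 3, 5, 7, 3, 8)
z <- c(3, 3, 8)

count_x <- table(x)   # value -> count: 1 -> 1, 3 -> 2, 5 -> 1, 7 -> 1, 8 -> 1
count_z <- table(z)   # value -> count: 3 -> 2, 8 -> 1

# Positions in count_x of the values that also occur in z
pos <- match(names(count_z), names(count_x))
pos   # 2 5

# Subtract the counts of z from the matching counts of x
remaining <- as.vector(count_x)
remaining[pos] <- remaining[pos] - as.vector(count_z)
remaining   # 1 0 1 1 0

# Repeat each value as many times as it is still left over
result <- as.numeric(rep(names(count_x), remaining))
result   # 1 5 7
```

Both copies of 3 and the single 8 are used up, so only 1, 5 and 7 survive.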

I still have not found a better way, but here is a dplyr version using similar logic:

library(dplyr)

get_diff_vectors_dplyr <- function(x, y) {
  df1 <- data.frame(x) %>% count(x)
  df2 <- data.frame(y) %>% count(y)
  final <- left_join(df1, df2, by = c("x" = "y")) %>%
           # Values absent from y count as 0; the result never drops below 0 copies
           mutate(n = pmax(n.x - coalesce(n.y, 0L), 0L))

  rep(final$x, final$n)
}

get_diff_vectors_dplyr(x, y)
#[1] 1 3 8
get_diff_vectors_dplyr(x, z)
#[1] 1 5 7
get_diff_vectors_dplyr(x, c(5, 7))
#[1] 1 3 3 8

The vecsets package mentioned by the OP has a function, vsetdiff, that does this directly:

vecsets::vsetdiff(x, y)
#[1] 1 3 8
vecsets::vsetdiff(x, z)
#[1] 1 5 7
vecsets::vsetdiff(x, c(5, 7))
#[1] 1 3 3 8


庸人自扰

2020-12-06 21:13

match combined with a small for loop also works:

> f(x, y)
[1] 1 3 8
> f(x, z)
[1] 1 5 7

Code

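f <- function(s, r) {
    for (i in seq_along(s)) {
        # First unused position in r that equals s[i], or NA if there is none
        j <- match(s[i], r)
        if (!is.na(j)) {
            # Mark both elements as used so each copy in r cancels one copy in s
            s[i] <- NA
            r[j] <- NA
        }
    }
    # Return the surviving elements instead of printing them
    s[!is.na(s)]
}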
