Suppose I have two matrices, each with two columns and differing numbers of rows. I want to check which row pairs of one matrix appear in the other. If these were one-column matrices, I could simply use %in%, but that does not work row-wise on matrices.
Recreate your data:
a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol=2, byrow=TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol=2, byrow=TRUE)
Define a function %inm% that is a matrix analogue to %in%:
`%inm%` <- function(x, matrix){
  # compare the vector x against every row of the matrix;
  # TRUE if any row matches element-for-element
  test <- apply(matrix, 1, `==`, x)
  any(apply(test, 2, all))
}
Apply this to your data:
apply(a, 1, `%inm%`, x)
[1] FALSE TRUE TRUE FALSE
To compare a single row:
a[1, ] %inm% x
[1] FALSE
a[2, ] %inm% x
[1] TRUE
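A base-R alternative worth knowing (my own sketch, not from the answer above): stack the two matrices and use duplicated(), which flags rows of a that already occurred in x. This assumes the rows of a are themselves unique, since duplicates within a would also be flagged:

```r
# Sketch: row membership via duplicated(); assumes the rows of `a` are unique
a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol = 2, byrow = TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol = 2, byrow = TRUE)

# rows of `a` that duplicate an earlier row of the stacked matrix,
# i.e. that already appear in `x`
tail(duplicated(rbind(x, a)), nrow(a))
# [1] FALSE  TRUE  TRUE FALSE
```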
Andrie's solution is perfectly fine. But if you have big matrices, you might want to try something else, based on recursion. If you work columnwise, you can cut down on the calculation time by excluding everything that doesn't match at the first position:
fastercheck <- function(x, matrix){
  nc <- ncol(matrix)
  # recursively narrow down the candidate rows, one column at a time;
  # `id` tracks which rows of `matrix` still match the query row `r`
  rec.check <- function(r, i, id){
    id[id] <- matrix[id, i] %in% r[i]
    if(i < nc && any(id)) rec.check(r, i + 1, id) else any(id)
  }
  apply(x, 1, rec.check, 1, rep(TRUE, nrow(matrix)))
}
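Before the big benchmark, a quick sanity check on the toy data from the first answer (the definition is repeated here so the snippet runs standalone):

```r
# definition repeated from above so this snippet is self-contained
fastercheck <- function(x, matrix){
  nc <- ncol(matrix)
  rec.check <- function(r, i, id){
    id[id] <- matrix[id, i] %in% r[i]
    if(i < nc && any(id)) rec.check(r, i + 1, id) else any(id)
  }
  apply(x, 1, rec.check, 1, rep(TRUE, nrow(matrix)))
}

a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol = 2, byrow = TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol = 2, byrow = TRUE)

fastercheck(a, x)
# [1] FALSE  TRUE  TRUE FALSE
```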
The comparison:
> set.seed(100)
> x <- matrix(runif(1e6),ncol=10)
> a <- matrix(runif(300),ncol=10)
> a[c(3,7,9,15),] <- x[c(1000,48213,867,20459),]
> system.time(res1 <- apply(a, 1, `%inm%`, x))
user system elapsed
31.16 0.14 31.50
> system.time(res2 <- fastercheck(a,x))
user system elapsed
0.37 0.00 0.38
> identical(res1, res2)
[1] TRUE
> which(res2)
[1] 3 7 9 15
EDIT:
I checked the accepted answer just for fun. It performs better than the double apply (as you get rid of the inner loop), but recursion still rules! ;-)
> system.time(apply(a, 1, paste, collapse="$$") %in%
+ apply(x, 1, paste, collapse="$$"))
user system elapsed
6.40 0.01 6.41
Coming in late to the game: I had previously written an algorithm using the "paste with delimiter" method, and then found this page. I was guessing that one of the code snippets here would be the fastest, but:
andrie <- function(mfoo, nfoo) apply(mfoo, 1, `%inm%`, nfoo)
# using Andrie's %inm% operator exactly as above

carl <- function(mfoo, nfoo) {
  # build one delimited string per row, then use a set operation
  allrows <- sapply(1:nrow(mfoo), function(j) paste(mfoo[j, ], collapse = '_'))
  allfoo  <- sapply(1:nrow(nfoo), function(j) paste(nfoo[j, ], collapse = '_'))
  thewalls <- setdiff(allrows, allfoo)
  # note: this returns the rows of mfoo *not* found in nfoo,
  # rather than a logical vector
  dowalls <- mfoo[allrows %in% thewalls, ]
}
ramnath <- function(a, x) apply(a, 1, digest) %in% apply(x, 1, digest)  # requires library(digest)
mfoo<-matrix( sample(1:100,400,rep=TRUE),nr=100)
nfoo<-mfoo[sample(1:100,60),]
library(microbenchmark)
microbenchmark(andrie(mfoo,nfoo),carl(mfoo,nfoo),ramnath(mfoo,nfoo),times=5)
Unit: milliseconds
expr min lq median uq max neval
andrie(mfoo, nfoo) 25.564196 26.527632 27.964448 29.687344 102.802004 5
carl(mfoo, nfoo) 1.020310 1.079323 1.096855 1.193926 1.246523 5
ramnath(mfoo, nfoo) 8.176164 8.429318 8.539644 9.258480 9.458608 5
So apparently constructing character strings and doing a single set operation is fastest! (PS: I checked, and all three approaches identify the same matching rows, although carl() returns the non-matching rows of mfoo themselves rather than a logical vector.)
Another approach would be:
> paste(a[,1], a[,2], sep="$$") %in% paste(x[,1], x[,2], sep="$$")
[1] FALSE TRUE TRUE FALSE
A more general version of this is:
> apply(a, 1, paste, collapse="$$") %in% apply(x, 1, paste, collapse="$$")
[1] FALSE TRUE TRUE FALSE
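The separator matters here: without a distinctive delimiter, different rows can collapse into the same string. A small illustration (my own example, not from the original answer):

```r
# rows (11, 2) and (1, 12) collide when pasted without a separator
paste(11, 2, sep = "")    # "112"
paste(1, 12, sep = "")    # "112"

# a delimiter unlikely to occur in the data keeps them distinct
paste(11, 2, sep = "$$")  # "11$$2"
paste(1, 12, sep = "$$")  # "1$$12"
```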
Here is another approach using the digest package, creating a checksum for each row with a hashing algorithm (the default being md5):
library(digest)

a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol=2, byrow=TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol=2, byrow=TRUE)
apply(a, 1, digest) %in% apply(x, 1, digest)
[1] FALSE TRUE TRUE FALSE
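If you also need to know which row of x each row of a matches (not just whether it matches), match() works on any per-row key. A sketch using the paste keys from the accepted answer (the digest checksums would work identically):

```r
a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol = 2, byrow = TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol = 2, byrow = TRUE)

# index of the matching row in x, or NA when there is none
match(apply(a, 1, paste, collapse = "$$"),
      apply(x, 1, paste, collapse = "$$"))
# [1] NA  4  1 NA
```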