Cosine similarity of 2 DTMs in R

粉色の甜心 2021-01-15 01:22

I have two document-term matrices:

  1. DTM1 has, say, 1000 vectors (1000 documents), and
  2. DTM2 has 20 vectors (20 documents).

So basically I want to compa

1 answer
  • 2021-01-15 02:11

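    Here is a way to calculate the cosine similarity (1 minus the cosine distance) between every document (row) of one DTM and every document of another. tm is only used to build the example data; the similarity itself is computed with slam.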
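Written out, the quantity the code computes for each pair of documents is the standard cosine similarity. The notation is introduced only for this explanation: $\mathbf{x}_i$ is row $i$ of `dtm1`, $\mathbf{y}_j$ is row $j$ of `dtm2`, and $t$ runs over the terms both matrices share:

$$
S_{ij} = \frac{\mathbf{x}_i \cdot \mathbf{y}_j}{\lVert \mathbf{x}_i \rVert \, \lVert \mathbf{y}_j \rVert}
       = \frac{\sum_t x_{it}\, y_{jt}}{\sqrt{\sum_t x_{it}^2}\;\sqrt{\sum_t y_{jt}^2}}
$$

The numerators for all pairs form the matrix product $X Y^\top$, which is what `tcrossprod` computes. The denominators are the square roots of the row sums of squares. With 1000 and 20 documents, $S$ is a 1000 × 20 matrix.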

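    library(slam)  # sparse-matrix helpers: tcrossprod_simple_triplet_matrix(), row_sums()
    library(tm)    # only used here for the example corpora and DocumentTermMatrix()
    data("acq")    # 50 Reuters articles shipped with tm
    data("crude")  # 20 Reuters articles shipped with tm
    
    dtm <- DocumentTermMatrix(c(acq, crude))
    
    # Randomly split the 70 documents into two DTMs; both keep the same terms (columns)
    set.seed(1)
    index <- sample(seq_len(nDocs(dtm)), size = 10)
    
    dtm1 <- dtm[index, ]
    dtm2 <- dtm[-index, ]
    
    # Dot product of every row of dtm1 with every row of dtm2 (a 10 x 60 dense matrix),
    # divided by the product of the two rows' Euclidean norms
    cosine_sim <- tcrossprod_simple_triplet_matrix(dtm1, dtm2) /
      outer(sqrt(row_sums(dtm1^2)), sqrt(row_sums(dtm2^2)))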
    

    The cosine formula was adapted from this Stack Overflow post: R: Calculate cosine distance from a term-document matrix with tm and proxy
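The split above works because dtm1 and dtm2 come from one DocumentTermMatrix and therefore share the same term columns. Two DTMs built separately, as in the question, usually have different vocabularies, so their columns have to be aligned before the dot products mean anything. What follows is a minimal base-R sketch on small dense matrices, not a drop-in replacement for the sparse code. `align_terms()` and `cosine_rows()` are hypothetical helpers, and the toy counts are made up. The sketch pads both matrices with zeros to the union of their terms, computes the row-by-row cosine similarity, and then picks the closest DTM1 document for each DTM2 document.

```r
# Hypothetical helper: pad two dense document-term matrices to the union of
# their vocabularies so both have identical term columns (missing terms = 0).
align_terms <- function(a, b) {
  terms <- union(colnames(a), colnames(b))
  pad <- function(m) {
    out <- matrix(0, nrow = nrow(m), ncol = length(terms),
                  dimnames = list(rownames(m), terms))
    out[, colnames(m)] <- m  # copy the counts into their matching columns
    out
  }
  list(a = pad(a), b = pad(b))
}

# Hypothetical helper: cosine similarity of every row of `a` with every row
# of `b`; rows are documents, columns must be the same terms in the same order.
cosine_rows <- function(a, b) {
  stopifnot(identical(colnames(a), colnames(b)))
  tcrossprod(a, b) / outer(sqrt(rowSums(a^2)), sqrt(rowSums(b^2)))
}

# Made-up toy counts: DTM1 has 3 documents, DTM2 has 2, vocabularies differ
dtm1 <- matrix(c(1, 0, 2,
                 0, 3, 0,
                 1, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("d1", "d2", "d3"), c("oil", "price", "deal")))
dtm2 <- matrix(c(2, 0, 1,
                 0, 1, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("q1", "q2"), c("oil", "price", "merger")))

aligned <- align_terms(dtm1, dtm2)
sim <- cosine_rows(aligned$a, aligned$b)  # 3 x 2: rows = DTM1 docs, cols = DTM2 docs

# For each DTM2 document, the DTM1 document with the highest cosine similarity
best <- setNames(rownames(sim)[apply(sim, 2, which.max)], colnames(sim))
print(round(sim, 3))
print(best)
```

Real DTMs can be converted with `as.matrix()` before using these helpers, but the results are dense and can get large. The sparse slam version avoids that once the columns match. A document with no terms has norm 0 and produces `NaN`. Another approach is to build the second DTM against the first one's vocabulary with tm's `control = list(dictionary = Terms(dtm1))`. That drops terms found only in DTM2, which shrinks DTM2's row norms and so raises the similarities compared with the union approach.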
