How to find similar sentences / phrases in R?

清酒与你 2021-02-06 08:34

For example, I have billions of short phrases, and I want to group them into clusters of similar phrases.

> strings.to.cluster <- c("Best Toyota dealer in bay area. Dr


        
2 Answers
  • 2021-02-06 08:55

    You can view your phrases as "bags of words": build a matrix with one row per phrase and one column per word (a "document-term" matrix, i.e. the transpose of a "term-document" matrix), with 1 if the word occurs in the phrase and 0 otherwise. You can replace the 1 with a weight that accounts for phrase length and word frequency. You can then apply any clustering algorithm to the rows. The tm package can help you build this matrix.

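    library(tm)      # corpus handling and term-document matrices
    library(Matrix)  # sparse matrix support
    # Build a term-document matrix: one row per word, one column per phrase
    x <- TermDocumentMatrix( Corpus( VectorSource( strings.to.cluster ) ) )
    # Convert tm's triplet representation into a sparse Matrix object
    y <- sparseMatrix( i=x$i, j=x$j, x=x$v, dims = dim(x), dimnames = dimnames(x) )
    # Transpose so phrases are rows, then cluster hierarchically on Euclidean distance
    plot( hclust(dist(t(y))) )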
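    The weighting idea mentioned above (accounting for phrase length and word frequency) could look roughly like the following. This is only a sketch in base R, not the code from this answer: the sample `phrases`, the TF-IDF weighting, the cosine distance, and the choice of two clusters are all assumptions made for illustration. It also builds a dense distance matrix, so it is only suitable for a small sample, not billions of phrases.

```r
# Hypothetical sample phrases (not taken from the original question).
phrases <- c(
  "best toyota dealer in bay area",
  "toyota dealer bay area prices",
  "cheap flights to new york",
  "new york flights cheap tickets"
)

# Split each phrase into lowercase words and collect the vocabulary.
tokens <- strsplit(tolower(phrases), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Count matrix: one row per phrase, one column per word.
tf <- t(vapply(tokens,
               function(w) tabulate(match(w, vocab), nbins = length(vocab)),
               integer(length(vocab))))
colnames(tf) <- vocab

# TF-IDF: divide counts by phrase length, then down-weight common words.
idf   <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf / rowSums(tf), 2, idf, `*`)

# Cosine distance between phrases (1 minus cosine similarity).
unit   <- tfidf / sqrt(rowSums(tfidf^2))
cosine <- as.dist(1 - tcrossprod(unit))

# Average-linkage hierarchical clustering, cut into two groups (assumed k).
hc       <- hclust(cosine, method = "average")
clusters <- cutree(hc, k = 2)
print(split(phrases, clusters))
```

    If you stay with tm, it provides a `weightTfIdf` function that can be passed through the `control = list(weighting = ...)` argument when building the matrix, which avoids computing the weights by hand.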
    
  • 2021-02-06 09:11

    This document may help: http://www.inside-r.org/howto/mining-twitter-airline-consumer-sentiment. It uses R to analyze consumer sentiment toward airlines based on Twitter data.
