Finding 2 & 3 word Phrases Using R TM Package

死守一世寂寞 2020-11-28 04:26

I am trying to find code that actually works for finding the most frequently used two- and three-word phrases with the R text mining package (maybe there is another package for it th

7 Answers
  • 2020-11-28 05:19

    Try the tidytext package.

    library(dplyr)
    library(tidytext)
    library(janeaustenr)
    library(tidyr)

    Suppose I have a data frame CommentData with a Comment column, and I want to count how often two words occur together. Then try:

    bigram_filtered <- CommentData %>%
      unnest_tokens(bigram, Comment, token= "ngrams", n=2) %>%
      separate(bigram, c("word1","word2"), sep=" ") %>%
      filter(!word1 %in% stop_words$word,
             !word2 %in% stop_words$word) %>%
      count(word1, word2, sort=TRUE)
    

    The code above splits each comment into bigrams, then drops every bigram that contains a stop word that doesn't help the analysis (e.g. the, an, to). It then counts how often each remaining word pair occurs. Finally, use the unite function to combine the individual words back into a single bigram column next to their counts:

    bigrams_united <- bigram_filtered %>%
      unite(bigram, word1, word2, sep=" ")
    bigrams_united
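
The question also asks about three-word phrases. The same pipeline should extend to trigrams by setting n = 3 and separating into three word columns. This is only a minimal sketch: the sample CommentData below is made up for illustration, and the is.na() filter is an added assumption to skip comments that are too short to yield a trigram.

```r
library(dplyr)
library(tidytext)
library(tidyr)

# Hypothetical sample data; replace with your own data frame.
CommentData <- tibble(
  Comment = c(
    "text mining package works well",
    "the text mining package is useful",
    "text mining package for phrases"
  )
)

trigram_counts <- CommentData %>%
  # Split each comment into overlapping three-word sequences.
  unnest_tokens(trigram, Comment, token = "ngrams", n = 3) %>%
  # Comments with fewer than three words produce NA; drop those rows.
  filter(!is.na(trigram)) %>%
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  # Drop any trigram that contains a stop word.
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word,
         !word3 %in% stop_words$word) %>%
  # Count each remaining trigram, most frequent first.
  count(word1, word2, word3, sort = TRUE) %>%
  # Recombine the three words into a single phrase column.
  unite(trigram, word1, word2, word3, sep = " ")

trigram_counts
```

Filtering on all three words is strict: phrases such as "state of the art" disappear entirely. If that is too aggressive for your data, one option is to filter only word1 and word3.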
    