Finding 2- and 3-Word Phrases Using the R tm Package

I am trying to find code that actually works for finding the most frequently used two- and three-word phrases with the R text mining package, tm (maybe there is another package for it th

7 Answers

生来不讨喜

2020-11-28 05:19
Try the tidytext package:
```
library(dplyr)     # %>%, filter(), count()
library(tidytext)  # unnest_tokens(), stop_words
library(tidyr)     # separate(), unite()
```

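Suppose you have a data frame CommentData with a Comment column, and you want to find how often two words occur together. Then try:
```
bigram_filtered <- CommentData %>%
  # split each comment into overlapping two-word tokens (bigrams)
  unnest_tokens(bigram, Comment, token = "ngrams", n = 2) %>%
  # comments with fewer than two words yield NA; drop them
  filter(!is.na(bigram)) %>%
  # split each bigram into its two words
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  # drop pairs in which either word is a stop word
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  # count each remaining word pair, most frequent first
  count(word1, word2, sort = TRUE)
```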
The code above splits each comment into bigram tokens and then removes stop words that don't help the analysis (e.g. the, an, to). It then counts how often each word pair occurs. Finally, use the unite function to join the two words back into a single column, keeping their counts:
```
bigrams_united <- bigram_filtered %>%
  unite(bigram, word1, word2, sep=" ")
bigrams_united
```
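The question also asks about three-word phrases. The same pipeline generalizes by changing n and the number of word columns. The sketch below wraps it in a helper; count_ngrams and the toy CommentData are hypothetical names for illustration, and it assumes dplyr 1.0.4 or later (for if_all()).

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical helper: count n-word phrases in the Comment column,
# dropping any phrase that contains a stop word.
count_ngrams <- function(data, n) {
  word_cols <- paste0("word", seq_len(n))
  data %>%
    unnest_tokens(ngram, Comment, token = "ngrams", n = n) %>%
    filter(!is.na(ngram)) %>%                       # comments shorter than n words
    separate(ngram, word_cols, sep = " ") %>%       # one column per word
    filter(if_all(all_of(word_cols), ~ !.x %in% stop_words$word)) %>%
    count(across(all_of(word_cols)), sort = TRUE) %>%
    unite(ngram, all_of(word_cols), sep = " ")      # rejoin words into one phrase
}

# Hypothetical toy data
CommentData <- data.frame(
  Comment = c("customer service team",
              "the customer service team",
              "fast shipping and customer service team"),
  stringsAsFactors = FALSE
)

bigrams  <- count_ngrams(CommentData, 2)
trigrams <- count_ngrams(CommentData, 3)
```

Here trigrams should contain the single phrase "customer service team" with a count of 3, because every other three-word window includes "the" or "and".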