I am trying to find code that actually works for finding the most frequently used two- and three-word phrases with the R text mining package (maybe there is another package for it th
Try the tidytext package:
library(dplyr)
library(tidytext)
library(janeaustenr)
library(tidyr)
Suppose I have a data frame CommentData that contains a Comment column, and I want to count how often pairs of adjacent words (bigrams) occur. Then try:
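bigram_filtered <- CommentData %>%
  # Split each comment into overlapping two-word tokens (bigrams)
  unnest_tokens(bigram, Comment, token = "ngrams", n = 2) %>%
  # Comments with fewer than two words produce NA; drop them
  filter(!is.na(bigram)) %>%
  # Put each word of the bigram in its own column
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  # Remove bigrams where either word is a stop word (stop_words ships with tidytext)
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  # Count each remaining word pair, most frequent first
  count(word1, word2, sort = TRUE)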
The code above splits each comment into bigram tokens, removes bigrams containing stop words (words such as "the", "an" and "to" that don't help the analysis), and then counts how often each remaining word pair occurs. You can then use the unite() function to combine the two words back into a single column alongside their counts.
# Join word1 and word2 back into one "word1 word2" column; the count column n is kept
bigrams_united <- bigram_filtered %>%
  unite(bigram, word1, word2, sep = " ")
bigrams_united
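The question also asks about three-word phrases. Below is a minimal sketch of the same pipeline for trigrams, assuming the text lives in a column called Comment as above; the small CommentData data frame is made-up example data, not from the question. It sets n = 3, splits each phrase into three word columns, and keeps a trigram only if none of its three words is a stop word:

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical stand-in for the question's CommentData (one comment per row)
CommentData <- data.frame(
  Comment = c("The battery life is short but battery life matters",
              "Battery life matters more than screen size",
              "Too short"),
  stringsAsFactors = FALSE
)

trigram_counts <- CommentData %>%
  # One row per three-word sequence; comments under three words yield NA
  unnest_tokens(trigram, Comment, token = "ngrams", n = 3) %>%
  filter(!is.na(trigram)) %>%
  # Put each word of the trigram in its own column
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  # Drop trigrams in which any of the three words is a stop word
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word,
         !word3 %in% stop_words$word) %>%
  # Count each remaining word triple, most frequent first
  count(word1, word2, word3, sort = TRUE) %>%
  # Join the words back into one phrase column; the count column n is kept
  unite(trigram, word1, word2, word3, sep = " ")

trigram_counts
```

Requiring every word to be a non-stop-word is strict for longer phrases: it drops phrases like "quality of service" that have a stop word in the middle. Filtering only on word1 and word3 is a looser alternative if such phrases matter for your data.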