Creating “word” cloud of phrases, not individual words in R

Submitted by 不想你离开。 on 2019-11-29 13:33:03

Your difficulty is that each element of df$names is being treated as a "document" by the tm functions. For example, the document John A contains the words John and A. It sounds like you want to keep the names as-is and simply count their occurrences; you can use table for that.

library(wordcloud)
df <- data.frame(theNames = c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul H C", "Paul H C"))
tb <- table(df$theNames)
wordcloud(names(tb), as.numeric(tb), scale = c(8, .3), min.freq = 1,
          max.words = 100, random.order = TRUE, rot.per = .15,
          colors = "black", vfont = c("sans serif", "plain"))

Install RWeka and its dependencies, then try this:

library(tm)     # for TermDocumentMatrix
library(RWeka)  # requires a working Java installation
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
# ... other tokenizers
tok <- BigramTokenizer
# df.corpus is assumed to be a tm Corpus built from your text
tdmgram <- TermDocumentMatrix(df.corpus, control = list(tokenize = tok))
# ... create wordcloud

The tokenizer line above chops your text into phrases of length 2. More specifically, it creates phrases with a minimum length of 2 and a maximum length of 2.
Using Weka's general NGramTokenizer algorithm, you can create different tokenizers (e.g. minimum length 1, maximum length 2), and you'll probably want to experiment with different lengths. You can also name them tok1, tok2, and so on, instead of the verbose BigramTokenizer used above.
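If installing RWeka (and the Java it depends on) is a hurdle, the same idea can be sketched in base R: split each string into words, paste adjacent pairs into bigrams, and tabulate them. The `txt` vector and `make_bigrams` helper below are illustrative, not from the original question; the resulting table can be fed to wordcloud() exactly as in the first answer.

```r
# Minimal base-R sketch of bigram counting (no RWeka/Java needed).
# 'txt' is a made-up example vector for illustration.
txt <- c("John Smith met Mary A", "Mary A met John Smith")

# Build all adjacent word pairs ("bigrams") from one string.
make_bigrams <- function(s) {
  w <- unlist(strsplit(s, "\\s+"))
  if (length(w) < 2) return(character(0))
  paste(head(w, -1), tail(w, -1))   # pair word i with word i+1
}

bigrams <- unlist(lapply(txt, make_bigrams))
tb <- sort(table(bigrams), decreasing = TRUE)
# Then, as before: wordcloud(names(tb), as.numeric(tb), min.freq = 1, ...)
```

Here "Mary A" and "John Smith" each occur twice across the two strings, so both would appear larger in the resulting cloud.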
