How to remove words from wordcloud R package so that they can be included in the output?

Submitted by 放肆的年华 on 2019-12-24 02:16:43

Question


I'm using the "wordcloud" package (description: "Word Cloud") from the R package repository. When I create a word cloud from some arbitrary text, some words are omitted automatically, as if they should not be part of the word cloud.

Code:

library(RColorBrewer)
library(NLP)
library(wordcloud)
library(tm)


wordcloud("foo bar oh oh by by bye bingo hell no", scale=c(3,1), colors=brewer.pal(6,"Dark2"),random.order=FALSE)

Output: [word cloud image, not reproduced here]

I want to keep words like "oh" and "by" in the wordcloud. How?

Edit: I would prefer to do this by removing these words from the set of stopwords used by the wordcloud package, rather than by supplying frequencies myself.


Answer 1:


Here's one way:

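library(wordcloud)
library(tm)
txt <- "foo bar oh oh by by bye bingo hell no"
corp <- Corpus(VectorSource(txt))
# Build the term-document matrix yourself; wordLengths removes the
# default minimum word length, so two-letter words are kept
tdm <- TermDocumentMatrix(corp, control = list(wordLengths = c(-Inf, Inf)))
m <- as.matrix(tdm)
# Total frequency of each term, most frequent first
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
# Pass words and frequencies explicitly so wordcloud() does no filtering
wordcloud(d$word, d$freq, min.freq = 1)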
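
If stopword filtering is still wanted for the other words, the same approach can probably be combined with a trimmed stopword list, which is closer to what the question's edit asks for. The following is only a sketch: `keep` and `my_stopwords` are names made up for illustration, and it assumes the `stopwords` control option of `TermDocumentMatrix()` accepts a character vector of words to drop.

```r
library(tm)
library(wordcloud)

txt <- "foo bar oh oh by by bye bingo hell no"

# Hypothetical keep-list: words that must survive stopword removal
keep <- c("oh", "by")
my_stopwords <- setdiff(stopwords("english"), keep)

corp <- Corpus(VectorSource(txt))
tdm <- TermDocumentMatrix(corp, control = list(
  stopwords   = my_stopwords,  # custom list instead of the full English list
  wordLengths = c(1, Inf)      # do not drop short words
))

# Total frequency of each remaining term, most frequent first
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
wordcloud(names(freq), freq, min.freq = 1)
```

Any word still present in `my_stopwords` is dropped as usual; only the words listed in `keep` are protected.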



Answer 2:


There are two ways to use wordcloud():

  • one with a string with all the words as the main argument: what you do now
  • one with a vector of words and a corresponding vector of frequencies

The first form makes wordcloud() call tm to build a corpus, remove the stopwords, and count the terms. That preprocessing is where you lose the two-letter words.
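
To see the effect of that preprocessing in isolation, here is a small sketch using tm directly. It relies on the documented default of the `wordLengths` control option, `c(3, Inf)`, which discards terms shorter than three characters. The variable names are illustrative only.

```r
library(tm)

txt <- "foo bar oh oh by by bye bingo hell no"
corp <- Corpus(VectorSource(txt))

# Default control: terms shorter than three characters are discarded
default_terms <- Terms(TermDocumentMatrix(corp))

# A lower bound of 1 keeps the short words as well
all_terms <- Terms(TermDocumentMatrix(corp, control = list(wordLengths = c(1, Inf))))

# Terms lost under the defaults: the two-letter words
setdiff(all_terms, default_terms)
```

So even without any stopword list, the default length filter alone would drop "oh" and "by".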

A simple fix is to use the form of wordcloud() that does not require the tm package, by splitting the string and counting the words yourself before passing them to wordcloud():

library(stringr)
library(wordcloud)
library(RColorBrewer)

## The initial string
mystring <- "foo bar oh oh by by bye bingo hell no"
## Split it and count frequencies
tabl <- table(str_split(mystring, pattern = " ")[[1]])
## Make the wordcloud: all words are there!
wordcloud(names(tabl),tabl,scale=c(3,1), colors=brewer.pal(6,"Dark2"),random.order=FALSE)
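
For text that is less tidy than the example (mixed case, punctuation, repeated spaces), the same idea might be written with base R only. This is a sketch under the assumption that the words are plain ASCII letters; `words` and `counts` are illustrative names.

```r
library(wordcloud)

mystring <- "foo bar oh oh by by bye bingo hell no"

# Lower-case, then split on any run of non-letter characters
words <- strsplit(tolower(mystring), "[^a-z]+")[[1]]
words <- words[nzchar(words)]  # drop empty pieces from leading separators

# Count occurrences of each word
counts <- table(words)

# Words and frequencies are passed explicitly, so nothing is filtered out
wordcloud(names(counts), as.vector(counts), min.freq = 1)
```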


Source: https://stackoverflow.com/questions/39921598/how-to-remove-words-from-wordcloud-r-package-so-that-they-can-be-included-in-the
