Question
I currently use Wordle for many artsy word clouds. I think R's wordcloud package potentially gives better control.
1) How do you keep a word capitalized in the word cloud? [SOLVED]
2) How do I keep two words together as one chunk in the word cloud? (Wordle uses the ~ operator for this; R's wordcloud just prints the ~ literally.) For instance, where there's a ~ between "to" and "be", I'd like a space in the word cloud.
require(wordcloud)
y<-c("the", "the", "the", "tree", "tree", "tree", "tree", "tree",
"tree", "tree", "tree", "tree", "tree", "Wants", "Wants", "Wants",
"Wants", "Wants", "Wants", "Wants", "Wants", "Wants", "Wants",
"Wants", "Wants", "to~be", "to~be", "to~be", "to~be", "to~be",
"to~be", "to~be", "to~be", "to~be", "to~be", "to~be", "to~be",
"to~be", "to~be", "to~be", "to~be", "to~be", "to~be", "to~be",
"to~be", "when", "when", "when", "when", "when", "familiar", "familiar",
"familiar", "familiar", "familiar", "familiar", "familiar", "familiar",
"familiar", "familiar", "familiar", "familiar", "familiar", "familiar",
"familiar", "familiar", "familiar", "familiar", "familiar", "familiar",
"leggings", "leggings", "leggings", "leggings", "leggings", "leggings",
"leggings", "leggings", "leggings", "leggings")
wordcloud(names(table(y)), table(y))
Answer 1:
You asked two questions:
- You can control the capitalisation (or not) by passing a control argument to TermDocumentMatrix, e.g. control = list(tolower = FALSE).
- No doubt there is an argument somewhere to control the ~, but here is an easy workaround: use gsub to change ~ to a space in the step just before plotting.
Some code:
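library(tm)
library(wordcloud)
## VCorpus keeps tm's classic processing; in tm >= 0.7, Corpus() on a VectorSource returns a SimpleCorpus
corpus <- VCorpus(VectorSource(y))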
tdm <- TermDocumentMatrix(corpus, control=list(tolower=FALSE)) ## Edit 1
m <- as.matrix(tdm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
d$word <- gsub("~", " ", d$word) ## Edit 2
wordcloud(d$word, d$freq)
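A minimal sketch of the same gsub workaround without tm, assuming you only need the counts: wordcloud() accepts a words vector plus a freq vector directly, and when freq is supplied it does not lowercase anything, so the ~ can be replaced on the raw words before counting them with table(). The rep() line rebuilds the question's y with the same word counts; the wordcloud call is guarded so the sketch still runs where the package is not installed.

```r
# Rebuild the question's vector with the same counts (3 + 10 + 12 + 20 + 5 + 20 + 10 = 80 words)
y <- c(rep("the", 3), rep("tree", 10), rep("Wants", 12), rep("to~be", 20),
       rep("when", 5), rep("familiar", 20), rep("leggings", 10))

# Replace the Wordle-style ~ with a space before counting ("to~be" -> "to be")
y_clean <- gsub("~", " ", y, fixed = TRUE)

# Count each word and sort by frequency, most frequent first
tab <- sort(table(y_clean), decreasing = TRUE)
d <- data.frame(word = names(tab), freq = as.vector(tab),
                stringsAsFactors = FALSE)
print(d)

# Plot only if the wordcloud package is available; "to be" is drawn as one label
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(d$word, d$freq)
}
```

Because no lowercasing step happens here, "Wants" keeps its capital letter without any extra option.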
Source: https://stackoverflow.com/questions/8069571/spaces-in-wordcloud