wordcloud package: get “Error in strwidth(…) : invalid 'cex' value”

匿名 (未验证) 提交于 2019-12-03 02:03:01

问题:

I am using the tm and wordcloud packages in R 2.15.1. I am trying to make a word cloud Here is the code:

maruti_tweets = userTimeline("Maruti_suzuki", n=1000,cainfo="cacert.pem") hyundai_tweets = userTimeline("HyundaiIndia", n=1000,cainfo="cacert.pem") tata_tweets = userTimeline("TataMotor", n=1000,cainfo="cacert.pem") toyota_tweets = userTimeline("Toyota_India", n=1000,cainfo="cacert.pem") # get text maruti_txt = sapply(maruti_tweets, function(x) x$getText()) hyundai_txt = sapply(hyundai_tweets, function(x) x$getText()) tata_txt = sapply(tata_tweets, function(x) x$getText()) toyota_txt = sapply(toyota_tweets, function(x) x$getText()) clean.text = function(x)  {    # tolower    x = tolower(x)    # remove rt    x = gsub("rt", "", x)    # remove at    x = gsub("@\\w+", "", x)    # remove punctuation    x = gsub("[[:punct:]]", "", x)    # remove numbers    x = gsub("[[:digit:]]", "", x)    # remove links http    x = gsub("http\\w+", "", x)    # remove tabs    x = gsub("[ |\t]{2,}", "", x)    # remove blank spaces at the beginning    x = gsub("^ ", "", x)    # remove blank spaces at the end    x = gsub(" $", "", x)    return(x) } # clean texts maruti_clean = clean.text(maruti_txt) hyundai_clean = clean.text(hyundai_txt) tata_clean = clean.text(tata_txt) toyota_clean = clean.text(toyota_txt) maruti = paste(maruti_clean, collapse=" ") hyundai= paste(hyundai_clean, collapse=" ") tata= paste(tata_clean, collapse=" ") toyota= paste(toyota_clean, collapse=" ") # put ehyundaiything in a single vector all = c(maruti, hyundai, tata, toyota) # remove stop-words all = removeWords(all, c(stopwords("english"), "maruti", "tata", "hyundai", "toyota")) # create corpus corpus = Corpus(VectorSource(all)) # create term-document matrix tdm = TermDocumentMatrix(corpus) # convert as matrix tdm = as.matrix(tdm) # add column names colnames(tdm) = c("MARUTI", "HYUNDAI", "TATA", "TOYOTA") # comparison cloud comparison.cloud(tdm, random.order=FALSE,colors = c("#00B2FF", "red",     #FF0099","#6600CC"),max.words=500)

but getting following error

Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value please help

回答1:

You have a typo in TataMotors twitter account. It should be spelled 'TataMotors', not 'TataMotor'. As a result, one column in your term matrix is empty and when cex is calculated it get assigned NAN.

Fix the typo and the rest of the code works fine. Good luck!



回答2:

I spotted the empty-column issue in a different application throwing the same error. In my case it was because of the removeSparseTerms command applied to a document term matrix. Using str() helped me identify the bug.

The input variable (slightly edited) had 289 columns:

> str(corpus.dtm) List of 6 $ i       : int [1:443] 3 4 6 8 10 12 15 18 19 21 ... $ j       : int [1:443] 105 98 210 93 287 249 126 223 129 146 ... $ v       : num [1:443] 1 1 1 1 1 1 1 1 1 1 ... $ nrow    : int 1408 $ ncol    : int 289 $ dimnames:List of 2 ..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ... ..$ Terms: chr [1:289] "word1" "word2" "word3" "word4" ... - attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix" - attr(*, "weighting")= chr [1:2] "term frequency" "tf"

The command was:

removeSparseTerms(corpus.dtm,0.90)->corpus.dtm.frequent

And the result had 0 columns:

> str(corpus.dtm.frequent) List of 6 $ i       : int(0)  $ j       : int(0)  $ v       : num(0)  $ nrow    : int 1408 $ ncol    : int 0 $ dimnames:List of 2 ..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ... ..$ Terms: NULL - attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix" - attr(*, "weighting")= chr [1:2] "term frequency" "tf"

Raising the sparsity coefficient from 0.90 to 0.95 solved the issue. For a wordier document I went up to 0.999 in order to have a non-empty result after removing the sparse terms.

Empty columns are a good thing to check out when this error occurs.



易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!