R DocumentTermMatrix control list not working, silently ignores unknown parameters

别说谁变了你拦得住时间么 posted on 2019-11-29 14:14:02

Question


I have the following two DTMs:

dtm <- DocumentTermMatrix(t)

dtmImproved <- DocumentTermMatrix(t, 
               control=list(minWordLength = 4, minDocFreq=5))

When I run this, the two DTMs come out identical, and when I inspect dtmImproved it still contains three-character words. Why doesn't the minWordLength parameter work? Thank you!

> dtm
A document-term matrix (591 documents, 10533 terms)

Non-/sparse entries: 43058/6181945
Sparsity           : 99%
Maximal term length: 135 
Weighting          : term frequency (tf)
> dtmImproved
A document-term matrix (591 documents, 10533 terms)

Non-/sparse entries: 43058/6181945
Sparsity           : 99%
Maximal term length: 135 
Weighting          : term frequency (tf)

Answer 1:


dtmImproved <- DocumentTermMatrix(t, control=list(wordLengths=c(4, 15), 
                                   bounds = list(global = c(5,Inf))))

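This solves the problem! The control list silently ignores names it does not recognise, so minWordLength and minDocFreq had no effect; the options that do work are wordLengths (minimum and maximum term length) and bounds (here, a global lower bound of 5 documents). The lack of proper documentation really gets me down (: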
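Since the control list is ignored silently, it can help to check the names before passing it in. Below is a minimal base-R sketch of such a check. The helper unknown_control_names is hypothetical (it is not part of tm), and the vector of known names is my assumption from the tm help pages, so verify it against ?termFreq and ?TermDocumentMatrix for your installed version.

```r
# Hypothetical helper, not part of tm: list control names that would be ignored.
# ASSUMPTION: this set of recognised names is taken from the tm help pages and
# may differ between tm versions; check ?termFreq and ?TermDocumentMatrix.
known_control_names <- c("tokenize", "tolower", "language", "removePunctuation",
                         "removeNumbers", "stopwords", "stemming", "dictionary",
                         "bounds", "wordLengths", "weighting")

unknown_control_names <- function(control, known = known_control_names) {
  # Names present in the control list but absent from the known set
  setdiff(names(control), known)
}

# Control list from the question (names that are not recognised)
old_style <- list(minWordLength = 4, minDocFreq = 5)

# Control list from the answer
new_style <- list(wordLengths = c(4, 15), bounds = list(global = c(5, Inf)))

unknown_control_names(old_style)  # reports both names
unknown_control_names(new_style)  # character(0)
```

Calling this on a control list before DocumentTermMatrix() turns a silent no-op into something you can see or stop() on.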




Answer 2:


It is always a good idea to read the source code if it is available. The source of the wordcloud function is on GitHub; here is what it says:
# Author: ianfellows
.....
if(min.freq > max(freq))
    min.freq <- 0

So the frequencies from your DocumentTermMatrix had max(freq) below the min.freq you set, i.e. none of the terms occurred at least min.freq times, and wordcloud therefore reset min.freq to 0.
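The fallback quoted above can be reproduced in base R without loading wordcloud. This is only a sketch: the frequencies are made up, and the final filtering line is my own illustration of what a zero threshold implies, not a quote from the wordcloud source.

```r
# Made-up term frequencies, for illustration only
freq <- c(apple = 3, pear = 2, plum = 1)
min.freq <- 5  # higher than any observed frequency

# Same check as in the quoted wordcloud source
if (min.freq > max(freq))
  min.freq <- 0

# ASSUMPTION: illustrative filter, not copied from wordcloud.
# With the threshold reset to 0, no term is dropped.
kept <- names(freq)[freq >= min.freq]
kept
```

So a min.freq that is set too high does not produce an empty plot; it ends up behaving as if no minimum had been given.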

Hope this helps, MJJ



Source: https://stackoverflow.com/questions/13366897/r-documenttermmatrix-control-list-not-working-silently-ignores-unknown-paramete
