reuters | 易学教程

tm Package error: Error defining Document Term Matrix


Question: I am analyzing the Reuters-21578 corpus, all the Reuters news articles from 1987, using the "tm" package. After importing the XML files into an R data file, I clean the text (convert to plain text, convert to lower case, remove stop words, etc., as seen below), then I try to convert the corpus to a document term matrix but receive an error message: Error in UseMethod("Content", x) : no applicable method for 'Content' applied to an object of class "character". All pre-processing steps work

Using R for Text Mining Reuters-21578


Question: I am trying to do some work with the well-known Reuters-21578 dataset and am having some trouble loading the sgm files into my corpus. Right now I am using the commands

    require(tm)
    reut21578 <- system.file("reuters21578", package = "tm")
    reuters <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XML))

in an attempt to include all the files in my corpus, but this gives me the following error:

    Error in DirSource(reut21578) : empty directory

Any idea where I may be

tm Package error: Error defining Document Term Matrix


I am analyzing the Reuters-21578 corpus, all the Reuters news articles from 1987, using the "tm" package. After importing the XML files into an R data file, I clean the text (convert to plain text, convert to lower case, remove stop words, etc., as seen below), then I try to convert the corpus to a document term matrix but receive an error message: Error in UseMethod("Content", x) : no applicable method for 'Content' applied to an object of class "character". All pre-processing steps work correctly up until the document term matrix. I created a non-random subset of the corpus (with 4000 documents) and