In R tm package, build corpus FROM Document-Term-Matrix

-上瘾入骨i 2021-01-13 18:06

It's straightforward to build a document-term matrix from a corpus with the tm package. I'd like to build a corpus from a document-term matrix.

Let M be the numb

1 answer
  • 2021-01-13 18:41

    Here's one approach, using my own minimal reproducible example built from the tm package (as a new user you may not be aware that providing one is your responsibility):

    ## Minimal Reproducible Example
    library(tm)
    data("crude")
    dtm <- DocumentTermMatrix(crude,
        control = list(
            weighting = function(x) weightTfIdf(x, normalize = FALSE),
            stopwords = TRUE))
    
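    ## Convert the dtm to a character vector, one string per document
    ## (rep() truncates the non-integer tf-idf weights to whole repeat counts)
    dtm2list <- apply(dtm, 1, function(x) {
        paste(rep(names(x), x), collapse = " ")
    })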
    
    ## convert to a Corpus
    myCorp <- VCorpus(VectorSource(dtm2list))
    inspect(myCorp)
    
    ## Stemming
    myCorp <- tm_map(myCorp, stemDocument)
    inspect(myCorp)
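
The same idea can be checked end to end on a tiny example. The sketch below is not part of the original answer: the two toy documents are made up, and it assumes the default term-frequency weighting rather than tf-idf, so every cell is a whole-number count that `rep()` can use directly. Word order is lost in the round trip, so the rebuilt corpus only matches the original at the bag-of-words level.

```r
library(tm)

# Hypothetical toy documents, used only for illustration
docs <- c("apple banana apple", "banana cherry")
corp <- VCorpus(VectorSource(docs))

# Default weighting is raw term frequency, so counts are integers
dtm <- DocumentTermMatrix(corp)
m <- as.matrix(dtm)

# Repeat each term by its count and paste into one string per document
texts <- apply(m, 1, function(x) paste(rep(names(x), x), collapse = " "))

# Rebuild a corpus from the reconstructed strings
rebuilt <- VCorpus(VectorSource(texts))

# Round-trip check: the rebuilt corpus should give the same counts
m2 <- as.matrix(DocumentTermMatrix(rebuilt))
print(all(m2[, colnames(m), drop = FALSE] == m))
```

If the matrix holds tf-idf weights instead, as in the example above, the repeat counts are truncated weights rather than true frequencies, so the rebuilt documents are only an approximation.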
    