How to search for specific terms in a DTM

囚心锁ツ 2021-01-29 00:07

I have a dataset of 200+ PDFs that I converted into a corpus. I'm using the tm package for R for text pre-processing and mining. So far, I've successfully created the DTM (

1 Answer
  • 2021-01-29 00:30

    You can use the dictionary option in the control list when you create your DocumentTermMatrix: only the terms in the dictionary are counted, and every other term is left out of the matrix. The example code below shows how it works. Once the counts are in DocumentTermMatrix or data.frame form, you can use aggregation functions if you don't need the word counts per document.

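    library(tm)
    
    # Example corpus shipped with tm: Reuters news articles about crude oil
    data("crude")
    crude <- as.VCorpus(crude)                            # crude is already a VCorpus; this just makes it explicit
    crude <- tm_map(crude, content_transformer(tolower))  # lowercase all text
    
    # Terms to search for
    my_words <- c("oil", "corporation")
    
    # The dictionary option restricts the DTM to these terms only
    dtm <- DocumentTermMatrix(crude, control = list(dictionary = my_words))
    
    # Create a data.frame from the DocumentTermMatrix (one row per document)
    df1 <- data.frame(docs = dtm$dimnames$Docs, as.matrix(dtm), row.names = NULL)
    head(df1)
    #>    docs corporation oil
    #> 1   127           0   5
    #> 2   144           0  11
    #> 3   191           0   2
    #> 4   194           0   1
    #> 5   211           0   1
    #> 6   236           0   7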
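A minimal sketch of the aggregation step, assuming you have already built a full DTM without a dictionary, as in the question. It picks the columns whose term is in my_words with a logical index on Terms(), then uses colSums() and rowSums() to get each term's total count, the number of documents that contain it, and the documents that mention any of the terms. The names dtm_full, dtm_sub, m, term_totals, doc_freq and hits are only illustrative.

```r
library(tm)

# Reuse the crude example corpus and the search terms from the answer
data("crude")
crude <- tm_map(crude, content_transformer(tolower))
my_words <- c("oil", "corporation")

# Full DTM with every term (no dictionary), like the one built in the question
dtm_full <- DocumentTermMatrix(crude)

# Keep only the columns whose term is one of my_words
dtm_sub <- dtm_full[, Terms(dtm_full) %in% my_words]
m <- as.matrix(dtm_sub)   # small dense matrix: documents x selected terms

# Total count of each selected term across all documents
term_totals <- colSums(m)

# Number of documents that contain each selected term at least once
doc_freq <- colSums(m > 0)

# IDs of the documents that mention at least one of the terms
hits <- Docs(dtm_sub)[rowSums(m) > 0]

print(term_totals)
print(doc_freq)
print(hits)
```

Converting only the selected columns with as.matrix() keeps memory use small; on the whole DTM, slam::col_sums() and slam::row_sums() work on the sparse matrix directly. A term that never occurs in the corpus simply has no column, so it is missing from term_totals. Also, unless punctuation is removed (for example with removePunctuation = TRUE in the control list), a token such as "oil," can be counted separately from "oil".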
    