How to search for specific terms in a DTM

后端未结


I have a dataset of 200+ PDFs that I converted into a corpus. I'm using the tm package for R for text pre-processing and mining. So far, I've successfully created the DTM (


1 Answer

逝去的感伤

2021-01-29 00:30

You can pass the dictionary option in the control list when you create your DocumentTermMatrix; the resulting matrix then contains only the terms you listed. The example code below shows how it works. Once the result is in DocumentTermMatrix form or in data.frame form, you can use aggregation functions if you don't need the word counts per document.

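library(tm)

# load the example corpus shipped with tm and lower-case its text
data("crude")
crude <- as.VCorpus(crude)
crude <- tm_map(crude, content_transformer(tolower))

# terms to search for, in lower case to match the transformed corpus
my_words <- c("oil", "corporation")

# keep only the dictionary terms as columns of the matrix
dtm <- DocumentTermMatrix(crude, control = list(dictionary = my_words))

# create data.frame from documenttermmatrix
df1 <- data.frame(docs = dtm$dimnames$Docs, as.matrix(dtm), row.names = NULL)
head(df1)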
   docs corporation oil
1   127           0   5
2   144           0  11
3   191           0   2
4   194           0   1
5   211           0   1
6   236           0   7
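
The aggregation step mentioned above could look roughly like the sketch below. It rebuilds the same dtm from the crude example and then sums over documents with base R's colSums. The names term_totals, doc_freq and oil_docs are illustrative only and do not come from the original answer. Converting with as.matrix is assumed to be acceptable here because the matrix has only two columns.

```r
library(tm)

# Rebuild the dictionary-restricted matrix from the example above
data("crude")
crude <- tm_map(as.VCorpus(crude), content_transformer(tolower))
my_words <- c("oil", "corporation")
dtm <- DocumentTermMatrix(crude, control = list(dictionary = my_words))

# Dense copy; fine for a handful of dictionary terms
m <- as.matrix(dtm)

# Total occurrences of each dictionary term across the whole corpus
term_totals <- colSums(m)

# Number of documents in which each term appears at least once
doc_freq <- colSums(m > 0)

# IDs of the documents that mention "oil" at least once
oil_docs <- rownames(m)[m[, "oil"] > 0]

print(term_totals)
print(doc_freq)
print(oil_docs)
```

For a matrix with many columns, a sparse-aware sum such as slam::col_sums(dtm) may be a better fit than converting to a dense matrix first.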
