I am creating a Corpus from a dataframe. I pass it as a VectorSource,
as there is only one column I want to be used as the text source. This works fine; however, I need the document ids to come from another column of the dataframe.
I know it's probably late for @user1098798, but there is a way to specify ids directly when creating the corpus. You need to load the data with DataframeSource()
and add a mapping for the columns:
library(tm)  # tm < 0.7 only: readTabular() was removed in tm 0.7
corpus = VCorpus(DataframeSource(df), readerControl = list(reader = readTabular(mapping = list(content = "textColumn", id = "ids"))))
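In current tm releases (0.7 and later), DataframeSource() no longer takes a reader mapping; it expects a data frame whose first column is named doc_id and whose second is named text, and it uses doc_id as each document's id. A minimal sketch under that assumption, with a small made-up df using the ids and textColumn column names from the answers here:

```r
library(tm)

# Made-up example data; "ids" and "textColumn" mirror the column names used in the answers
df <- data.frame(ids = c("a", "b", "c"),
                 textColumn = c("first doc", "second doc", "third doc"),
                 stringsAsFactors = FALSE)

# DataframeSource() in tm >= 0.7 requires the columns doc_id and text, in that order
src_df <- data.frame(doc_id = df$ids, text = df$textColumn,
                     stringsAsFactors = FALSE)

corpus <- VCorpus(DataframeSource(src_df))

# Each document's id is taken from doc_id
meta(corpus[[1]], "id")
```

Renaming the columns into a separate data frame leaves the original df untouched; any further columns placed after text are kept as document-level metadata.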
Here is a qdap approach to this problem that can handle it without the loop:
Use qdap version >= 1.1.0 right from the get go to convert the dataframe to a Corpus
and the ID tags will be automatically added.
library(qdap)
with(df, as.Corpus(textColumn, ids))
## <<VCorpus>>
## Metadata: corpus specific: 0, document level (indexed): 3
## Content: documents: 6
# Look around a bit
meta(with(df, as.Corpus(textColumn, ids)), tag="id")
inspect(with(df, as.Corpus(textColumn, ids)))
Well, one simple but not very elegant way to assign your ids to your documents afterwards is the following:
# In tm >= 0.6 the document id lives in the metadata, not in an "ID" attribute
for (i in seq_along(corpus)) {
  meta(corpus[[i]], "id") <- df$ids[i]
}