How can I manually set the document id in a corpus?

轮回少年 2021-01-22 06:32

I am creating a Corpus from a dataframe. I pass it as a VectorSource, as there is only one column I want to use as the text source. This works fine; however, I need

3 answers
  •  情话喂你
    2021-01-22 07:02

    I know it's probably too late for @user1098798, but there is a way to specify ids directly when creating the corpus. You need to load the data with DataframeSource() and add a mapping for the columns:

    corpus = VCorpus(DataframeSource(df), readerControl = list(reader = readTabular(mapping = list(content = "textColumn", id = "ids"))))
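
    One caveat: as far as I know, readTabular() is no longer available in tm 0.7 and later, where DataframeSource() instead expects the data frame to have a doc_id column and a text column. A minimal sketch for those versions, assuming the same column names as above (the sample data frame is made up for illustration):

    ```r
    library(tm)

    # Hypothetical sample data; "ids" and "textColumn" follow the names used above.
    df <- data.frame(
      ids = c("a1", "b2", "c3"),
      textColumn = c("first document", "second document", "third document"),
      stringsAsFactors = FALSE
    )

    # Build a data frame with the two columns DataframeSource() looks for
    # in tm >= 0.7: doc_id (used as the document id) and text (the content).
    src_df <- data.frame(
      doc_id = df$ids,
      text = df$textColumn,
      stringsAsFactors = FALSE
    )

    corpus <- VCorpus(DataframeSource(src_df))

    # Read the id back from each document's metadata to confirm it was set.
    doc_ids <- vapply(
      seq_len(length(corpus)),
      function(i) meta(corpus[[i]], "id"),
      character(1)
    )
    print(doc_ids)
    ```

    If the ids should be changed after the corpus already exists, meta(corpus[[i]], "id") <- "new_id" can be used on individual documents instead.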
    
