How can I manually set the document id in a corpus?

轮回少年 2021-01-22 06:32

I am creating a Corpus from a data frame. I pass it as a VectorSource because there is only one column I want to use as the text source. This works fine, however I need

3 answers
  •  醉梦人生
    2021-01-22 07:08

    Here is a qdap approach to this problem that can handle it without the loop:

    Use qdap version >= 1.1.0 right from the get go to convert the dataframe to a Corpus and the ID tags will be automatically added.

    with(df, as.Corpus(textColumn, ids))
    
    ## <<VCorpus>>
    ## Metadata:  corpus specific: 0, document level (indexed): 3
    ## Content:  documents: 6
    
    
    ## Look around a bit
    meta(with(df, as.Corpus(textColumn, ids)), tag="id")
    inspect(with(df, as.Corpus(textColumn, ids)))
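
For comparison, here is a minimal sketch of the loop the qdap approach avoids, using tm alone, together with a `DataframeSource` variant. The data frame below is hypothetical (only the column names `ids` and `textColumn` come from the answer above), and the `DataframeSource` part assumes a newer tm release (0.7 series) in which the source expects columns named `doc_id` and `text`.

```r
library(tm)

# Hypothetical data; `ids` and `textColumn` mirror the names used in the answer.
df <- data.frame(
  ids = c("doc_a", "doc_b", "doc_c"),
  textColumn = c("first text", "second text", "third text"),
  stringsAsFactors = FALSE
)

# Option 1: build from a VectorSource, then overwrite each document's "id" tag.
corpus_loop <- VCorpus(VectorSource(df$textColumn))
for (i in seq_along(corpus_loop)) {
  meta(corpus_loop[[i]], "id") <- df$ids[i]
}

# Option 2 (assumes a tm version with the doc_id/text convention):
# DataframeSource takes the ids from `doc_id` and the text from `text`.
src_df <- data.frame(doc_id = df$ids, text = df$textColumn,
                     stringsAsFactors = FALSE)
corpus_df <- VCorpus(DataframeSource(src_df))

# Read the ids back out of each corpus so they can be compared with df$ids.
ids_loop <- vapply(seq_along(corpus_loop),
                   function(i) meta(corpus_loop[[i]], "id"), character(1))
ids_df <- vapply(seq_along(corpus_df),
                 function(i) meta(corpus_df[[i]], "id"), character(1))
```

Both `ids_loop` and `ids_df` should match `df$ids`. The loop works with any source, while the `DataframeSource` route needs the columns renamed first but sets the ids at construction time.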
    
