Error faced while using TM package's VCorpus in R

隐瞒了意图╮ 2021-01-18 01:27

I get the error below when working with the tm package in R.

library("tm")
Loading required package: NLP
Warning messages:
1: package ‘tm’ was buil         


        
2 Answers
  • 2021-01-18 02:16

    I also encountered this error using the BTM package. As Eva notes, it may relate to your column headings, which must be doc_id and text, in that order. In my case, however, the cause was that my doc_id values had become corrupted and were no longer unique. If the error persists, check that your doc_id values are still unique and increment as expected.
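
    A minimal sketch of that check in base R. The data frame `docs` and its contents are made up for illustration, and regenerating the identifiers from row positions is only one possible repair:

    ```r
    # Hypothetical data frame whose doc_id column contains a duplicate.
    docs <- data.frame(
      doc_id = c("1", "2", "2", "4"),
      text   = c("first text", "second text", "third text", "fourth text"),
      stringsAsFactors = FALSE
    )

    # The first two column names must be exactly "doc_id" and "text".
    names_ok <- identical(names(docs)[1:2], c("doc_id", "text"))

    # anyDuplicated() returns 0 when every value is unique,
    # otherwise the index of the first duplicated value.
    first_dup <- anyDuplicated(docs$doc_id)

    # Collect every row that shares its doc_id with another row.
    is_dup   <- duplicated(docs$doc_id) | duplicated(docs$doc_id, fromLast = TRUE)
    dup_rows <- docs[is_dup, ]

    # One possible repair (an assumption, not the only option):
    # regenerate the identifiers from the row positions.
    docs$doc_id <- as.character(seq_len(nrow(docs)))
    ```

    If the original identifiers carry meaning, make.unique(docs$doc_id) may be a better repair, since it keeps the existing values and only appends suffixes to the repeats.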

  • 2021-01-18 02:24

    I hit the same problem after updating the tm package to version 0.7-2. The documentation for DataframeSource() says:

    The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text".

    Details

    A data frame source interprets each row of the data frame x as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a "UTF-8" encoded string representing the document's content. Optional additional columns are used as document level metadata.

    I solved it with the following code:

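    # Read the CSV without converting strings to factors
    df_cmp <- read.csv("test_file.csv", stringsAsFactors = FALSE)

    # DataframeSource() expects the first two columns to be doc_id and text
    df_title <- data.frame(doc_id = row.names(df_cmp),
                           text = df_cmp$English.title,
                           stringsAsFactors = FALSE)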
    

    Try renaming the first two columns of your data frame to doc_id and text.
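
    As a rough sketch of the renaming approach, assuming a data frame whose first two columns already hold the identifier and the text under other names. The data frame `df_raw` and its values are hypothetical, and the corpus is built only if tm is installed:

    ```r
    # Hypothetical input: identifier and text stored under other column names.
    df_raw <- data.frame(
      id            = c("a", "b"),
      English.title = c("First title", "Second title"),
      stringsAsFactors = FALSE
    )

    # Rename the first two columns to the names DataframeSource() requires.
    names(df_raw)[1:2] <- c("doc_id", "text")

    # Build the corpus only when the tm package is available.
    if (requireNamespace("tm", quietly = TRUE)) {
      corpus <- tm::VCorpus(tm::DataframeSource(df_raw))
      print(corpus)
    }
    ```

    Renaming in place keeps any additional columns, which DataframeSource() treats as document-level metadata according to the documentation quoted above.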
