Error faced while using TM package's VCorpus in R


I am getting the error below while working with the tm package in R.

library("tm")
Loading required package: NLP
Warning messages:
1: package ‘tm’ was buil


                      
2 Answers

花落未央, 2021-01-18 02:16
              
            
            
                                                                       
I ran into this error with the BTM package as well. As Eva notes, it may be caused by your column headings, which must be doc_id and text, in that order. In my case, though, the cause was that my doc_id values had become corrupted and were no longer unique. If the error persists, check that your doc_id values are unique and increment as expected.
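
A minimal sketch of how one might look for repeated doc_id values, using base R only. The data frame docs and its contents are made up for illustration, and rebuilding the identifiers from row positions is just one possible repair, not necessarily the right one for your data.

```r
# Hypothetical data frame with a corrupted (repeated) doc_id.
docs <- data.frame(
  doc_id = c("1", "2", "2", "4"),
  text   = c("first document", "second document",
             "third document", "fourth document"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when all values are unique,
# otherwise the index of the first repeated value.
first_dup <- anyDuplicated(docs$doc_id)

# Collect every row whose doc_id occurs more than once.
dup_ids  <- docs$doc_id[duplicated(docs$doc_id)]
dup_rows <- docs[docs$doc_id %in% dup_ids, ]
print(dup_rows)

# One possible repair: rebuild the identifiers from the row positions.
if (first_dup > 0) {
  docs$doc_id <- as.character(seq_len(nrow(docs)))
}
```

If the original identifiers carry meaning (for example, they link back to another table), it is probably better to fix them at the source than to overwrite them like this.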
                                                                        
                                                        
            
            
              
                
不知归路, 2021-01-18 02:24
              
            
            
                                                                       
I hit the same problem after updating the tm package to version 0.7-2.
I looked up the documentation for DataframeSource(), which says:


  The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text".


Details


  A data frame source interprets each row of the data frame x as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a "UTF-8" encoded string representing the document's content. Optional additional columns are used as document level metadata.
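
The requirements quoted above could be checked before building a corpus with something like the following sketch. The function check_tm_data_frame() is a hypothetical helper written for this illustration; it is not part of tm, and it only covers the points the documentation mentions (column names, unique identifiers, UTF-8 text).

```r
# Hypothetical helper (not part of tm): returns a character vector of
# problems found, or character(0) if the data frame looks usable.
check_tm_data_frame <- function(df) {
  problems <- character(0)
  if (ncol(df) < 2 || !identical(names(df)[1:2], c("doc_id", "text"))) {
    # Without the right columns the remaining checks make no sense.
    return("first two columns must be named doc_id and text")
  }
  if (anyDuplicated(df$doc_id) > 0) {
    problems <- c(problems, "doc_id values are not unique")
  }
  if (!is.character(df$text)) {
    problems <- c(problems, "text column is not character (is it a factor?)")
  } else if (!all(validUTF8(df$text))) {
    problems <- c(problems, "text column contains invalid UTF-8")
  }
  problems
}

# Made-up examples: one well-formed data frame, one with other column names.
good <- data.frame(doc_id = c("a", "b"), text = c("one", "two"),
                   stringsAsFactors = FALSE)
bad  <- data.frame(id = 1:2, title = c("one", "two"),
                   stringsAsFactors = FALSE)

print(check_tm_data_frame(good))
print(check_tm_data_frame(bad))
```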


I solved it with the following code:

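# Read the CSV, keeping text columns as character rather than factor
df_cmp <- read.csv("test_file.csv", stringsAsFactors = FALSE)

# Build the two-column data frame that DataframeSource() expects
df_title <- data.frame(doc_id = row.names(df_cmp),
                       text = df_cmp$English.title,
                       stringsAsFactors = FALSE)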


Try renaming your columns to doc_id and text.
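
As a rough sketch of that renaming step: the data frame df and its column names id and title are assumptions made for this example, so substitute your own. The corpus is only built if tm is installed, using VCorpus() with DataframeSource() as discussed in this thread.

```r
# Hypothetical data frame whose columns have the "wrong" names.
df <- data.frame(
  id    = c("a", "b"),
  title = c("A first title", "A second title"),
  stringsAsFactors = FALSE
)

# Rename the two columns that tm should treat as identifier and content.
names(df)[names(df) == "id"]    <- "doc_id"
names(df)[names(df) == "title"] <- "text"

# Put doc_id and text first, in that order; any remaining columns
# are kept after them.
other_cols <- setdiff(names(df), c("doc_id", "text"))
df <- df[, c("doc_id", "text", other_cols), drop = FALSE]

# Build the corpus only if tm is available on this machine.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::DataframeSource(df))
  print(length(corpus))
}
```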
                                                                        
                                                        
            
            
              
                