Base word stemming instead of root word stemming in R

离开以前 2021-02-05 18:10

Is there any way to get the base word instead of the root word when stemming with NLP tools in R?

Code:

    # Load libraries
    library(tm)
    library(slam)

情书的邮戳 (OP) 2021-02-05 18:44

Unless you know English morphology well, you should use an existing library rather than write your own stemmer.

English is full of unexpected morphological surprises that would affect both probabilistic and rule-based models. Some examples are:


- Affixes that interact: in inhabitable, the in- is not a negative prefix, so stripping both in- and -able leaves habit rather than inhabit.
- Changes of word category: stemming the verb bicycling yields the noun bicycle, which can break rules that depend on categories.
- Restrictions on prefixes: words with negative meanings generally do not take negative prefixes (you can have unpretty, but not unugly).
- Compounds written as two words, as in "truck driver" (you would have to treat them as one word when you stem).
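
To make the first pitfall concrete, here is a minimal sketch in base R of a naive rule-based stripper. The function `naive_strip` is hypothetical and written only for this illustration; it is not part of any package.

```r
# Hypothetical naive stemmer: strip a leading "in"/"un" and a trailing "able".
# This is an illustration of the pitfall, not a usable stemmer.
naive_strip <- function(word) {
  word <- sub("^(in|un)", "", word)  # drop what looks like a negative prefix
  sub("able$", "", word)             # drop the -able suffix
}

naive_strip("unbreakable")  # "break": a plausible base word
naive_strip("inhabitable")  # "habit": wrong, the base word is "inhabit"
```

The rule works when the prefix really is a negation and fails when it is part of the base, which a purely surface-level rule cannot tell apart.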


English also has an issue with i-umlaut: words like men, geese, feet, best, and a host of others (all with an 'e'-like sound) cannot be stemmed easily, because the change happens inside the word rather than at its edge. Borrowed foreign words, like automaton, may also be an issue.

Stemming the superlative form is a good example of exceptions:

best -> good

eldest -> old 
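
One common way to handle such exceptions is to check a lookup table before applying any rule. The sketch below is in base R and assumes a tiny hand-made table; `exceptions` and `to_base` are hypothetical names, and the fallback suffix rule is deliberately crude.

```r
# Illustrative exception table built from the examples in this answer.
# A real lemmatizer needs a far larger dictionary.
exceptions <- c(best = "good", eldest = "old",
                men = "man", geese = "goose", feet = "foot")

# Hypothetical helper: look each word up first, otherwise strip a few suffixes.
to_base <- function(words) {
  words <- tolower(words)
  hit <- words %in% names(exceptions)
  out <- sub("(est|ing|s)$", "", words)  # crude fallback, for illustration only
  out[hit] <- exceptions[words[hit]]     # exceptions override the fallback
  unname(out)
}

to_base(c("best", "eldest", "geese", "tallest"))
# "good" "old" "goose" "tall"
```

The table lookup is what costs a lemmatizer time and memory compared with a pure rule-based stemmer.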

A lemmatizer would account for such exceptions, but would be slower. You can look at the Porter stemmer rules to get an idea of what you need, or you can just use the SnowballC R package, which implements it.
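
As a rough sketch of the SnowballC route, the snippet below calls `wordStem()` with `language = "porter"`. It assumes the package is installed (`install.packages("SnowballC")`); the word list is only an example, and the guard exists so the script still runs where the package is missing.

```r
# Example words taken from the discussion above.
words <- c("bicycling", "inhabitable", "best", "eldest")

if (requireNamespace("SnowballC", quietly = TRUE)) {
  # wordStem() applies the Porter algorithm when language = "porter".
  stems <- SnowballC::wordStem(words, language = "porter")
} else {
  # Assumption: fall back to NA when SnowballC is not installed.
  stems <- rep(NA_character_, length(words))
}

print(data.frame(word = words, stem = stems))
```

Comparing the two columns shows which inputs come back as readable base words and which come back as truncated roots; irregular forms such as best are not mapped to good by a stemmer.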