Solr accent removal

無奈伤痛 2021-02-11 05:44
I have read various threads about how to remove accents at index and query time. The current field type I have come up with looks like the following:

1 answer

时光说笑 (original poster) 2021-02-11 06:04

The issue is that you apply StandardTokenizerFactory before ASCIIFoldingFilterFactory. Instead, you should apply the MappingCharFilterFactory character filter first and then the StandardTokenizerFactory.

As per the Solr Reference Guide, StandardTokenizerFactory supports the token types <ALPHANUM>, <NUM>, <SOUTHEAST_ASIAN>, <IDEOGRAPHIC>, and <HIRAGANA>. Therefore, when you tokenize using StandardTokenizerFactory, the umlaut characters are lost, and your ASCIIFoldingFilterFactory is of no use after that.

Your fieldType should look like the following if you want to use StandardTokenizerFactory.
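
The field type definition did not survive on this page, so here is a minimal sketch of what the description above implies: a character filter using mapping-ISOLatin1Accent.txt, followed by the standard tokenizer. The field type name `text_folded`, the `positionIncrementGap` value, and the trailing lowercase filter are assumptions for illustration, not taken from the original answer.

```xml
<!-- Sketch only: the name "text_folded" and positionIncrementGap="100" are hypothetical. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer without a type attribute is used at both index and query time. -->
  <analyzer>
    <!-- Character filters run on the raw text before tokenization,
         so accented characters are rewritten before any token is produced. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- The tokenizer then splits the already-mapped text into tokens. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumed extra step: lowercase the tokens so matching is case-insensitive. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing a field type in the schema, existing documents generally need to be reindexed before the new analysis chain takes effect on stored terms.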

The mapping-ISOLatin1Accent.txt file should contain the mappings for such "special" characters. In Solr this file comes pre-populated by default, for example ü -> ue, ä -> ae, etc.