How do I check equality of Unicode strings in JavaScript?


I have two strings in JavaScript: "_strange_chars_µö¬é@zendesk.com.eml" (f1) and "_strange_chars_µö¬é@zendesk.com.eml" (f2


                      
1 Answer

说谎
2020-12-31 04:44

  f1 uses the ö character, f2 uses an o and a diacritic ¨ as a separate character.


f1 is in Normal Form C (NFC, composed) and f2 is in Normal Form D (NFD, decomposed). In general, Normal Form C is the most common on Windows and the web, with the Unicode FAQ describing it as “the best form for general text”. Unfortunately, the Apple world plumped for Normal Form D in order to be gratuitously different.

The strings are canonically equivalent by the rules of Unicode equivalence.
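
To see why a plain comparison fails even though the two strings render identically, it can help to list their code points. The snippet below is only a sketch: composed and decomposed are short illustrative stand-ins for f1 and f2, and codePoints is a hypothetical helper, not a built-in.

```javascript
// Illustrative stand-ins for f1 and f2: the same visible text, built two ways.
const composed = "\u00F6";     // ö as a single code point
const decomposed = "o\u0308";  // o followed by U+0308 COMBINING DIAERESIS

// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

console.log(codePoints(composed));    // one code point: U+00F6
console.log(codePoints(decomposed));  // two code points: U+006F, U+0308
console.log(composed === decomposed); // false: === compares code units, not rendered text
```

The strict equality operator compares the underlying code units one by one, so canonically equivalent strings with different encodings are reported as different.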


  What comparison can I do that will show these two strings to be "equal"?


In general, you convert both strings to the same normal form and then compare them. For example, in Python:

>>> import unicodedata
>>> a = '\u00F6'   # ö, composed (one code point)
>>> b = 'o\u0308'  # o followed by a combining diaeresis
>>> a == b
False
>>> unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)
True


Similarly, Java has the Normalizer class, .NET has String.Normalize, and many languages have bindings to the ICU library, which also offers this feature.

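JavaScript engines that predate ECMAScript 2015 have no native Unicode normalisation ability. Newer engines provide String.prototype.normalize(), which takes a form name such as 'NFC' or 'NFD' and defaults to 'NFC', so there you can normalise both strings and compare the results directly. Without it, you are left with either:


doing it yourself, carting around large Unicode data tables to cover it all in JavaScript (standalone implementations exist); or
sending it back to the server-side (eg via XMLHttpRequest), where you've got a better-equipped language to do it.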
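
Where String.prototype.normalize() is available (ECMAScript 2015 and later), the convert-then-compare approach can be sketched directly in JavaScript. This is a minimal sketch under stated assumptions: canonicallyEqual is a hypothetical helper name, and the f1 and f2 literals are reconstructions of the strings in the question, assuming both ö and é are decomposed in f2.

```javascript
// Hypothetical helper: compare two strings for canonical equivalence
// by normalising both to the same form before comparing.
function canonicallyEqual(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

// Assumed reconstructions of the strings from the question.
const f1 = "_strange_chars_\u00B5\u00F6\u00AC\u00E9@zendesk.com.eml";    // composed ö and é
const f2 = "_strange_chars_\u00B5o\u0308\u00ACe\u0301@zendesk.com.eml";  // o + U+0308, e + U+0301

console.log(f1 === f2);                 // false: different code units
console.log(canonicallyEqual(f1, f2));  // true: identical after NFC normalisation
```

NFC and NFD work equally well here as long as both sides use the same form. The compatibility forms (NFKC, NFKD) are a different matter: they also fold characters such as the micro sign µ into other characters, which is usually not wanted for a file-name comparison.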

                                                                        
                                                        
            
            
              
                