Sort dictionary by key using locale/collation

The following code ignores the locale, and Égypt ends up at the end of the list. What's wrong?

dict = {"United States": "United States", "Spain" : "Spain", "Englan

2 Answers

谎友^ (2021-02-13 19:32):

Here's a work-around.

Use Unicode normalization form D (NFD, canonical decomposition), which splits "É" into a plain "E" followed by a combining acute accent: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms

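>>> import unicodedata
>>> egypt = 'E\xc3\x89gypt'  # "Égypt" as a UTF-8 byte string (Python 2)
>>> egypt = unicodedata.normalize("NFD", egypt.decode("utf-8")).encode("utf-8")
>>> egypt  # "E" followed by U+0301 COMBINING ACUTE ACCENT
'E\xcc\x81gypt'
>>> sorted(['Egypt', egypt, 'US'])
['Egypt', 'E\xcc\x81gypt', 'US']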


This doesn't actually take locale into consideration.
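In Python 3, `str` is already Unicode, so the UTF-8 round trip goes away. A sketch of the same accent-folding workaround might look like this. It assumes Python 3.7+ (where `dict` keeps insertion order). Instead of `encode('ASCII', 'ignore')`, it drops only combining marks via `unicodedata.combining`. The helper names `fold_accents` and `sort_dict_ignoring_accents` are hypothetical. Like the answer above, it still ignores the locale.

```python
import unicodedata


def fold_accents(s: str) -> str:
    """Decompose s (NFD) and drop the combining marks, so "É" becomes "E"."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_dict_ignoring_accents(d: dict) -> dict:
    # Primary key: accent-folded text. Secondary key: the original string,
    # so keys that differ only by accents still sort deterministically.
    return {k: d[k] for k in sorted(d, key=lambda name: (fold_accents(name), name))}


countries = {"United States": "United States", "Spain": "Spain",
             "England": "England", "Égypt": "Égypt"}
print(list(sort_dict_ignoring_accents(countries)))
# ['Égypt', 'England', 'Spain', 'United States']
```

Because only combining marks are removed, characters with no decomposition (such as "ß") are kept rather than silently deleted.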

Beyond this, try a newer Python (yes, I know) or the ICU library from the question Martijn linked and its answers.
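On Python 3, the standard `locale` module can drive the sort directly. `locale.strxfrm` turns a string into a sort key that compares the same way `locale.strcoll` does under the current `LC_COLLATE`. The following is a minimal sketch. It assumes that the locale name you pass (here `fr_FR.UTF-8`) is installed, and `sort_dict_by_locale` is a hypothetical helper. With working French collation data, accents count only as secondary differences, so Égypt would be expected to land before England. The exact order still depends on the platform's C library.

```python
import locale


def sort_dict_by_locale(d: dict, locale_name: str) -> dict:
    """Return a copy of d with keys ordered by the collation rules of locale_name.

    Raises locale.Error if locale_name is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query without changing
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm maps each key to a string whose ordinary comparison
        # agrees with locale.strcoll under the active LC_COLLATE
        return {k: d[k] for k in sorted(d, key=locale.strxfrm)}
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore global state


countries = {"United States": "United States", "Spain": "Spain",
             "England": "England", "Égypt": "Égypt"}

try:
    print(list(sort_dict_by_locale(countries, "fr_FR.UTF-8")))
except locale.Error:
    print("fr_FR.UTF-8 is not installed here; try another locale name")
```

`setlocale` changes process-wide state and is not thread-safe, which is why the sketch restores the previous `LC_COLLATE`. For consistent results across platforms, ICU remains the more robust option.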

梦谈多话 (2021-02-13 19:44):

Consider the following...

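# -*- coding: utf-8 -*-
import locale
import unicodedata
from collections import OrderedDict

countries = {"United States": "United States", "Spain": "Spain", "England": "England", "Égypt": "Égypt"}

# French collation; the locale name must be installed on your system (often "fr_FR.UTF-8").
# Pass "" instead of a name to use your default locale (user settings).
locale.setlocale(locale.LC_ALL, "fr_FR")

def strip_accents(s):
    # UTF-8 bytes -> unicode -> NFD -> ASCII bytes (the combining accents are dropped)
    return unicodedata.normalize('NFD', s.decode('utf-8')).encode('ASCII', 'ignore')

# Compare the keys (a[0]); unicode(a)[0] was just "(" for every (key, value) tuple
print OrderedDict(sorted(countries.items(),
                         cmp=lambda a, b: locale.strcoll(strip_accents(a[0]), strip_accents(b[0]))))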
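The code above is Python 2. Python 3 no longer has the `print` statement, `sorted(cmp=...)`, or `unicode()`. A possible Python 3 port keeps the same accent-stripping comparison and wraps it with `functools.cmp_to_key`. It assumes Python 3.7+ so that a plain `dict` keeps the sorted order. If the default locale cannot be set, it falls back to the startup "C" collation. The helper names `strip_accents` and `compare_items` are illustrative.

```python
import functools
import locale
import unicodedata


def strip_accents(s: str) -> str:
    # NFD splits "É" into "E" + a combining accent; encoding to ASCII with
    # errors="ignore" then drops every non-ASCII code point, accents included
    return unicodedata.normalize("NFD", s).encode("ascii", "ignore").decode("ascii")


def compare_items(a: tuple, b: tuple) -> int:
    # a and b are (key, value) pairs; compare only the accent-stripped keys
    return locale.strcoll(strip_accents(a[0]), strip_accents(b[0]))


countries = {"United States": "United States", "Spain": "Spain",
             "England": "England", "Égypt": "Égypt"}

try:
    locale.setlocale(locale.LC_COLLATE, "")  # the user's default locale
except locale.Error:
    pass  # keep the current collation (normally "C" at startup)

# Python 3 removed sorted(cmp=...); functools.cmp_to_key adapts a cmp function
ordered = dict(sorted(countries.items(), key=functools.cmp_to_key(compare_items)))
print(list(ordered))
```

Passing `key=lambda item: locale.strxfrm(strip_accents(item[0]))` would avoid `cmp_to_key` entirely. Note that `encode("ascii", "ignore")` silently drops characters that have no ASCII decomposition, such as "ß".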
