Sort dictionary by key using locale/collation

梦如初夏 2021-02-13 19:28

The following code ignores the locale, and Égypt ends up at the end of the sorted result. What's wrong?

dict = {"United States": "United States", "Spain" : "Spain", "Englan


        
2 answers
  • 2021-02-13 19:32

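    Here's a workaround.

    Use Unicode normalization form D (NFD, canonical decomposition), which splits an accented character such as É into its base letter followed by a combining accent mark, so the base letter is what gets compared first: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms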

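    import unicodedata

    # Python 3 strings are already Unicode, so no UTF-8 <-> unicode conversion is needed
    egypt = unicodedata.normalize("NFD", "Égypt")  # "E" + U+0301 COMBINING ACUTE ACCENT + "gypt"

    print(ascii(sorted(["Egypt", egypt, "US"])))
    # ['Egypt', 'E\u0301gypt', 'US']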
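For illustration, the same NFD idea can be applied as a sort key to the dictionary from the question. This is a minimal Python 3 sketch; `countries` and `nfd_key` are hypothetical names. Only the comparison uses the decomposed form, so the sorted list still contains the original composed strings. Because the comparison is still by code point, Égypt lands between England and Spain, not before England as an accent-insensitive first-level comparison would place it.

```python
import unicodedata

countries = {"United States": "United States", "Spain": "Spain",
             "England": "England", "Égypt": "Égypt"}

def nfd_key(s):
    # NFD turns "É" (U+00C9) into "E" + U+0301 COMBINING ACUTE ACCENT,
    # so the base letter "E" is compared first
    return unicodedata.normalize("NFD", s)

# sorted() returns the original (composed) keys; only the comparison uses NFD
result = sorted(countries, key=nfd_key)
print(result)
# ['England', 'Égypt', 'Spain', 'United States']
```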
    

    This doesn't actually take locale into consideration.

    Beyond this, try a newer Python version (yes, I know) or the ICU library, as suggested in the question Martijn linked and its answers.
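On the newer-Python route: in Python 3, `locale.strxfrm` can be passed as the `key` to `sorted()`, so strings are ordered by the active `LC_COLLATE` setting. This is a minimal sketch, assuming one of the hypothetical candidate locale names below is installed. If none is accepted, it leaves the collation unchanged (code-point order in Python's default "C" collation).

```python
import locale

countries = {"United States": "United States", "Spain": "Spain",
             "England": "England", "Égypt": "Égypt"}

# Hypothetical list of locale names to try; which ones exist depends on the OS
CANDIDATE_LOCALES = ("fr_FR.UTF-8", "fr_FR.utf8", "fr_FR", "")

def set_collation_locale(candidates=CANDIDATE_LOCALES):
    """Set LC_COLLATE to the first candidate the system accepts; return its name or None."""
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_COLLATE, name)
        except locale.Error:
            continue  # not installed here, try the next candidate
    return None

active = set_collation_locale()
# strxfrm maps each string to a key whose ordinary ordering follows LC_COLLATE
result = sorted(countries, key=locale.strxfrm)
print(active, result)
```

`locale.setlocale` changes process-wide state and is not thread-safe. With a French UTF-8 locale on glibc, Égypt would be expected to sort before England. On C libraries whose UTF-8 collation is effectively code-point order, the result may match plain `sorted()`.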

  • 2021-02-13 19:44

    Consider the following...

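    import locale
    import unicodedata
    from collections import OrderedDict

    countries = {"United States": "United States", "Spain": "Spain", "England": "England", "Égypt": "Égypt"}

    # Explicitly use a French locale (pass "" instead to use the user's default settings);
    # available locale names depend on the platform, e.g. "fr_FR.UTF-8" on many Linux systems
    locale.setlocale(locale.LC_ALL, "fr_FR")

    def strip_accents(s):
        # NFD splits "É" into "E" + a combining accent; encoding to ASCII with "ignore" drops the accent
        return unicodedata.normalize("NFD", s).encode("ascii", "ignore").decode("ascii")

    # Python 3 has no cmp= argument; locale.strxfrm is the key-function counterpart of locale.strcoll.
    # Each item is a (key, value) pair, so item[0] is the dictionary key.
    print(OrderedDict(sorted(countries.items(), key=lambda item: locale.strxfrm(strip_accents(item[0])))))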
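If no suitable locale is installed, the accent-stripping part of this answer also works on its own as a locale-independent sort key. This is a minimal sketch, assuming an accent- and case-insensitive code-point comparison is good enough. The helper names and the tie-break on the original string are additions for illustration, not part of the answer.

```python
import unicodedata
from collections import OrderedDict

countries = {"United States": "United States", "Spain": "Spain",
             "England": "England", "Égypt": "Égypt"}

def strip_accents(s):
    # NFD separates base letters from combining accents; encoding to ASCII with
    # "ignore" drops the accents (and any other non-ASCII character)
    return unicodedata.normalize("NFD", s).encode("ascii", "ignore").decode("ascii")

def accent_insensitive_key(s):
    # Compare accent-stripped, case-folded text first; fall back to the original
    # string so that e.g. "Egypt" and "Égypt" still get a deterministic order
    return (strip_accents(s).casefold(), s)

ordered = OrderedDict(sorted(countries.items(), key=lambda item: accent_insensitive_key(item[0])))
print(list(ordered))
# ['Égypt', 'England', 'Spain', 'United States']
```

Encoding to ASCII with `"ignore"` silently drops characters that have no ASCII base letter after NFD (for example `ß` or `Ø`), so such keys may collapse or sort unexpectedly.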
    