In Python, how can I naturally sort a list of alphanumeric strings such that alpha characters sort ahead of numeric characters?

后端 未结 3 1499
灰色年华
灰色年华 2020-12-11 13:20

This is a fun little challenge that confronted me recently. I\'ll provide my answer below, but I\'m curious to see whether there are more elegant or efficient solutions.

相关标签:
3条回答
  • 2020-12-11 13:38
    re_natural = re.compile('[0-9]+|[^0-9]+')
    
    def natural_key(s):
        return [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in re_natural.findall(s)] + [s]
    
    for case in test_cases:
        print case[1]
        print sorted(case[0], key=natural_key)
    
    ['a', 'b', 'c']
    ['a', 'b', 'c']
    ['A', 'b', 'C']
    ['A', 'b', 'C']
    ['a', 'B', 'r', '0', '9']
    ['a', 'B', 'r', '0', '9']
    ['a1', 'a2', 'a100', '1a', '10a']
    ['a1', 'a2', 'a100', '1a', '10a']
    ['alp1', 'alp2', 'alp10', 'ALP11', 'alp100', 'GAM', '1', '2', '100']
    ['alp1', 'alp2', 'alp10', 'ALP11', 'alp100', 'GAM', '1', '2', '100']
    ['A', 'a', 'b', 'r', '0', '9']
    ['A', 'a', 'b', 'r', '0', '9']
    ['ABc', 'Abc', 'abc']
    ['ABc', 'Abc', 'abc']
    

    Edit: I decided to revisit this question and see if it would be possible to handle the bonus case. It requires being more sophisticated in the tie-breaker portion of the key. To match the desired results, the alpha parts of the key must be considered before the numeric parts. I also added a marker between the natural section of the key and the tie-breaker so that short keys always come before long ones.

    def natural_key2(s):
        parts = re_natural.findall(s)
        natural = [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in parts]
        ties_alpha = [c for c in parts if not c.isdigit()]
        ties_numeric = [c for c in parts if c.isdigit()]
        return natural + [(-1,)] + ties_alpha + ties_numeric
    

    This generates identical results for the test cases above, plus the desired output for the bonus case:

    ['A', 'a', 'A0', 'a0', '0', '00', '0A', '00A', '0a', '00a']
    
    0 讨论(0)
  • 2020-12-11 13:43

    This function makes no claims as to performance at this time:

    def alpha_before_numeric_natural_sensitive(unsorted_list):
        """presorting the list should work because python stable sorts; see:
        http://wiki.python.org/moin/HowTo/Sorting/#Sort_Stability_and_Complex_Sorts"""
        presorted_list = sorted(unsorted_list)
        return alpha_before_numeric_natural(presorted_list)
    
    def alpha_before_numeric_natural(unsorted_list):
        """splice each string into tuple like so:
        'abc100def' -> ('a', 'b', 'c', 100, 'd', 'e', 'f') ->
        (ord('a'), ord('b'), ord('c'), ord('z') + 1 + 100, ...) then compare
        each tuple"""
        re_p = "([0-9]+|[A-za-z])"
        ordify = lambda s: ord('z') + 1 + int(s) if s.isdigit() else ord(s.lower())
        str_to_ord_tuple = lambda key: [ordify(c) for c in re.split(re_p, key) if c]
        return sorted(unsorted_list, key=str_to_ord_tuple)
    

    It's based off the insight provided by this natural sort solution and this function that I wrote:

    def alpha_before_numeric(unsorted_list):
        ord_shift = lambda c: c.isdigit() and chr(ord('z') + int(c.lower()) + 1) or c.lower()
        adjust_word = lambda word: "".join([ord_shift(c) for c in list(word)])
    
        def cmp_(a, b):
            return cmp(adjust_word(a), adjust_word(b))
    
        return sorted(unsorted_list, cmp_)
    

    For full test script here that compares different functions, see http://klenwell.com/is/Pastebin20120829

    0 讨论(0)
  • 2020-12-11 13:54

    Here's one that works for the bonus test too:

    def mykey(s):
        lst = re.findall(r'(\d+)|(\D+)', s)
        return [(0,a.lower()) if a else (1,int(n)) for n, a in lst]\
            + [a for n, a in lst if a]\
            + [len(n) for n, a in lst if n]
    
    def mysort(lst):
        return sorted(lst, key=mykey)
    

    With this type of pattern, re.findall breaks the string to a list of tuples, e.g.

    >>> re.findall(r'(\d+)|(\D+)', 'ab12cd')
    [('', 'ab'), ('12', ''), ('', 'cd')]
    
    0 讨论(0)
提交回复
热议问题