Say I have a sorted list of strings as in:
[\'A\', \'B\' , \'B1\', \'B11\', \'B2\', \'B21\', \'B22\', \'C\', \'C1\', \'C11\', \'C2\']
Now I wan
In the simple case where you just want to sort trailing digits numerically and their non-digit prefixes alphabetically, you need a key function which splits each item into non-digit and digit components as follows:
'AB123' -> ['AB', 123]
'CD' -> ['CD']
'456' -> ['', 456]
Note: In the last case, the empty string
''
is not strictly necessary in CPython 2.x, as integers sort before strings – but that's an implementation detail rather than a guarantee of the language, and in Python 3.x it is necessary, because strings and integers can't be compared at all.
You can build such a key function using a list comprehension and re.split():
import re
def trailing_digits(x):
return [
int(g) if g.isdigit() else g
for g in re.split(r'(\d+)$', x)
]
Here it is in action:
>>> s1 = ['11', '2', 'A', 'B', 'B1', 'B11', 'B2', 'B21', 'C', 'C11', 'C2']
>>> sorted(s1, key=trailing_digits)
['2', '11', 'A', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'C2', 'C11']
Once you add the restriction that only strings with a particular prefix or prefixes have their trailing digits sorted numerically, things get a little more complicated.
The following function builds and returns a key function which fulfils the requirement:
def prefixed_digits(*prefixes):
disjunction = '|'.join('^' + re.escape(p) for p in prefixes)
pattern = re.compile(r'(?<=%s)(\d+)$' % disjunction)
def key(x):
return [
int(g) if g.isdigit() else g
for g in re.split(pattern, x)
]
return key
The main difference here is that a precompiled regex is created (containing a lookbehind constructed from the supplied prefix or prefixes), and a key function using that regex is returned.
Here are some usage examples:
>>> s2 = ['A', 'B', 'B11', 'B2', 'B21', 'C', 'C11', 'C2', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C11', 'C2', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B', 'C'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C2', 'C11', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B', 'D'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C11', 'C2', 'D2', 'D12']
If called with no arguments, prefixed_digits()
returns a key function which behaves identically to trailing_digits
:
>>> sorted(s1, key=prefixed_digits())
['2', '11', 'A', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'C2', 'C11']
Caveats:
Due to a restriction in Python's re
module regarding lookbhehind syntax, multiple prefixes must have the same length.
In Python 2.x, strings which are purely numeric will be sorted numerically regardless of which prefixes are supplied to prefixed_digits()
. In Python 3, they'll cause an exception (except when called with no arguments, or in the special case of key=prefixed_digits('')
– which will sort purely numeric strings numerically, and prefixed strings alphabetically). Fixing that may be possible with a significantly more complex regex, but I gave up trying after about twenty minutes.