Naturally sort a list moving alphanumeric values to the end

Submitted by 无人久伴 on 2020-01-14 11:36:26

Question


I have a list of strings I want to natural sort:

c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y', '5', '5Y', '6', '7', '8', '9', '9Y']

In addition to natural sort, I want to move all entries that are not pure number strings to the end. My expected output is this:

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']

Do note that everything has to be natsorted - even the alphanumeric strings.

I know I can use the natsort package to get what I want, but that alone does not do it for me. I need to do this with two sort calls - one to natural sort, and another to move non-pure numeric strings to the end.

import natsort as ns
r = sorted(ns.natsorted(c), key=lambda x: not x.isdigit())

print(r)
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']

I'd like to know if it's possible to use natsort in a crafty manner and reduce this to a single sort call.


Answer 1:


You can actually perform this using natsorted and the correct choice of key.

>>> ns.natsorted(c, key=lambda x: (not x.isdigit(), x))
['0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '2Y',
 '3Y',
 '4Y',
 '5Y',
 '9Y']

The key returns a tuple with the original input as the second element. Strings that are digits get placed at the front, all others at the back, then the subsets are sorted individually.
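To make the tuple-key idea concrete without depending on natsort, here is a minimal stdlib-only sketch. The helper `natural_key` is hypothetical (it is not natsort's implementation) and only handles plain runs of ASCII-style digits; the point is just to show the boolean flag partitioning the list while the second tuple element orders each partition naturally.

```python
import re

def natural_key(s):
    # Hypothetical helper, not part of natsort: split into alternating
    # non-digit / digit runs. With a capturing group, re.split puts the
    # digit runs at the odd indices, so those are converted to int.
    parts = re.split(r"(\d+)", s)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']

# False sorts before True, so pure-digit strings come first;
# ties on the flag are broken by the natural key.
result = sorted(c, key=lambda x: (not x.isdigit(), natural_key(x)))
print(result)
```

For real data, natsort's own key handling covers many more cases (signs, floats, locale), so this sketch is only meant to illustrate the mechanism.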

As a side note, Willem Van Onsem's solution uses natsort_key, which has been deprecated as of natsort version 3.0.4 (if you turn on DeprecationWarning in your interpreter you will see that, and the function is now undocumented). It is also fairly inefficient: the preferred approach is natsort_keygen, which returns a natural sorting key function. natsort_key calls natsort_keygen under the hood, so for every input you create a new key function and then call it only once.

Below I repeat the benchmarks posted in another answer, and add both my solution using natsorted and timings of the other solutions using natsort_keygen instead of natsort_key.

In [13]: %timeit sorted(d, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
1 loop, best of 3: 33.3 s per loop

In [14]: natsort_key = ns.natsort_keygen()

In [15]: %timeit sorted(d, key=lambda x: (not x.isdigit(), natsort_key(x)))
1 loop, best of 3: 11.2 s per loop

In [16]: %timeit sorted(ns.natsorted(d), key=str.isdigit, reverse=True)
1 loop, best of 3: 9.77 s per loop

In [17]: %timeit ns.natsorted(d, key=lambda x: (not x.isdigit(), x))
1 loop, best of 3: 23.8 s per loop



Answer 2:


natsort has a function natsort_key that converts the item into a tuple based on which sorting is done.

So you can use it as:

sorted(c, key=lambda x: (not x.isdigit(), *ns.natsort_key(x)))

This produces:

>>> sorted(c, key=lambda x: (not x.isdigit(), *ns.natsort_key(x)))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']

You can also use it without iterable unpacking. Each key is then a 2-tuple, and when the first items tie, the comparison falls through to the outcome of the natsort_key call:

sorted(c, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
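The equivalence of the unpacked and nested forms follows from how Python compares tuples element by element. The sketch below uses hand-written stand-in keys (the `keys` dict holds illustrative values, not what natsort actually returns) to show both forms producing the same order.

```python
# Hand-written stand-ins for natural-sort keys (illustrative values only).
keys = {"2": (2,), "10": (10,), "2Y": (2, "Y")}
items = ["2Y", "10", "2"]

# Flattened: the flag and the key parts live in one tuple.
flat = sorted(items, key=lambda x: (not x.isdigit(), *keys[x]))

# Nested: a 2-tuple whose second element is itself a tuple.
# If the flags tie, Python compares the inner tuples.
nested = sorted(items, key=lambda x: (not x.isdigit(), keys[x]))

print(flat)
print(nested)
```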



Answer 3:


I'm grateful to Willem Van Onsem for posting his answer. However, I should note that the original two-call approach is considerably faster (about 3.5 times in the timings below). Taking PM 2Ring's suggestions into account, here are some benchmarks comparing the two methods:

Setup

c = \
['0',
 '1',
 '10',
 '11',
 '2',
 '2Y',
 '3',
 '3Y',
 '4',
 '4Y',
 '5',
 '5Y',
 '6',
 '7',
 '8',
 '9',
 '9Y']
d = c * (1000000 // len(c) + 1)  # approximately 1M elements

Willem's solution

%timeit sorted(d, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
1 loop, best of 3: 2.78 s per loop

Original (w/ PM 2Ring's enhancement)

%timeit sorted(ns.natsorted(d), key=str.isdigit, reverse=True)
1 loop, best of 3: 796 ms per loop

The original is likely faster because Timsort seems to be highly optimised for nearly sorted lists.
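The two-call approach also depends on Python's sort being stable: the second pass only groups items by `str.isdigit`, and items with equal keys keep the order the first pass gave them, even with `reverse=True`. A small sketch, using a hand-written list that stands in for the output of the natural sort:

```python
# Stand-in for an already naturally sorted list (written by hand).
nat_sorted = ['0', '1', '2', '2Y', '3', '3Y', '10', '11']

# key=str.isdigit yields True for pure-digit strings; reverse=True puts
# True before False. Stability keeps each group in its existing order.
result = sorted(nat_sorted, key=str.isdigit, reverse=True)
print(result)
```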


Sanity Check

x = sorted(d, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
y = sorted(ns.natsorted(d), key=str.isdigit, reverse=True)

all(i == j for i, j in zip(x, y))
True


Source: https://stackoverflow.com/questions/47240184/naturally-sort-a-list-moving-alphanumeric-values-to-the-end
