I am looking for an efficient way to sort a list of strings according to a custom alphabet.
For example, I have a string alphabet which is "bafmxpzv".
Let's create an alphabet and a list of words:
In [32]: alphabet = "bafmxpzv"
In [33]: a = ['af', 'ax', 'am', 'ab', 'zvpmf']
Now let's sort them according to where the letters appear in alphabet:
In [34]: sorted(a, key=lambda word: [alphabet.index(c) for c in word])
Out[34]: ['ab', 'af', 'am', 'ax', 'zvpmf']
The above sorts in the correct order.
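To see why this works, it can help to look at the keys themselves. The sketch below prints the key for each word (the helper name index_key is only for illustration). Python compares lists element by element, so these lists order the words by the custom alphabet.

```python
alphabet = "bafmxpzv"
a = ['af', 'ax', 'am', 'ab', 'zvpmf']

def index_key(word):
    # Illustrative helper: position of each character in the custom alphabet.
    return [alphabet.index(c) for c in word]

keys = {word: index_key(word) for word in a}
for word in a:
    print(word, keys[word])

# Lists compare element by element, left to right,
# so [1, 0] ('ab') sorts before [1, 2] ('af').
ab_before_af = index_key('ab') < index_key('af')
print(ab_before_af)
```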
sorted enables a wide range of custom sorting. In Python 2, the sorted function has three optional arguments: cmp, key, and reverse (Python 3 removed cmp):
cmp is good for complex sorting tasks. If specified, cmp should be a function that takes two arguments. It should return a negative, zero, or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument. For this case, cmp is overkill.
key, if specified, should be a function that takes one argument and returns something that Python knows natively how to sort. In this case, key returns a list of the indices of each of the word's characters in the alphabet.
reverse, if true, reverses the sort order.
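As a small illustration of key and reverse used together, here is the same sort in descending order of the custom alphabet. This is only a sketch reusing the alphabet and word list from above.

```python
alphabet = "bafmxpzv"
a = ['af', 'ax', 'am', 'ab', 'zvpmf']

# key maps each word to a list of alphabet positions;
# reverse=True flips the resulting order.
descending = sorted(
    a,
    key=lambda word: [alphabet.index(c) for c in word],
    reverse=True,
)
print(descending)
```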
From the comments, this alternative form was mentioned:
In [35]: sorted(a, key=lambda word: [alphabet.index(c) for c in word[0]])
Out[35]: ['af', 'ax', 'am', 'ab', 'zvpmf']
Note that this does not sort in the correct order. That is because the key function here only considers the first letter of each word. This can be demonstrated by testing key:
In [2]: key=lambda word: [alphabet.index(c) for c in word[0]]
In [3]: key('af')
Out[3]: [1]
In [4]: key('ax')
Out[4]: [1]
Observe that key returns the same value for two different strings, af and ax. The value returned reflects only the first character of each word. Because of this, sorted has no way of determining that af belongs before ax.
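When keys tie like this, sorted keeps the tied items in their original relative order, because Python's sort is stable. That is why the first-letter version appears to leave the list untouched. A quick check, with the names first_letter_key and full_key chosen just for this sketch:

```python
alphabet = "bafmxpzv"
a = ['af', 'ax', 'am', 'ab', 'zvpmf']

def first_letter_key(word):
    # Only looks at the first character, so all 'a...' words tie.
    return [alphabet.index(c) for c in word[0]]

def full_key(word):
    # Looks at every character, so ties are broken by later letters.
    return [alphabet.index(c) for c in word]

# Tied words keep their input order, whatever that order was.
tied = sorted(a, key=first_letter_key)
tied_from_reversed = sorted(a[::-1], key=first_letter_key)
print(tied)
print(tied_from_reversed)

# The full key tells 'af' and 'ax' apart.
print(full_key('af'), full_key('ax'))
```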
Update: I misread your question. You have a list of strings, not a single string. The idea is the same: use a sort based on a custom comparison function:
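alphabet = "bafmxpzv"  # the custom alphabet from the question

def acmp(a, b):
    # Return -1, 0, or 1 according to the order of a and b in the custom alphabet.
    la = len(a)
    lb = len(b)
    lm = min(la, lb)
    p = 0
    # Compare character by character over the shared length.
    while p < lm:
        pa = alphabet.index(a[p])
        pb = alphabet.index(b[p])
        if pa > pb:
            return 1
        if pb > pa:
            return -1
        p = p + 1
    # One word is a prefix of the other: the shorter one sorts first.
    if la > lb:
        return 1
    if lb > la:
        return -1
    return 0

mylist = ['baf', 'bam', 'pxm']
# The cmp argument exists only in Python 2; on Python 3 use
# mylist.sort(key=functools.cmp_to_key(acmp)) after importing functools.
mylist.sort(cmp=acmp)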
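On Python 3, where sort no longer accepts cmp, one option is functools.cmp_to_key, which wraps a comparison function so it can be passed as key. Below is a compact sketch of the same comparison under that assumption. The extra word 'ba' is not from the original example and is added only to exercise the length tie-break.

```python
from functools import cmp_to_key

alphabet = "bafmxpzv"

def acmp(a, b):
    # Compare position by position in the custom alphabet.
    for ca, cb in zip(a, b):
        pa, pb = alphabet.index(ca), alphabet.index(cb)
        if pa != pb:
            return -1 if pa < pb else 1
    # All shared positions match: the shorter word sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))

mylist = ['pxm', 'bam', 'baf', 'ba']
mylist.sort(key=cmp_to_key(acmp))
print(mylist)
```

For a plain alphabet ordering like this one, the key-based versions are simpler. A comparison function mostly pays off when the ordering rule cannot be expressed as a key.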
Instead of using index(), which scans the alphabet each time to find a character's position, a better alternative is to build a hash map (a dict) once and use it in the sort to retrieve each index directly.
Example:
>>> alphabet = "bafmxpzv"
>>> a = ['af', 'ax', 'am', 'ab', 'zvpmf']
>>> order = dict(zip(alphabet, range(len(alphabet))))
>>> sorted(a, key=lambda word: [order[c] for c in word])
['ab', 'af', 'am', 'ax', 'zvpmf']
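If the sort is needed in several places, the lookup table can be built once and wrapped in a key function. The helper name make_key below is hypothetical, and the sketch assumes every character of every word appears in the alphabet (otherwise the dict lookup raises KeyError).

```python
def make_key(alphabet):
    # Build the character -> position table once.
    order = {c: i for i, c in enumerate(alphabet)}

    def key(word):
        # Each lookup is a dict access rather than a scan of the alphabet.
        return [order[c] for c in word]

    return key

by_alphabet = make_key("bafmxpzv")
result = sorted(['af', 'ax', 'am', 'ab', 'zvpmf'], key=by_alphabet)
print(result)
```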