I have a list of strings, for example:
["foo bar SOME baz TEXT bob",
"SOME foo bar baz bob TEXT",
"SOME foo TEXT",
"
See your friendly neighborhood sorting tutorial. You'll need a sort with a key. Here's a trivial function to give you the idea; it finds the distance, in characters, between where the two words start and returns that as the sort key.
sentences = ["foo bar SOME baz TEXT bob",
             "SOME foo bar baz bob TEXT",
             "SOME foo TEXT",
             "foo bar SOME TEXT baz",
             "SOME TEXT"]

def match_score(sentence):
    # Character offset of each word; str.find returns -1 if the word is absent.
    some_pos = sentence.find("SOME")
    text_pos = sentence.find("TEXT")
    return abs(text_pos - some_pos)

# Smaller distance sorts first; ties keep their original order.
sentences.sort(key=match_score)
for item in sentences:
    print(item)
Output:
foo bar SOME TEXT baz
SOME TEXT
foo bar SOME baz TEXT bob
SOME foo TEXT
SOME foo bar baz bob TEXT
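One thing to watch with the function above: str.find returns -1 when the word is missing, so a sentence containing neither word scores 0 and sorts first. Here is a minimal sketch of one way to guard against that, assuming such sentences should go last. The name match_score_safe and the "no match here" string are made up for illustration.

```python
import math

def match_score_safe(sentence, first="SOME", second="TEXT"):
    """Character distance between the two words, or infinity if either is absent."""
    first_pos = sentence.find(first)
    second_pos = sentence.find(second)
    if first_pos == -1 or second_pos == -1:
        # Assumption: a sentence missing either word should sort after all matches.
        return math.inf
    return abs(second_pos - first_pos)

sentences = ["SOME foo TEXT", "no match here", "SOME TEXT"]
ranked = sorted(sentences, key=match_score_safe)
print(ranked)
# ['SOME TEXT', 'SOME foo TEXT', 'no match here']
```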
Here is my take on it.
l = ["foo bar SOME baz TEXT bob",
"SOME foo bar baz bob TEXT",
"SOME foo TEXT",
"foo bar SOME TEXT baz",
"SOME TEXT"]
l.sort(key=lambda x: (x.find("SOME")-x.find("TEXT"))*0.9-0.1*x.find("SOME"), reverse=True)
print(l)
OUTPUT:
['SOME TEXT', 'foo bar SOME TEXT baz', 'SOME foo TEXT', 'foo bar SOME baz TEXT bob', 'SOME foo bar baz bob TEXT']
So the list is sorted with most of the weight on the distance between "SOME" and "TEXT" and a small weight on the position of "SOME" in the string, so among strings with the same distance, the one where "SOME" appears earlier comes first.
Another, longer way would be to first group the list by the distance between "SOME" and "TEXT", and then sort each group by the position of "SOME".
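The group-then-sort idea does not actually need explicit grouping: sorting on a tuple key compares the distance first and only falls back to the position of "SOME" on ties. This is a sketch of that variant, not the weighted key above; the helper name distance_then_position is made up for illustration.

```python
l = ["foo bar SOME baz TEXT bob",
     "SOME foo bar baz bob TEXT",
     "SOME foo TEXT",
     "foo bar SOME TEXT baz",
     "SOME TEXT"]

def distance_then_position(s):
    """Sort key: (distance between the two words, position of "SOME")."""
    some_pos = s.find("SOME")
    text_pos = s.find("TEXT")
    return (abs(text_pos - some_pos), some_pos)

# Tuples compare element by element, so the distance decides first.
ranked = sorted(l, key=distance_then_position)
print(ranked)
# ['SOME TEXT', 'foo bar SOME TEXT baz', 'SOME foo TEXT',
#  'foo bar SOME baz TEXT bob', 'SOME foo bar baz bob TEXT']
```

Unlike the 0.9/0.1 weighting, a large offset of "SOME" can never outrank a smaller distance here.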
You can use difflib.SequenceMatcher to achieve something very similar to your desired output:
>>> import difflib
>>> l = ["foo bar SOME baz TEXT bob", "SOME foo bar baz bob TEXT", "SOME foo TEXT", "foo bar SOME TEXT baz", "SOME TEXT"]
>>> sorted(l, key=lambda z: difflib.SequenceMatcher(None, z, "SOME TEXT").ratio(), reverse=True)
['SOME TEXT', 'SOME foo TEXT', 'foo bar SOME TEXT baz', 'foo bar SOME baz TEXT bob', 'SOME foo bar baz bob TEXT']
In case it isn't obvious, the only difference is that "foo bar SOME TEXT baz" and "SOME foo TEXT" are swapped compared to your desired output.
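To see why those two end up in that order, it can help to print the similarity score next to each string. A small sketch using the same SequenceMatcher call; the helper name similarity is made up for illustration.

```python
import difflib

l = ["foo bar SOME baz TEXT bob",
     "SOME foo bar baz bob TEXT",
     "SOME foo TEXT",
     "foo bar SOME TEXT baz",
     "SOME TEXT"]
target = "SOME TEXT"

def similarity(s):
    """Ratio in [0, 1]; 1.0 means s is identical to the target."""
    return difflib.SequenceMatcher(None, s, target).ratio()

# Highest similarity first, with the score shown alongside each string.
for s in sorted(l, key=similarity, reverse=True):
    print(f"{similarity(s):.3f}  {s}")
```

ratio() is computed as 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so a shorter string such as "SOME foo TEXT" scores higher than a longer one with the same matched characters.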