I'm looking for the most efficient way to match the end of a single string against a value from a predefined list of strings.
Something like
Here's a way using a trie, or prefix tree (here built from the reversed suffixes, so it is walked from the end of the string). If we had three potential suffixes CA, CB, and BA, our trie would look like
        e
       / \
      A   B
     / \  |
    B   C C
(e is the empty string.) We start at the end of the input string and consume characters. If we run across the beginning of the string or a character that is not a child of the current node, then we reject the string. If we reach a leaf of the tree, then we accept the string. This scales well to a large number of potential suffixes, because a lookup takes at most as many steps as the longest suffix has characters, no matter how many suffixes are stored.
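    def build_trie(suffixes):
        """Build a trie from the suffixes, inserting each one reversed."""
        head = {}
        for suffix in suffixes:
            curr = head
            # Walk from the last character to the first, creating nodes as needed.
            for c in reversed(suffix):
                if c not in curr:
                    curr[c] = {}
                curr = curr[c]
        return head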
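    def is_suffix(trie, s):
        """Return True if s ends with one of the suffixes stored in trie."""
        # An empty trie matches the empty suffix, which ends every string.
        if not trie:
            return True
        for c in reversed(s):
            try:
                trie = trie[c]
            except KeyError:
                # c does not continue any stored suffix.
                return False
            # An empty node is a leaf: a complete suffix has been consumed.
            if not trie:
                return True
        # Ran out of input before reaching a leaf.
        return False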
    trie = build_trie(['QWE', 'QQQQ', 'TYE', 'YTR', 'TY'])

gives us a trie of

    {'E': {'W': {'Q': {}},
           'Y': {'T': {}}},
     'Q': {'Q': {'Q': {'Q': {}}}},
     'R': {'T': {'Y': {}}},
     'Y': {'T': {}}}
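As a quick sanity check, the sketch below runs the matcher on a few sample strings and compares the answers with Python's built-in str.endswith, which accepts a tuple of suffixes. The two functions are repeated only so the snippet runs on its own; the sample strings and the names samples, results, and baseline are made up for illustration.

```python
def build_trie(suffixes):
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            if c not in curr:
                curr[c] = {}
            curr = curr[c]
    return head


def is_suffix(trie, s):
    if not trie:
        return True
    for c in reversed(s):
        try:
            trie = trie[c]
        except KeyError:
            return False
        if not trie:
            return True
    return False


suffixes = ['QWE', 'QQQQ', 'TYE', 'YTR', 'TY']
trie = build_trie(suffixes)

# Illustrative inputs: two that end with a listed suffix, three that do not.
samples = ['HELLOQWE', 'PARTY', 'QQQ', 'E', '']

results = {s: is_suffix(trie, s) for s in samples}
# Built-in baseline: str.endswith takes a tuple of candidate suffixes.
baseline = {s: s.endswith(tuple(suffixes)) for s in samples}

for s in samples:
    print(repr(s), results[s], baseline[s])
```

For this particular suffix list the two columns agree. The built-in check is also a reasonable baseline to time against before committing to the trie.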
If you want to return the matching suffix, that's just a matter of tracking the characters we see as we descend the trie.
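    def has_suffix(trie, s):
        """Return the suffix of s stored in trie, or None if there is none."""
        if not trie:
            return ''
        letters = []
        for c in reversed(s):
            try:
                trie = trie[c]
                letters.append(c)
            except KeyError:
                return None
            if not trie:
                # letters were collected back to front, so reverse them.
                return ''.join(reversed(letters))
        return None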
It's worth noting that the empty trie can be reached by both build_trie(['']) and build_trie([]), and matches the empty string at the end of all strings. To avoid this, you could check the length of suffixes and return some non-dict value, which you would check against in has_suffix.
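One possible variation, sketched below, removes that ambiguity in a different way from the non-dict return value suggested above: each node where a complete suffix ends gets an explicit marker key. With the marker, an empty suffix list matches nothing while a list containing '' matches everything. It also covers a case the leaf check rejects, where one listed suffix is itself the tail of another (for example A and BA), since the shorter one never ends at a leaf. The names END, build_marked_trie, and find_suffix are hypothetical and not part of the code above.

```python
# Hypothetical marker key. The empty string cannot collide with real keys,
# because every real key is a single character.
END = ''


def build_marked_trie(suffixes):
    """Build a reversed-suffix trie, marking each node where a suffix ends."""
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            curr = curr.setdefault(c, {})
        curr[END] = True
    return head


def find_suffix(trie, s):
    """Return the shortest stored suffix that s ends with, or None."""
    if END in trie:
        # The empty suffix was stored explicitly, so every string matches.
        return ''
    node = trie
    for i, c in enumerate(reversed(s), start=1):
        node = node.get(c)
        if node is None:
            return None
        if END in node:
            # The last i characters of s form a stored suffix.
            return s[-i:]
    return None


nested = build_marked_trie(['A', 'BA'])
print(find_suffix(nested, 'CA'))                       # A
print(find_suffix(build_marked_trie([]), 'abc'))       # None
print(repr(find_suffix(build_marked_trie(['']), 'abc')))  # ''
```

Slicing s[-i:] returns the match directly, so there is no list of characters to collect and reverse. Returning the shortest match is a design choice in this sketch; to get the longest one, keep walking and remember the last position where END was seen.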