matching end of string

I'm looking for the most efficient way to match the end of a single string against a value from a predefined list of strings.
Something like

1 answer

情深已故

2021-01-16 12:37
Here's a way using a trie, or prefix tree (here it is built over the reversed suffixes, so it effectively works as a trie of suffixes). If we had three potential suffixes CA, CB, and BA, our trie would look like
```
     e
    / \
  A     B
 / \    |
B   C   C
```
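(e is the empty string, i.e. the root.) We start at the end of the input string and consume characters moving backwards. If we reach the beginning of the string, or hit a character that is not a child of the current node, we reject the string. If we reach a leaf of the tree, we accept the string. The work per lookup is bounded by the length of the longest suffix rather than by the number of suffixes, so this scales well to very many potential suffixes. Note that only leaves accept, so this assumes no listed suffix is itself the tail of another listed suffix.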
```
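def build_trie(suffixes):
    # Nested dicts: each key is one character, each value is the subtrie below it.
    head = {}
    for suffix in suffixes:
        curr = head
        # Insert the suffix back to front, so lookups can walk the input from its end.
        for c in reversed(suffix):
            if c not in curr:
                curr[c] = {}
            curr = curr[c]
    return head

def is_suffix(trie, s):
    # An empty trie matches the empty suffix, which every string has.
    if not trie:
        return True
    for c in reversed(s):
        try:
            trie = trie[c]
        except KeyError:
            # No listed suffix continues with this character.
            return False
        # An empty dict is a leaf: a whole suffix has been consumed.
        if not trie:
            return True
    # Ran out of input before reaching a leaf.
    return False

trie = build_trie(['QWE','QQQQ','TYE','YTR','TY'])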
```
gives us a trie of
```
{'E': {'W': {'Q': {}}, 
       'Y': {'T': {}}},
 'Q': {'Q': {'Q': {'Q': {}}}},
 'R': {'T': {'Y': {}}},
 'Y': {'T': {}}}
```
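As a quick check of how the lookup behaves, here is a minimal usage sketch. It restates the two functions above in compact form so it runs on its own; the test strings and the `results` name are only illustrative and not part of the original answer.

```python
def build_trie(suffixes):
    # Same construction as above: insert each suffix back to front.
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            curr = curr.setdefault(c, {})
    return head

def is_suffix(trie, s):
    # Same walk as above: an empty dict is a leaf, i.e. a complete suffix.
    if not trie:
        return True
    for c in reversed(s):
        if c not in trie:
            return False
        trie = trie[c]
        if not trie:
            return True
    return False

trie = build_trie(['QWE', 'QQQQ', 'TYE', 'YTR', 'TY'])

# Illustrative inputs: two that end in a listed suffix, three that do not.
results = {s: is_suffix(trie, s) for s in ['ABCQWE', 'XTY', 'QQQ', 'WE', 'HELLO']}
print(results)
```

'QQQ' and 'WE' are rejected because the input runs out before a leaf is reached, and 'HELLO' is rejected on its very last character.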
If you want to return the matching suffix, that's just a matter of tracking the characters we see as we descend the trie.
```
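def has_suffix(trie, s):
    # An empty trie matches the empty suffix.
    if not trie:
        return ''
    letters = []
    for c in reversed(s):
        try:
            trie = trie[c]
            letters.append(c)
        except KeyError:
            return None
        if not trie:
            # Letters were collected back to front, so reverse them again.
            return ''.join(reversed(letters))
    return None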
```
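One limitation of accepting only at leaves: if one listed suffix is the tail of another (say 'E' and 'WE'), the shorter one stops being a leaf and is never reported. A possible way around this, sketched here as an assumption rather than as part of the original answer, is to store an explicit end marker in each node where a suffix ends. The names `END`, `build_marked_trie`, and `longest_suffix` are hypothetical.

```python
END = object()  # hypothetical marker key meaning "a listed suffix ends at this node"

def build_marked_trie(suffixes):
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            curr = curr.setdefault(c, {})
        curr[END] = True  # mark the node instead of relying on it being a leaf
    return head

def longest_suffix(trie, s):
    """Return the longest listed suffix that s ends with, or None."""
    # The root carries END only if the empty string was listed as a suffix.
    best = '' if END in trie else None
    depth = 0
    for c in reversed(s):
        if c not in trie:
            break
        trie = trie[c]
        depth += 1
        if END in trie:
            # The last `depth` characters of s form a listed suffix.
            best = s[len(s) - depth:]
    return best

marked = build_marked_trie(['E', 'WE'])
print(longest_suffix(marked, 'XE'))   # shorter suffix is still found
print(longest_suffix(marked, 'XWE'))  # longer suffix wins when both match
```

Slicing the original string also sidesteps having to collect and reverse the matched characters.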
It's worth noting that the empty trie can be reached by both build_trie(['']) and build_trie([]), and it matches the empty string at the end of every string. To avoid this, you could check the length of suffixes in build_trie and return some non-dict value, which you would then check for in has_suffix.
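A minimal sketch of that idea, assuming `None` as the non-dict value. The names `build_trie_or_none` and `has_suffix_checked` are hypothetical, chosen so they do not clash with the functions above.

```python
def build_trie_or_none(suffixes):
    # Hypothetical variant: None (a non-dict value) stands for "no suffixes given".
    suffixes = list(suffixes)
    if len(suffixes) == 0:
        return None
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            curr = curr.setdefault(c, {})
    return head

def has_suffix_checked(trie, s):
    if trie is None:
        return None  # empty suffix list: nothing can match
    if not trie:
        return ''    # the list held only empty strings
    letters = []
    for c in reversed(s):
        if c not in trie:
            return None
        trie = trie[c]
        letters.append(c)
        if not trie:
            # Letters were collected back to front, so restore their order.
            return ''.join(reversed(letters))
    return None

print(has_suffix_checked(build_trie_or_none([]), 'abc'))    # no suffixes at all
print(has_suffix_checked(build_trie_or_none(['']), 'abc'))  # empty suffix listed
```

With this split, an empty list yields `None` while a list containing only the empty string still yields `''`, so the two cases can be told apart by the caller.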