matching end of string

情深已故 2021-01-16 12:24

I'm looking for the most efficient way to match the end of a single string against a value from a predefined list of strings.
Something like

1 answer
  • 2021-01-16 12:37

    Here's a way using a trie, or prefix tree (built here from the reversed strings, so it effectively acts as a trie of suffixes). If we had three potential suffixes CA, CB, and BA, our trie would look like

         e
        / \
      A     B
     / \    |
    B   C   C
    

    (e is the empty string.) We start at the end of the input string and consume characters, moving backwards. If we hit the beginning of the string, or a character that is not a child of the current node, before reaching a leaf, we reject the string. If we reach a leaf of the tree, we accept the string. Each check costs at most as many steps as the longest suffix has characters, regardless of how many suffixes there are, so this scales better to very many potential suffixes.

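    def build_trie(suffixes):
        # Nested dicts keyed by character; each suffix is inserted back to front.
        head = {}
        for suffix in suffixes:
            curr = head
            for c in reversed(suffix):
                if c not in curr:
                    curr[c] = {}
                curr = curr[c]
        return head

    def is_suffix(trie, s):
        # An empty trie matches the empty suffix, i.e. every string.
        if not trie:
            return True
        # Walk s from its last character, descending one level per character.
        for c in reversed(s):
            try:
                trie = trie[c]
            except KeyError:
                return False  # c is not a child of the current node
            if not trie:
                return True   # reached a leaf: a whole suffix was consumed
        return False          # ran out of input before reaching a leaf

    trie = build_trie(['QWE', 'QQQQ', 'TYE', 'YTR', 'TY'])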
    

    gives us a trie of

    {'E': {'W': {'Q': {}}, 
           'Y': {'T': {}}},
     'Q': {'Q': {'Q': {'Q': {}}}},
     'R': {'T': {'Y': {}}},
     'Y': {'T': {}}}
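
    To see the trie in action, here is a small self-contained check. It repeats build_trie and is_suffix (written slightly more compactly, same behaviour) so it can be run on its own. The test strings are made up for illustration.

```python
def build_trie(suffixes):
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            # setdefault creates the child node only if it is missing
            curr = curr.setdefault(c, {})
    return head


def is_suffix(trie, s):
    if not trie:
        return True
    for c in reversed(s):
        if c not in trie:
            return False
        trie = trie[c]
        if not trie:  # leaf reached
            return True
    return False


trie = build_trie(['QWE', 'QQQQ', 'TYE', 'YTR', 'TY'])

# Made-up inputs, chosen only to exercise each branch.
inputs = ['ABCQWE', 'HELLOTY', 'QQQ', 'XYZ', '']
results = {s: is_suffix(trie, s) for s in inputs}
print(results)
```

    'ABCQWE' and 'HELLOTY' are accepted. 'QQQ' is rejected because the input runs out before a leaf is reached, 'XYZ' because Z is not a child of the root, and the empty string because no character is ever consumed.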
    

    If you want to return the matching suffix, that's just a matter of tracking the characters we see as we descend the trie.

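    def has_suffix(trie, s):
        # Return the matched suffix, or None if s ends with none of them.
        if not trie:
            return ''
        letters = []
        for c in reversed(s):
            try:
                trie = trie[c]
                letters.append(c)
            except KeyError:
                return None
            if not trie:
                # letters were collected back to front, so reverse them
                return ''.join(reversed(letters))
        return None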
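
    One edge case in the accept-only-at-a-leaf rule: if one suffix is itself the tail of another (say A and BA), the shorter one ends at an interior node and is never reported on its own, so 'XA' would be rejected. Below is a minimal sketch of one way around this. It assumes an explicit end marker stored under a hypothetical END key, and it chooses to return the shortest matching suffix. The names END, build_marked_trie and find_suffix are mine, not part of the code above.

```python
END = None  # hypothetical marker key; None cannot collide with a character


def build_marked_trie(suffixes):
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            curr = curr.setdefault(c, {})
        curr[END] = True  # a complete suffix ends at this node
    return head


def find_suffix(trie, s):
    """Return the shortest listed suffix that s ends with, or None."""
    if END in trie:
        return ''  # the empty suffix was listed explicitly
    node = trie
    for depth, c in enumerate(reversed(s), start=1):
        node = node.get(c)
        if node is None:
            return None  # no listed suffix continues with this character
        if END in node:
            return s[len(s) - depth:]  # the last `depth` characters of s
    return None


marked = build_marked_trie(['A', 'BA'])
print(find_suffix(marked, 'XA'))   # A
print(find_suffix(marked, 'XBA'))  # A (shortest match wins in this sketch)
print(find_suffix(marked, 'XB'))   # None
```

    With the marker in place, an empty list of suffixes produces a root without END, so nothing matches. Only an explicitly listed '' matches every string.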
    

    It's worth noting that the empty trie is produced by both build_trie(['']) and build_trie([]), and that it matches the empty string at the end of every string, so every input is accepted. To avoid this for an empty list, you could check the length of suffixes in build_trie and return some non-dict value, which you would then check for in has_suffix.
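
    The guard described above might look like the following sketch. Using None as the non-dict value is an assumption here, as is ignoring empty suffixes entirely. The names build_trie_or_none and has_suffix_guarded are hypothetical, chosen to avoid clashing with the functions above.

```python
def build_trie_or_none(suffixes):
    """Build the reversed-suffix trie, or return None if there is nothing to match."""
    suffixes = [s for s in suffixes if s]  # assumption: ignore empty suffixes
    if not suffixes:
        return None  # sentinel: a non-dict value meaning "no suffixes"
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            curr = curr.setdefault(c, {})
    return head


def has_suffix_guarded(trie, s):
    """Return the matched suffix of s, or None if no suffix matches."""
    if trie is None:
        return None  # sentinel seen: reject everything
    consumed = 0
    for c in reversed(s):
        if c not in trie:
            return None
        trie = trie[c]
        consumed += 1
        if not trie:  # reached a leaf
            return s[len(s) - consumed:]
    return None


print(has_suffix_guarded(build_trie_or_none([]), 'anything'))       # None
print(has_suffix_guarded(build_trie_or_none(['']), 'anything'))     # None
print(has_suffix_guarded(build_trie_or_none(['ing']), 'anything'))  # ing
```

    Slicing the matched suffix out of s avoids having to collect and re-reverse the characters seen on the way down.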
