Search for strings matching the pattern “abc:*:xyz” in less than O(n)

有刺的猬 2021-01-07 10:45

Given a bunch of strings I need to find those which match 3 kinds of patterns:

  • Prefix search - abc*
  • Glob-like pattern - abc:*:xyz
  • Suffix sear
4 Answers
  •  离开以前
    2021-01-07 11:33

    If "abc" and "xyz" are fixed values, you can maintain three counters with your collection indicating the number of strings:

    • starting with "abc" but not ending with "xyz".
    • not starting with "abc" but ending with "xyz".
    • starting with "abc" and ending with "xyz".

    That gives you O(1) time complexity for counting the matches, at the cost of extra work when inserting into, or deleting from, the collection.
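
    A minimal sketch of the three counters described above, in Python. The class name `AffixCounter` and its method names are made up for illustration, and it assumes the prefix and suffix are fixed when the collection is created. Note that it only answers "how many", not "which ones".

```python
class AffixCounter:
    """Counts strings by whether they start with a fixed prefix and/or end with a fixed suffix."""

    def __init__(self, prefix: str, suffix: str) -> None:
        self.prefix = prefix
        self.suffix = suffix
        self.prefix_only = 0   # starts with prefix, does not end with suffix
        self.suffix_only = 0   # ends with suffix, does not start with prefix
        self.both = 0          # starts with prefix and ends with suffix

    def _bump(self, s: str, delta: int) -> None:
        # Classify the string once and adjust exactly one counter (or none).
        starts = s.startswith(self.prefix)
        ends = s.endswith(self.suffix)
        if starts and ends:
            self.both += delta
        elif starts:
            self.prefix_only += delta
        elif ends:
            self.suffix_only += delta

    def add(self, s: str) -> None:
        self._bump(s, 1)

    def remove(self, s: str) -> None:
        # Assumption: the caller only removes strings that were previously added.
        self._bump(s, -1)

    def count_prefix(self) -> int:
        return self.prefix_only + self.both

    def count_suffix(self) -> int:
        return self.suffix_only + self.both

    def count_both(self) -> int:
        return self.both
```

    Each insert or delete pays for one `startswith` and one `endswith` check; each count query is a constant-time addition.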

    If "abc" and "xyz" are arbitrary strings, every operation is O(n) in the worst case, including the "abc..." one. To see this, consider a collection in which every item starts with "abc": the search has to process every item in the tree (both branches of every non-leaf node), so it is not bounded by O(log N) at all.

    I think your ideal solution is to maintain two ordered trees, one for the normal strings and one for the reversed strings. But don't worry about trying to do an intersection between the two. All you need to do is minimize the search space as much as practicable.

    • To find "abc...", use the normal tree to find the strings starting with that value.
    • To find "...xyz", use the reverse tree to find the reversed strings starting with the reverse of that value (zyx...).
    • To find "abc...xyz", use the normal tree to find the strings starting with "abc" and then filter out those that don't end in "xyz".

    That way you don't have to worry about intersecting values between the two trees and you still get a performance improvement over the simplistic linear search.
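
    The approach above can be sketched in Python as follows. This is an illustration under some assumptions, not a definitive implementation: sorted lists searched with `bisect` stand in for the two ordered trees, and the class name `AffixIndex` and its methods are hypothetical. Inserting into a sorted list is O(n), so a real balanced tree would be needed to get O(log n) updates. The length check in the combined query is an addition of mine so that the prefix and suffix cannot overlap (for example, "abc:xyz" should not match "abc:*:xyz").

```python
import bisect
from typing import Iterable, Iterator, List


class AffixIndex:
    """Two sorted lists: one of the strings, one of the reversed strings."""

    def __init__(self, strings: Iterable[str] = ()) -> None:
        items = list(strings)
        self._forward: List[str] = sorted(items)
        self._backward: List[str] = sorted(s[::-1] for s in items)

    def add(self, s: str) -> None:
        # insort keeps both lists ordered; the list shift makes this O(n).
        bisect.insort(self._forward, s)
        bisect.insort(self._backward, s[::-1])

    @staticmethod
    def _scan(keys: List[str], prefix: str) -> Iterator[str]:
        # All keys sharing a prefix are contiguous in sorted order, so
        # binary-search to the first candidate and walk until the prefix stops.
        i = bisect.bisect_left(keys, prefix)
        while i < len(keys) and keys[i].startswith(prefix):
            yield keys[i]
            i += 1

    def starts_with(self, prefix: str) -> List[str]:
        return list(self._scan(self._forward, prefix))

    def ends_with(self, suffix: str) -> List[str]:
        # Look up the reversed suffix in the reversed list, then un-reverse.
        return [r[::-1] for r in self._scan(self._backward, suffix[::-1])]

    def starts_and_ends(self, prefix: str, suffix: str) -> List[str]:
        # Narrow by prefix, then filter by suffix; no intersection needed.
        min_len = len(prefix) + len(suffix)
        return [
            s
            for s in self._scan(self._forward, prefix)
            if len(s) >= min_len and s.endswith(suffix)
        ]
```

    A query costs O(log n) to locate the first candidate plus time proportional to the number of strings sharing the prefix, so it still degrades to O(n) when every string starts with "abc". For example, `AffixIndex(["abc:1:xyz", "abc:2"]).starts_and_ends("abc:", ":xyz")` scans two candidates and keeps only the first.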
