Algorithm for checking if a string was built from a list of substrings

醉酒成梦 2021-02-02 14:35

You are given a string and an array of strings. How can you quickly check whether this string can be built by concatenating some of the strings in the array?

This is a theoretica

10 answers
  • 2021-02-02 15:07

    It's definitely not quick, but here's an idea:

    • Iterate over all the strings, checking if the target string "begins" with any of them
    • Take the longest string with which the target string begins, remove it from the list and trim it from the main string
    • Rinse, repeat

    Stop when you're left with a zero-length target string.

    As I said before, this is definitely not fast but should give you a baseline ("it shouldn't get much worse than this").

    EDIT

    As pointed out in the comments, this will not work. You will have to store the partial matches and fall back on them when you find there is no way further.

    • When you find that a string is the head of the target, push it onto a list. After building the list, you will naturally try the biggest "head" of the target
    • When you find that the head you tried doesn't fit with what's left, try the next best head

    This way, eventually you'll explore the entire space of solutions. For every candidate head you'll try every possible tail.
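
    The backtracking idea above can be sketched roughly as follows. This is a minimal sketch, not a tuned implementation: the method name `buildable_backtrack?` is hypothetical, and it assumes (as the answer does when it removes a string from the list) that each string may be used at most once.

```ruby
# Hypothetical helper: can `target` be built by concatenating strings from
# `parts`, each used at most once? Candidate heads are tried longest first,
# and shorter heads serve as fallbacks when the tail cannot be completed.
def buildable_backtrack?(target, parts)
  return true if target.empty?

  # Indices of every non-empty string that is a head of the target,
  # ordered from the longest head to the shortest.
  heads = parts.each_index
               .select { |i| !parts[i].empty? && target.start_with?(parts[i]) }
               .sort_by { |i| -parts[i].length }

  heads.any? do |i|
    remaining = parts[0...i] + parts[(i + 1)..-1] # drop the chosen string
    buildable_backtrack?(target[parts[i].length..-1], remaining)
  end
end
```

    In the worst case this still explores an exponential number of head/tail combinations, which matches the "definitely not fast" caveat.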

  • 2021-02-02 15:08

    Note: I assume here that you can use each substring more than once. You can generalize the solution to forbid reuse by changing how we define subproblems, so that each one also records which substrings are still available. That will have a negative impact on space as well as expected runtime, and the polynomial bound given below then no longer follows from this argument.

    This is a dynamic programming problem. (And a great question!)

    Let's define composable(S, W) to be true if the string S can be written using the list of substrings W.

    The empty string is trivially composable. A non-empty S is composable if and only if:

    1. S starts with a substring w in W.
    2. The remainder of S after w is also composable.

    Let's write some pseudocode:

    COMPOSABLE(S, W):
      return TRUE if S = "" # Base case
      return memo[S] if memo[S] is defined # Also reuse cached FALSE answers
    
      memo[S] = false
    
      for w in W:
        length <- LENGTH(w)
        start  <- S[1..length]
        rest   <- S[length+1..-1]
        if start = w AND COMPOSABLE(rest, W) :
          memo[S] = true # Memoize
    
      return memo[S]
    

    This algorithm has O(m*n) runtime, where m is the size of the substring list and n is the length of the string in question, assuming the substrings are short (their length is bounded by a constant rather than growing with the string). If the substrings can be as long as the string itself, the runtime is O(m*n^2). It uses O(n) space for memoization.

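    (N.B. as written, the pseudocode uses O(n^2) space, because the memoization keys are the suffixes themselves. Hashing the keys, or keying the memo by the position where each suffix starts, would alleviate this.)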
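
    One way to get the O(n) space bound is to index subproblems by position instead of by suffix. The following is a minimal bottom-up sketch of that idea, assuming (as this answer does) that words may be reused; the name `composable_by_index?` is hypothetical and not part of the original answer.

```ruby
# Hypothetical helper: bottom-up variant of the recurrence above.
# Words may be reused any number of times.
def composable_by_index?(str, words)
  n = str.length
  # reachable[i] is true when the prefix str[0, i] can be written using words.
  reachable = Array.new(n + 1, false)
  reachable[0] = true # the empty prefix is trivially composable

  (0...n).each do |i|
    next unless reachable[i]

    words.each do |word|
      next if word.empty?
      # str[i, len] is shorter than `word` near the end of the string,
      # so the comparison fails there and no index past n is ever written.
      reachable[i + word.length] = true if str[i, word.length] == word
    end
  end

  reachable[n]
end
```

    The table holds n + 1 booleans, and every position is visited once with one comparison per word.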

    EDIT

    Here is a working Ruby implementation:

    def composable(str, words)
      composable_aux(str, words, {})
    end
    
    def composable_aux(str, words, memo)
      return true if str == ""                # The base case
      return memo[str] unless memo[str].nil?  # Return the answer if we already know it
    
      memo[str] = false              # Assume the answer is `false`
    
      words.each do |word|           # For each word in the list:
        length = word.length
        start  = str[0..length-1]
        rest   = str[length..-1]
    
        # If the test string starts with this word,
        # and the remaining part of the test string
        # is also composable, the answer is true.
        if start == word and composable_aux(rest, words, memo)
          memo[str] = true           # Mark the answer as true
        end
      end
    
      memo[str]                      # Return the answer
    end
    
  • 2021-02-02 15:08

    Two options spring to mind, but neither of them seems very elegant.

    1) Brute force: do it like you would a password generator, i.e. word1+word1+word1 > word1+word1+word2 > word1+word1+word3, and so on.

    The tricky part there is the length: you'd have to try all combinations of 2 or more words, and you don't know where to set the limit. Very time-consuming.

    2) Take the string in question and run a find on it for every word you have, one at a time, removing each match. Then check the length of what is left and, if it's greater than 0, do it again. Keep going until you either hit zero or can't find any more matches. If you hit 0 it's a win; if not, it's a loss. I think this method would be a lot better than the first, but I imagine someone will have a better suggestion.
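
    For the brute-force option, a natural limit does exist: a concatenation can only equal the target if the lengths add up, so no candidate needs more words than the target length divided by the length of the shortest word. Here is a minimal sketch under that observation, assuming words may be repeated as in the password-generator analogy; `buildable_brute_force?` is a hypothetical name.

```ruby
# Hypothetical helper: enumerate every sequence of words (with repetition)
# up to the largest count that could still fit into the target.
def buildable_brute_force?(target, words)
  return true if target.empty?

  words = words.reject(&:empty?)
  return false if words.empty?

  # More words than this would always produce a string longer than target.
  max_count = target.length / words.map(&:length).min

  (1..max_count).any? do |count|
    words.repeated_permutation(count).any? { |combo| combo.join == target }
  end
end
```

    The number of candidates still grows exponentially with `max_count`, so this is only practical for very small inputs.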

  • 2021-02-02 15:08

    If each substring may be used only once, but not all of them have to be used...

    For each permutation of N substrings whose combined length equals that of the original string, check whether it matches. If none does, move on to permutations of N+1 items, and so forth, until you have exhausted all the permutations.

    Of course this is NP-complete and slow as hell, but I think that no normal solutions exist.
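
    The permutation search can be sketched with Ruby's built-in `Array#permutation`. This is a minimal sketch assuming each substring is used at most once; the name `buildable_from_permutations?` is hypothetical, and for brevity it filters by comparing the joined string rather than checking the combined length first.

```ruby
# Hypothetical helper: try every ordered selection of 1..n substrings,
# each substring used at most once per selection.
def buildable_from_permutations?(target, parts)
  return true if target.empty?

  (1..parts.length).any? do |size|
    parts.permutation(size).any? { |perm| perm.join == target }
  end
end
```

    Both `any?` calls stop at the first match, but a negative answer requires visiting every permutation of every size.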

    To explain why the solutions that simply remove substrings from the original string won't ever work:

    Take the string "1234123" and the array "12", "34", "123". If you remove "123" from the start, you get a false negative. A similar example where removing from the end fails would be: "1234123" : "23", "41", "123".
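
    The false negative is easy to reproduce. Below is a minimal sketch of a purely greedy check with no backtracking, assuming each substring is used at most once; `greedy_longest_first?` is a hypothetical name used only for this illustration.

```ruby
# Hypothetical helper: always strip the longest matching head and never
# reconsider that choice. Deliberately naive, to show the failure mode.
def greedy_longest_first?(target, parts)
  parts = parts.dup # do not mutate the caller's array

  until target.empty?
    head = parts.select { |s| !s.empty? && target.start_with?(s) }
                .max_by(&:length)
    return false if head.nil? # stuck: nothing matches the remaining string

    parts.delete_at(parts.index(head)) # each substring is used only once
    target = target[head.length..-1]
  end

  true
end

greedy_longest_first?("1234123", ["12", "34", "123"]) # => false
```

    The greedy pass removes "123" first and is left with "4123", which no remaining substring starts, even though "12" + "34" + "123" builds the string.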

    With greedy backtracking (m = string length 7, n = number of elements 3):

    • take the longest: 123
    • remove it from the first occurrence: O(3)
    • try the other two with the rest: no go, + O((n-1)*(m-3))
    • backtrack: O(1)
    • remove it from the second occurrence: O(m-3)
    • try the other two: O((n-1)*m-3)

    In total: O(30).

    Permutations of 1 + 2 + 3 = O(3) + O(4) + O(6) = O(13). So for small subset lengths, permutations are actually faster than backtracking. This will change if you ask for a lot of substrings to find (in most cases, but not all).

    The only substrings you can safely remove from the array are those that do not occur in the string at all. Each such removal lowers the number of permutations from n^n to n^(n-1).
