Split a string to a string of valid words using Dynamic Programming

后端 未结 6 1163
不知归路
不知归路 2020-12-23 15:02

I need to find a dynamic programming algorithm to solve this problem. I tried but couldn\'t figure it out. Here is the problem:

You are given a string of n character

相关标签:
6条回答
  • 2020-12-23 15:33

    Below is an O(n^2) solution for this problem.

    void findstringvalid() {
    string s = "itwasthebestoftimes";
    set<string> dict;
    dict.insert("it");
    dict.insert("was");
    dict.insert("the");
    dict.insert("best");
    dict.insert("of");
    dict.insert("times");
    
    vector<bool> b(s.size() + 1, false);
    vector<int> spacepos(s.size(), -1);
    //Initialization phase
    b[0] = true; //String of size 0 is always a valid string
    for (int i = 1; i <= s.size(); i++) {
        for (int j = 0; j <i; j++) {
           //string of size s[ j... i]
           if (!b[i]) {
               if (b[j]) {
                  //check if string "j to i" is in dictionary
                  string temp = s.substr(j, i - j);
                  set<string>::iterator it = dict.find(temp);
                  if (it != dict.end()) {
                      b[i] = true;
                      spacepos[i-1] = j;
                  }
               }
            }
        }
    }
    if(b[s.size()])
        for (int i = 1; i < spacepos.size(); i++) {
            if (spacepos[i] != -1) {
                string temp = s.substr(spacepos[i], i - spacepos[i] + 1);
                cout << temp << " ";
        }
        }
    }
    
    0 讨论(0)
  • 2020-12-23 15:37

    The string s[] can potentially be split into more than one ways. The method below finds the maximum number of words in which we can split s[]. Below is the sketch/pseudocode of the algorithm

    bestScore[i] -> Stores the maximum number of words in which the first i characters can be split (it would be MINUS_INFINITY otherwise)

    for (i = 1 to n){
         bestScore[i] = MINUS_INFINITY
         for (k = 1 to i-1){
            bestScore[i] = Max(bestSCore[i], bestScore[i-k]+ f(i,k))
         }
     }
    

    Where f(i,k) is defined as:

    f(i,k) = 1 : if s[i-k+1 to i] is in dictionary
           = MINUS_INFINITY : otherwise
    

    bestScore[n] would store the maximum number of words in which s[] can be split (if the value is MINUS_INFINIY, s[] cannot be split)

    Clearly the running time is O(n^2)

    As this looks like a textbook exercise, I will not write the code to reconstruct the actual split positions.

    0 讨论(0)
  • 2020-12-23 15:41

    The O(N^2) Dp is clear but if you know the words of the dictionary, i think you can use some precomputations to get it even faster in O(N). Aho-Corasick

    0 讨论(0)
  • 2020-12-23 15:42

    To formalize what @MinhPham suggested.

    This is a dynammic programming solution.

    Given a string str, let

    b[i] = true if the substring str[0...i] (inclusive) can be split into valid words.

    Prepend some starting character to str, say !, to represent the empty word. str = "!" + str

    The base case is the empty string, so

    b[0] = true.

    For the iterative case:

    b[j] = true if b[i] == true and str[i..j] is a word for all i < j

    0 讨论(0)
  • 2020-12-23 15:44

    Let the length of your compacted document be N.

    Let b(n) be a boolean: true if the document can be split into words starting from position n in the document.

    b(N) is true (since the empty string can be split into 0 words). Given b(N), b(N - 1), ... b(N - k), you can construct b(N - k - 1) by considering all words that start at character N - k - 1. If there's any such word, w, with b(N - k - 1 + len(w)) set, then set b(N - k - 1) to true. If there's no such word, then set b(N - k - 1) to false.

    Eventually, you compute b(0) which tells you if the entire document can be split into words.

    In pseudo-code:

    def try_to_split(doc):
      N = len(doc)
      b = [False] * (N + 1)
      b[N] = True
      for i in range(N - 1, -1, -1):
        for word starting at position i:
          if b[i + len(word)]:
            b[i] = True
            break
      return b
    

    There's some tricks you can do to get 'word starting at position i' efficient, but you're asked for an O(N^2) algorithm, so you can just look up every string starting at i in the dictionary.

    To generate the words, you can either modify the above algorithm to store the good words, or just generate it like this:

    def generate_words(doc, b, idx=0):
      length = 1
      while true:
        assert b(idx)
        if idx == len(doc): return
        word = doc[idx: idx + length]
        if word in dictionary and b(idx + length):
           output(word)
           idx += length
           length = 1
    

    Here b is the boolean array generated from the first part of the algorithm.

    0 讨论(0)
  • 2020-12-23 15:45

    A dp solution in c++:

    int main()
    {
        set<string> dict;
        dict.insert("12");
        dict.insert("123");
        dict.insert("234");
        dict.insert("12345");
        dict.insert("456");
        dict.insert("1234");
        dict.insert("567");
        dict.insert("123342");
        dict.insert("42");
        dict.insert("245436564");
        dict.insert("12334");
    
        string str = "123456712334245436564";
    
        int size = str.size();
        vector<int> dp(size+1, -1);
        dp[0] = 0;
        vector<string > res(size+1);
        for(int i = 0; i < size; ++i)
        {
            if(dp[i] != -1)
            {
                for(int j = i+1; j <= size; ++j)
                {
                    const int len = j-i;
                    string substr = str.substr(i, len);
                    if(dict.find(substr) != dict.end())
                    {
                        string space = i?" ":"";
                        res[i+len] = res[i] + space + substr;
                        dp[i+len] = dp[i]+1;
                    }
                }
            }
        }
        cout << *dp.rbegin() << endl;
        cout << *res.rbegin() << endl;
    
        return 0;
    }
    
    0 讨论(0)
提交回复
热议问题