Split string into words

后端 未结 4 1107
予麋鹿
予麋鹿 2021-02-03 13:27

I am looking for the most efficient algorithm to form all possible combinations of words from a string. For example:

Input String: forevercarrot

Output:

foreve         


        
相关标签:
4条回答
  • 2021-02-03 14:06

    Input String: forevercarrot

    Output:

    forever carrot forever car rot for ever carrot for ever car rot

    program :

    #include<iostream>
    #include<string>
    #include<vector>
    #include<string.h>
    void strsplit(std::string str)
    {
       int len=0,i,x,y,j,k;
       len = str.size();
       std::string s1,s2,s3,s4,s5,s6,s7;
       char *c = new char[len+1]();
       char *b = new char[len+1]();
       char *d = new char[len+1]();
       for(i =0 ;i< len-1;i++)
       {
           std::cout<<"\n";
           for(j=0;j<=i;j++)
           {
              c[j] = str[j];
              b[j] = str[j];
              s3 += c[j];
              y = j+1;
           }
           for( int h=i+1;h<len;h++){
              s5 += str[h];
           }
           s6 = s3+" "+s5;
           std::cout<<" "<<s6<<"\n";
           s5 = "";
           for(k = y;k<len-1;k++)
           {
              d[k] = str[k];
              s1 += d[k];
              s1 +=  " ";
              for(int l = k+1;l<len;l++){
               b[l] = str[l];
               s2 += b[l];
             }
           s4 = s3+" "+s1+s2;
           s7 = s4;
           std::cout<<" "<<s4<<"\n";
           s3 = "";s4 = "";
           }
           s1 = "";s3 = "";
        }
    }
    
    int main(int argc, char* argv[])
    {
        std::string str;
        if(argc < 2)
                    std::cout<<"Usage: "<<argv[0]<<" <InputString> "<<"\n";
        else{
                    str = argv[1];
                    strsplit(str);
        }
    
    return 0;
    }
    
    0 讨论(0)
  • 2021-02-03 14:12

    See this question which has even better answers. It's a standard dynamic programming problem:

    How to split a string into words. Ex: "stringintowords" -> "String Into Words"?

    0 讨论(0)
  • 2021-02-03 14:12

    A psuedocode implementation, exploiting the fact that every part of the string needs to be a word, we can't skip anything. We work forward from the start of the string until the first bit is a word, and then generate all possible combinations of the rest of the string. Once we've done that, we keep going along until we find any other possibilities for the first word, and so on.

    allPossibleWords(string s, int startPosition) {
        list ret
        for i in startPosition..s'length
            if isWord(s[startPosition, i])
                ret += s[startPostion, i] * allPossibleWords(s, i)
        return ret    
    }
    

    The bugbear in this code is that you'll end up repeating calculations - in your example, you'll end up having to calculate allPossibleWords("carrot") twice - once in ["forever", allPossibleWords["carrot"]] and once in ["for", "ever", allPossibleWords["carrot"]]. So memoizing this is something to consider.

    0 讨论(0)
  • 2021-02-03 14:25

    Use a prefix tree for your list of known words. Probably libs like myspell already do so. Try using a ready-made one.

    Once you found a match (e.g. 'car'), split your computation: one branch starts to look for a new word ('rot'), another continues to explore variants of current beginning ('carrot').

    Effectively you maintain a queue of pairs (start_position, current_position) of offsets into your string every time you split the computation. Several threads can pop from this queue in parallel and try to continue a word that starts from start_position and is already known up to current_position of the pair, but does not end there. When a word is found, it is reported and another pair is popped from the queue. When it's impossible, no result is generated. When a split occurs, a new pair is added to the end of the queue. Initially the queue contains a (0,0).

    0 讨论(0)
提交回复
热议问题