Finding Sub-Strings of String Containing all the words in array

Submitted by 老子叫甜甜 on 2019-12-05 04:06:54

Question


I have a string and an array of words, and I need to write code that finds all substrings of the string that contain every word in the array, in any order. The string contains no special characters or digits, and its words are separated by single spaces.

For example:

String given:

aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc

Words in array:

aaaa
bbbb
cccc

Sample of output:

aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb    

aaaa aaaa aaaa aaaa cccc bbbb    

aaaa cccc bbbb bbbb bbbb bbbb    

cccc bbbb bbbb bbbb bbbb aaaa  

aaaa cccc bbbb

I have implemented this using for loops, but it is very inefficient.

How can I do this more efficiently?

My code:

    for(int i=0;i<str_arr.length;i++)
    {
        if( (str_arr.length - i) >= words.length)
        {
            String res = check(i);
            if(!res.equals(""))
            {
                System.out.println(res);
                System.out.println("");
            }
            reset_all();
        }
        else
        {
            break;
        }
    }

public static String check(int i)
{
    String res = "";
    num_words = 0;

    for(int j=i;j<str_arr.length;j++)
    {
        if(has_word(str_arr[j]))
        {
            t.put(str_arr[j].toLowerCase(), 1);
            h.put(str_arr[j].toLowerCase(), 1);

            res = res + str_arr[j]; //+ " ";

            if(all_complete())
            {
                return res;
            }

            res = res + " ";
        }
        else
        {
            res = res + str_arr[j] + " ";
        }

    }
    res = "";
    return res;
}

Answer 1:


My first approach would be something like the following pseudo-code:

  for word:string {
    if word in array {
      for each stored potential substring {
        if word wasn't already found {
          remove word from notAlreadyFoundList
          if notAlreadyFoundList is empty {
            use starting pos and ending pos to save our substring
          }
        }
      }
      store position and array-word as potential substring
    }
  }

This should have decent performance since you only traverse the string once, although each matching word is still checked against every stored potential substring.

[EDIT]

This is an implementation of my pseudo-code. Try it out and see whether it performs better or worse. It works under the assumption that a matching substring is complete as soon as the last missing word is found. If you truly want all matches, change the lines marked //ALLMATCHES:

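import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class SubStringFinder {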
    String textString = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
    Set<String> words = new HashSet<String>(Arrays.asList("aaaa", "bbbb", "cccc"));

    public static void main(String[] args) {
        new SubStringFinder();
    }

    public SubStringFinder() {
        List<PotentialMatch> matches = new ArrayList<PotentialMatch>();
        for (String textPart : textString.split(" ")) {
            // Extend every open candidate with the current token, even when it is not a search
            // word, so that each reported match stays a contiguous substring of the text.
            for (Iterator<PotentialMatch> matchIterator = matches.iterator(); matchIterator.hasNext();) {
                PotentialMatch match = matchIterator.next();
                String result = match.tryMatch(textPart);
                if (result != null) {
                    System.out.println("Match found: \"" + result + "\"");
                    matchIterator.remove(); //ALLMATCHES - remove this line
                }
            }
            if (words.contains(textPart)) {
                // A search word also starts a new candidate of its own.
                Set<String> unfound = new HashSet<String>(words);
                unfound.remove(textPart);
                matches.add(new PotentialMatch(unfound, textPart));
            }// ALLMATCHES add these lines 
             // else {
             // matches.add(new PotentialMatch(new HashSet<String>(words), textPart));
             // }
        }
    }

    class PotentialMatch {
        Set<String> unfoundWords;
        StringBuilder stringPart;
        public PotentialMatch(Set<String> unfoundWords, String part) {
            this.unfoundWords = unfoundWords;
            this.stringPart = new StringBuilder(part);
        }
        public String tryMatch(String part) {
            this.stringPart.append(' ').append(part);
            unfoundWords.remove(part);                
            if (unfoundWords.isEmpty()) {
                return this.stringPart.toString();
            }
            return null;
        }
    }
}
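
To make the behavior on the sample input easier to check, the same idea can be packaged as a method that returns the matches instead of printing them. This is only a sketch of the approach above, not the answerer's exact code: the names `CandidateScan`, `Candidate` and `findMatches` are hypothetical, and it assumes that a candidate is closed as soon as its last missing word appears (the non-ALLMATCHES behavior). Every token, including words that are not in the array, is appended to each open candidate so that the reported text stays a contiguous substring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class CandidateScan {
    // One open candidate: the text collected since its start word and the words still missing.
    private static final class Candidate {
        final StringBuilder text;
        final Set<String> missing;

        Candidate(String first, Set<String> missing) {
            this.text = new StringBuilder(first);
            this.missing = missing;
        }
    }

    // Returns every candidate that becomes complete, in the order in which it completes.
    static List<String> findMatches(String text, Set<String> words) {
        List<String> found = new ArrayList<>();
        List<Candidate> open = new ArrayList<>();
        for (String token : text.split(" ")) {
            // Extend all open candidates with the current token.
            for (Iterator<Candidate> it = open.iterator(); it.hasNext();) {
                Candidate c = it.next();
                c.text.append(' ').append(token);
                c.missing.remove(token);
                if (c.missing.isEmpty()) {
                    found.add(c.text.toString());
                    it.remove(); // closed: this candidate is not extended further
                }
            }
            // A search word starts a new candidate.
            if (words.contains(token)) {
                Set<String> missing = new HashSet<>(words);
                missing.remove(token);
                if (missing.isEmpty()) {
                    found.add(token); // single-word array: the token alone is a match
                } else {
                    open.add(new Candidate(token, missing));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        Set<String> words = new HashSet<>(Arrays.asList("aaaa", "bbbb", "cccc"));
        for (String match : findMatches(text, words)) {
            System.out.println(match);
        }
    }
}
```

On the sample string this reports ten substrings, starting with "aaaa aaaa aaaa aaaa cccc bbbb" and ending with "aaaa bbbb cccc". Each token is compared against every open candidate, so the running time grows with the number of tokens times the number of open candidates rather than being strictly linear.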



Answer 2:


Here is another approach:

// needs: import java.util.*; import java.util.regex.*;
public static void main(String[] args) {
    // init
    List<String> result = new ArrayList<String>();
    String string = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
    String[] words = { "aaaa", "bbbb", "cccc" };
    // build one regexp per ordering of the words, e.g. "(aaaa ?)+(bbbb ?)+(cccc )*cccc"
    // and "(aaaa ?)+(cccc ?)+(bbbb )*bbbb" (Pattern.quote wrappers omitted here)
    List<String> regexps = findCombs(Arrays.asList(words));
    // compile and add
    for (String regexp : regexps) {
        Pattern p = Pattern.compile(regexp);
        Matcher m = p.matcher(string);
        while (m.find()) {
            result.add(m.group());
        }
    }
    System.out.println(result);
}

private static List<String> findCombs(List<String> words) {
    if (words.size() == 1) {
        words.set(0, "(" + Pattern.quote(words.get(0)) + " )*" + Pattern.quote(words.get(0)));
        return words;
    }
    List<String> list = new ArrayList<String>();
    for (String word : words) {
        List<String> tail = new LinkedList<String>(words);
        tail.remove(word);
        for (String s : findCombs(tail)) {
            list.add("(" + Pattern.quote(word) + " ?)+" + s);
        }
    }
    return list;
}

This will output:

[aaaa bbbb cccc, aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb, cccc bbbb bbbb bbbb bbbb aaaa]

I know the result is not complete: you get only the available combinations, each fully extended, but you get all of them.
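
To see what this approach does and does not cover, the sketch below rebuilds the same patterns and runs them on two inputs. It is an illustration under assumptions rather than part of the answer: the names `PermutationRegexDemo` and `findAll` are hypothetical, the second input string containing `xxxx` is made up for the demonstration, and `findCombs` is rewritten so that it does not modify the list passed in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PermutationRegexDemo {
    // Same pattern construction as findCombs above: one regexp per ordering of the words.
    static List<String> findCombs(List<String> words) {
        List<String> list = new ArrayList<>();
        if (words.size() == 1) {
            String w = Pattern.quote(words.get(0));
            list.add("(" + w + " )*" + w);
            return list;
        }
        for (String word : words) {
            List<String> tail = new ArrayList<>(words);
            tail.remove(word);
            for (String s : findCombs(tail)) {
                list.add("(" + Pattern.quote(word) + " ?)+" + s);
            }
        }
        return list;
    }

    // Runs every generated pattern over the text and collects all matches.
    static List<String> findAll(String text, String... words) {
        List<String> result = new ArrayList<>();
        for (String regexp : findCombs(Arrays.asList(words))) {
            Matcher m = Pattern.compile(regexp).matcher(text);
            while (m.find()) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three words give 3! orderings.
        System.out.println(findCombs(Arrays.asList("aaaa", "bbbb", "cccc")).size()); // prints 6
        // A word outside the array between the search words breaks every pattern.
        System.out.println(findAll("aaaa xxxx bbbb cccc", "aaaa", "bbbb", "cccc")); // prints []
    }
}
```

Two consequences follow from the construction: the number of patterns grows factorially with the number of words, and a text in which the search words are separated by other words yields no match at all, even though a substring containing all the words exists.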



Source: https://stackoverflow.com/questions/11224034/finding-sub-strings-of-string-containing-all-the-words-in-array
