Question
I have a string and an array of words, and I need to write code that finds all substrings of the string containing every word in the array, in any order. The string contains no special characters or digits, and its words are separated by single spaces.
For example:
String given:
aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc
Words in array:
aaaa
bbbb
cccc
Sample of output:
aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb
aaaa aaaa aaaa aaaa cccc bbbb
aaaa cccc bbbb bbbb bbbb bbbb
cccc bbbb bbbb bbbb bbbb aaaa
aaaa cccc bbbb
I have implemented this using for loops, but this is very inefficient.
How can I do this more efficiently?
My code:
for(int i=0;i<str_arr.length;i++)
{
    if( (str_arr.length - i) >= words.length)
    {
        String res = check(i);
        if(!res.equals(""))
        {
            System.out.println(res);
            System.out.println("");
        }
        reset_all();
    }
    else
    {
        break;
    }
}

public static String check(int i)
{
    String res = "";
    num_words = 0;
    for(int j=i;j<str_arr.length;j++)
    {
        if(has_word(str_arr[j]))
        {
            t.put(str_arr[j].toLowerCase(), 1);
            h.put(str_arr[j].toLowerCase(), 1);
            res = res + str_arr[j]; //+ " ";
            if(all_complete())
            {
                return res;
            }
            res = res + " ";
        }
        else
        {
            res = res + str_arr[j] + " ";
        }
    }
    res = "";
    return res;
}
Answer 1:
My first approach would be something like the following pseudo-code:
for each word in string {
    if word in array {
        for each stored potential substring {
            if word wasn't already found {
                remove word from notAlreadyFoundList
                if notAlreadyFoundList is empty {
                    use starting pos and ending pos to save our substring
                }
            }
        }
        store position and array-word as potential substring
    }
}
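This should perform reasonably well, since the string is traversed only once; the extra work per word is proportional to the number of potential substrings stored at that point.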
[EDIT]
This is an implementation of my pseudo-code; try it out and see whether it performs better or worse. It works under the assumption that a matching substring is found as soon as you find the last word. If you truly want all matches, change the lines marked //ALLMATCHES:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class SubStringFinder {
    String textString = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
    Set<String> words = new HashSet<String>(Arrays.asList("aaaa", "bbbb", "cccc"));

    public static void main(String[] args) {
        new SubStringFinder();
    }

    public SubStringFinder() {
        // Candidates that have been started but do not yet contain every word.
        List<PotentialMatch> matches = new ArrayList<PotentialMatch>();
        for (String textPart : textString.split(" ")) {
            // Extend every open candidate with this part, even when it is not one of
            // the target words, so that the reported substring stays contiguous.
            for (Iterator<PotentialMatch> matchIterator = matches.iterator(); matchIterator.hasNext();) {
                PotentialMatch match = matchIterator.next();
                String result = match.tryMatch(textPart);
                if (result != null) {
                    System.out.println("Match found: \"" + result + "\"");
                    matchIterator.remove(); //ALLMATCHES - remove this line
                }
            }
            if (words.contains(textPart)) {
                // A target word starts a new candidate that still needs the other words.
                Set<String> unfound = new HashSet<String>(words);
                unfound.remove(textPart);
                matches.add(new PotentialMatch(unfound, textPart));
            }// ALLMATCHES add these lines
            // else {
            //     matches.add(new PotentialMatch(new HashSet<String>(words), textPart));
            // }
        }
    }

    class PotentialMatch {
        Set<String> unfoundWords;   // target words this candidate has not seen yet
        StringBuilder stringPart;   // text collected since the candidate started

        public PotentialMatch(Set<String> unfoundWords, String part) {
            this.unfoundWords = unfoundWords;
            this.stringPart = new StringBuilder(part);
        }

        // Appends the part; returns the collected text once every word has been seen, else null.
        public String tryMatch(String part) {
            this.stringPart.append(' ').append(part);
            unfoundWords.remove(part);
            if (unfoundWords.isEmpty()) {
                return this.stringPart.toString();
            }
            return null;
        }
    }
}
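The assumption above (a match is complete as soon as the last missing word appears) can also be expressed as a two-pointer sliding window. This is a different technique from the candidate list in the code above, shown only as a minimal sketch: the class name `ShortestWindows` and the method `shortestWindows` are made up for illustration, and the word set is assumed to be non-empty. For every start position it reports the shortest run of tokens that contains all the words, and each token enters and leaves the window at most once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the answer's code.
class ShortestWindows {

    // For each start token, returns the shortest contiguous run of tokens
    // beginning there that contains every target word (assumes targets is non-empty).
    static List<String> shortestWindows(String text, Set<String> targets) {
        String[] tokens = text.split(" ");
        Map<String, Integer> counts = new HashMap<>(); // target word -> occurrences in window
        List<String> result = new ArrayList<>();
        int covered = 0; // distinct target words currently inside [start, end)
        int end = 0;     // exclusive right edge of the window

        for (int start = 0; start < tokens.length; start++) {
            // Grow the right edge until the window covers every target word.
            while (end < tokens.length && covered < targets.size()) {
                String token = tokens[end++];
                if (targets.contains(token) && counts.merge(token, 1, Integer::sum) == 1) {
                    covered++;
                }
            }
            if (covered < targets.size()) {
                break; // the text ran out, so no later start can succeed either
            }
            result.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));

            // Drop the leftmost token before moving on to the next start.
            String left = tokens[start];
            if (targets.contains(left) && counts.merge(left, -1, Integer::sum) == 0) {
                covered--;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        Set<String> words = new HashSet<>(Arrays.asList("aaaa", "bbbb", "cccc"));
        for (String window : shortestWindows(text, words)) {
            System.out.println(window);
        }
    }
}
```

For the example string this yields 10 windows, starting with "aaaa aaaa aaaa aaaa cccc bbbb". Any extension of a reported window to the right is also a valid substring, so the full set of matches can be derived from these if needed.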
Answer 2:
Here is another approach:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class added so the snippet compiles on its own; the name is arbitrary.
class RegexSubStringFinder {

    public static void main(String[] args) {
        // init
        List<String> result = new ArrayList<String>();
        String string = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        String[] words = { "aaaa", "bbbb", "cccc" };
        // find all combs as regexps, one per ordering of the words
        // (e.g. "(aaaa ?)+(bbbb ?)+(cccc )*cccc", "(aaaa ?)+(cccc ?)+(bbbb )*bbbb",
        // with each word additionally wrapped in \Q...\E by Pattern.quote)
        List<String> regexps = findCombs(Arrays.asList(words));
        // compile and add
        for (String regexp : regexps) {
            Pattern p = Pattern.compile(regexp);
            Matcher m = p.matcher(string);
            while (m.find()) {
                result.add(m.group());
            }
        }
        System.out.println(result);
    }

    // Builds one regexp per permutation of the words; note that the
    // single-word base case overwrites the element of the list passed in.
    private static List<String> findCombs(List<String> words) {
        if (words.size() == 1) {
            words.set(0, "(" + Pattern.quote(words.get(0)) + " )*" + Pattern.quote(words.get(0)));
            return words;
        }
        List<String> list = new ArrayList<String>();
        for (String word : words) {
            List<String> tail = new LinkedList<String>(words);
            tail.remove(word);
            for (String s : findCombs(tail)) {
                list.add("(" + Pattern.quote(word) + " ?)+" + s);
            }
        }
        return list;
    }
}
This will output:
[aaaa bbbb cccc, aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb, cccc bbbb bbbb bbbb bbbb aaaa]
I know the result is not complete: you only get the available combinations, each extended as far as possible, but you do get all of them.
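If the complete list is needed rather than only the fully extended combinations, one option is a plain enumeration of every start and end position. The sketch below is not the regex technique above; it is a straightforward baseline that tracks the still-missing words per start position, and the names `AllWindows` and `allWindows` are hypothetical. Its cost is quadratic in the number of tokens, which is unavoidable in the worst case because the number of matching substrings can itself be quadratic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the answer's code.
class AllWindows {

    // Returns every contiguous run of tokens that contains all target words.
    static List<String> allWindows(String text, Set<String> targets) {
        String[] tokens = text.split(" ");
        List<String> result = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            Set<String> missing = new HashSet<>(targets); // words not yet seen from this start
            StringBuilder window = new StringBuilder();
            for (int end = start; end < tokens.length; end++) {
                if (end > start) {
                    window.append(' ');
                }
                window.append(tokens[end]);
                missing.remove(tokens[end]);
                // Once nothing is missing, this window and every longer one qualifies.
                if (missing.isEmpty()) {
                    result.add(window.toString());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        Set<String> words = new HashSet<>(Arrays.asList("aaaa", "bbbb", "cccc"));
        List<String> all = allWindows(text, words);
        System.out.println(all.size() + " substrings");
        for (String window : all) {
            System.out.println(window);
        }
    }
}
```

For the example string this reports 36 substrings, which include the three results printed above as well as the shorter ones from the question's sample output, such as "aaaa cccc bbbb".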
Source: https://stackoverflow.com/questions/11224034/finding-sub-strings-of-string-containing-all-the-words-in-array