Question
What's the best approach in Java to find the words that were deleted from sentence A to produce sentence B? For example:
Sentence A: I want to delete unnecessary words on this simple sentence.
Sentence B: I want to delete words on this sentence.
Output: I want to delete (unnecessary) words on this (simple) sentence.
where the words in parentheses are the ones that were deleted from sentence A.
Answer 1:
Assuming order doesn't matter, use commons-collections:
- Use String.split() to split both sentences into arrays of words.
- Use commons-collections' CollectionUtils.addAll to add each array to an empty Set.
- Use commons-collections' CollectionUtils.subtract method to get A-B.
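The steps above can be sketched without the extra dependency. This is a minimal sketch that swaps in the JDK's own Set.removeAll for CollectionUtils.subtract, so it is not the commons-collections code the answer describes; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper class; the names are illustrative and not from the answer.
final class DeletedWordSet {

    // Returns the words that occur in sentence a but nowhere in sentence b.
    // This is a plain set difference, so word order and repetition are ignored.
    static Set<String> deletedWords(String a, String b) {
        // LinkedHashSet keeps first-seen order, which makes the result easier to read.
        Set<String> result = new LinkedHashSet<>(Arrays.asList(a.trim().split("\\s+")));
        Set<String> wordsB = new LinkedHashSet<>(Arrays.asList(b.trim().split("\\s+")));
        result.removeAll(wordsB); // A - B, using the JDK instead of commons-collections
        return result;
    }

    public static void main(String[] args) {
        System.out.println(deletedWords(
                "I want to delete unnecessary words on this simple sentence.",
                "I want to delete words on this sentence."));
        // prints [unnecessary, simple]
    }
}
```

Because this works on sets, a deleted repeat of a word that still appears in B goes unnoticed: for "a b a" and "a" it reports only b.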
Answer 2:
Assuming order and position matter, this looks like a variation of the longest common subsequence problem, which has a dynamic programming solution.
Wikipedia has a great page on the topic; there's really too much for me to outline here:
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
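For a rough idea of what that looks like at word level, here is a minimal sketch of the textbook LCS table followed by a walk through it that wraps every word of A missing from the common subsequence in parentheses. The class and method names are hypothetical, and words that appear only in B are simply skipped.

```java
// Hypothetical class; a sketch of word-level LCS, not code from the answer.
final class LcsDeletedWords {

    static String markDeleted(String a, String b) {
        String[] x = a.trim().split("\\s+");
        String[] y = b.trim().split("\\s+");

        // lcs[i][j] = length of the longest common subsequence of x[i..] and y[j..]
        int[][] lcs = new int[x.length + 1][y.length + 1];
        for (int i = x.length - 1; i >= 0; i--) {
            for (int j = y.length - 1; j >= 0; j--) {
                lcs[i][j] = x[i].equals(y[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start, emitting every word of A exactly once.
        StringBuilder sb = new StringBuilder();
        int i = 0, j = 0;
        while (i < x.length) {
            if (j < y.length && x[i].equals(y[j])) {
                append(sb, x[i]);               // word kept in B
                i++;
                j++;
            } else if (j < y.length && lcs[i][j + 1] > lcs[i + 1][j]) {
                j++;                            // word only in B: skip it
            } else {
                append(sb, "(" + x[i] + ")");   // word deleted from A
                i++;
            }
        }
        return sb.toString();
    }

    private static void append(StringBuilder sb, String token) {
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(token);
    }

    public static void main(String[] args) {
        System.out.println(markDeleted(
                "I want to delete unnecessary words on this simple sentence.",
                "I want to delete words on this sentence."));
        // prints: I want to delete (unnecessary) words on this (simple) sentence.
    }
}
```

The table costs O(n*m) time and memory for sentences of n and m words, which is negligible at sentence length. Unlike a single forward scan, it still lines up the remaining words when B contains a word that A does not: "a b c" against "x b c" gives "(a) b c".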
Answer 3:
Everyone else is using really heavy-weight algorithms for what is actually a very simple problem. It could be solved using longest common subsequence, but it's a very constrained version of that. It's not a full diff; it only includes deletes. No need for dynamic programming or anything like that. Here's a 20-line implementation:
private static String deletedWords(String s1, String s2) {
    StringBuilder sb = new StringBuilder();
    String[] words1 = s1.split("\\s+");
    String[] words2 = s2.split("\\s+");
    int i1 = 0, i2 = 0;
    while (i1 < words1.length) {
        // Bounds check: once s2 is used up, every remaining word of s1 was deleted.
        if (i2 < words2.length && words1[i1].equals(words2[i2])) {
            sb.append(words1[i1]);
            i2++;
        } else {
            sb.append("(" + words1[i1] + ")");
        }
        if (i1 < words1.length - 1) {
            sb.append(" ");
        }
        i1++;
    }
    return sb.toString();
}
When the inputs are the ones in the question, the output matches exactly.
Granted, I understand that for some inputs there are multiple solutions. For example, the inputs
a b a
a
could produce either
a (b) (a)
or
(a) (b) a
and maybe for some versions of this problem, one of these solutions is more likely to be the "actual" solution than the other, and for those you need some recursive or dynamic programming approach... but let's not make it too much more complicated than what Israel Sato originally asked for!
Answer 4:
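import java.util.ArrayList;
import java.util.List;

// Wrapper class added so the snippet compiles; the name is arbitrary.
class MissingWords {
    public static void main(String[] args) {
        String a = "I want to delete unnecessary words on this simple sentence.";
        String b = "I want to delete words on this sentence.";
        String[] aWords = a.split(" ");
        String[] bWords = b.split(" ");
        List<String> missingWords = new ArrayList<String>();
        int x = 0; // index of the next unmatched word in b
        for (int i = 0; i < aWords.length; i++) {
            String aWord = aWords[i];
            if (x < bWords.length) {
                String bWord = bWords[x];
                if (aWord.equals(bWord)) {
                    x++; // word was kept: move on to the next word of b
                } else {
                    missingWords.add(aWord); // word of a is not at this position in b
                }
            } else {
                missingWords.add(aWord); // b is used up: the rest of a was deleted
            }
        }
        System.out.println(missingWords); // [unnecessary, simple]
    }
}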
Answer 5:
This works well, and it also handles updated words.
Deleted words are enclosed in parentheses and updated words in square brackets.
import java.util.*;
class Sample{
    public static void main(String[] args){
        Scanner sc=new Scanner(System.in);
        String str1 = sc.nextLine();
        String str2 = sc.nextLine();
        List<String> flist = Arrays.asList(str1.split("\\s+"));
        List<String> slist = Arrays.asList(str2.split("\\s+"));
        List<String> completedString = new ArrayList<String>();
        String result="";
        String updatedString = "";
        String deletedString = "";
        int i=0;
        int startIndex=0;
        int endIndex=0;
        for(String word: slist){
            if(flist.contains(word)){
                endIndex = flist.indexOf(word);
                if(!completedString.contains(word)){
                    if(deletedString.isEmpty()){
                        for(int j=startIndex;j<endIndex;j++){
                            deletedString+= flist.get(j)+" ";
                        }
                    }
                }
                startIndex=endIndex+1;
                if(!deletedString.isEmpty()){
                    result += "("+deletedString.substring(0,deletedString.length()-1)+") ";
                    deletedString="";
                }
                if(!updatedString.isEmpty()){
                    result += "["+updatedString.substring(0,updatedString.length()-1)+"] ";
                    updatedString="";
                }
                result += word+" ";
                completedString.add(word);
                if(i==slist.size()-1){
                    endIndex = flist.size();
                    for(int j=startIndex;j<endIndex;j++){
                        deletedString+= flist.get(j)+" ";
                    }
                    startIndex = endIndex+1;
                }
            }
            else{
                if(i == 0){
                    boolean boundaryCheck = false;
                    for(int j=i+1;j<slist.size();j++){
                        if(flist.contains(slist.get(j))){
                            endIndex=flist.indexOf(slist.get(j));
                            boundaryCheck=true;
                            break;
                        }
                    }
                    if(!boundaryCheck){
                        endIndex = flist.size();
                    }
                    if(!completedString.contains(word)){
                        for(int j=startIndex;j<endIndex;j++){
                            deletedString+= flist.get(j)+" ";
                        }
                    }
                    startIndex = endIndex+1;
                }else if(i == slist.size()-1){
                    endIndex = flist.size();
                    if(!completedString.contains(word)){
                        for(int j=startIndex;j<endIndex;j++){
                            deletedString+= flist.get(j)+" ";
                        }
                    }
                    startIndex = endIndex+1;
                }
                updatedString += word+" ";
                completedString.add(word);
            }
            i++;
        }
        if(!deletedString.isEmpty()){
            result += "("+deletedString.substring(0,deletedString.length()-1)+") ";
        }
        if(!updatedString.isEmpty()){
            result += "["+updatedString.substring(0,updatedString.length()-1)+"] ";
        }
        System.out.println(result);
    }
}
Answer 6:
This is basically a diff; take a look at this:
- diff
and the root algorithm:
- Longest common subsequence problem
Here's a sample Java implementation:
- http://introcs.cs.princeton.edu/java/96optimization/Diff.java.html
which compares lines. All you need to do is split by word instead of by line or, alternatively, put each word of both sentences on a separate line.
If you're on Linux, for example, you can see the result of the latter option using the diff program itself before you even write any code. Try this:
$ echo "I want to delete unnecessary words on this simple sentence."|tr " " "\n" > 1
$ echo "I want to delete words on this sentence."|tr " " "\n" > 2
$ diff -uN 1 2
--- 1 2012-10-01 19:40:51.998853057 -0400
+++ 2 2012-10-01 19:40:51.998853057 -0400
@@ -2,9 +2,7 @@
 want
 to
 delete
-unnecessary
 words
 on
 this
-simple
 sentence.
Lines prefixed with - are the words deleted from sentence A (a + prefix would mark words added in sentence B that were not in sentence A). Try it out to see if that fits your problem.
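The same preparation step can be done in Java before handing the files to a line-based diff. This is only a sketch of the "one word per line" idea, mirroring the tr command above; the class name, method name, and temp-file prefixes are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; a sketch of the preparation step, not part of Diff.java.
final class WordsPerLine {

    // Turns a sentence into text with one word per line, like: tr " " "\n"
    static String onePerLine(String sentence) {
        return String.join("\n", sentence.trim().split("\\s+")) + "\n";
    }

    public static void main(String[] args) throws IOException {
        // Temp files stand in for the files named 1 and 2 in the shell example.
        Path a = Files.createTempFile("sentenceA", ".txt");
        Path b = Files.createTempFile("sentenceB", ".txt");
        Files.write(a, onePerLine("I want to delete unnecessary words on this simple sentence.")
                .getBytes(StandardCharsets.UTF_8));
        Files.write(b, onePerLine("I want to delete words on this sentence.")
                .getBytes(StandardCharsets.UTF_8));
        // Pass these two paths to whichever line-based diff you use.
        System.out.println(a + " " + b);
    }
}
```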
Hope this helps.
Source: https://stackoverflow.com/questions/12682510/how-to-check-for-deleted-words-between-2-sentences-in-java