How to Check for Deleted Words Between 2 Sentences in Java

Posted by 烂漫一生 on 2020-01-06 19:35:31

Question


What's the best approach in Java to find the words that were deleted from sentence A to produce sentence B? For example:

Sentence A: I want to delete unnecessary words on this simple sentence.

Sentence B: I want to delete words on this sentence.

Output: I want to delete (unnecessary) words on this (simple) sentence.

where the words inside the parentheses are the ones that were deleted from sentence A.


Answer 1:


Assuming order doesn't matter: use commons-collections.

  1. Use String.split() to split both sentences into arrays of words.
  2. Use commons-collections' CollectionUtils.addAll to add each array into an empty Set.
  3. Use commons-collections' CollectionUtils.subtract method to get A-B.
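
The three steps above can be sketched with the JDK alone. This is a minimal sketch that swaps in Set.removeAll for CollectionUtils.subtract, so it is not the commons-collections call itself; the class and method names are hypothetical. Because it works on sets, word order and repeated words are ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class SetDifference {
    // Returns the distinct words of a that do not occur anywhere in b.
    // Hypothetical helper: uses JDK sets instead of commons-collections.
    static Set<String> deletedWords(String a, String b) {
        // LinkedHashSet keeps the words in the order they first appear in a.
        Set<String> result = new LinkedHashSet<>(Arrays.asList(a.split("\\s+")));
        result.removeAll(new LinkedHashSet<>(Arrays.asList(b.split("\\s+"))));
        return result;
    }

    public static void main(String[] args) {
        String a = "I want to delete unnecessary words on this simple sentence.";
        String b = "I want to delete words on this sentence.";
        System.out.println(deletedWords(a, b)); // prints [unnecessary, simple]
    }
}
```

Note that a word deleted from one position but still present elsewhere in sentence B would not be reported by this approach.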



Answer 2:


Assuming order and position matter, this looks like a variation of the Longest Common Subsequence problem, which has a dynamic programming solution.

Wikipedia has a great page on the topic; there's really too much for me to outline here:

http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
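
For readers who want something concrete, here is a minimal sketch of how a word-level Longest Common Subsequence table could be used to mark deletions. It is one possible reading of the idea, not code from the Wikipedia page; the class name LcsDeletedWords and the method markDeleted are hypothetical, and words that appear only in sentence B are silently skipped.

```java
import java.util.ArrayList;
import java.util.List;

class LcsDeletedWords {
    // Wraps in parentheses the words of a that are not matched against b.
    static String markDeleted(String a, String b) {
        String[] x = a.split("\\s+");
        String[] y = b.split("\\s+");

        // lcs[i][j] = length of the LCS of x[i..] and y[j..]
        int[][] lcs = new int[x.length + 1][y.length + 1];
        for (int i = x.length - 1; i >= 0; i--) {
            for (int j = y.length - 1; j >= 0; j--) {
                lcs[i][j] = x[i].equals(y[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start; words of x that get skipped are deletions.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length) {
            if (j < y.length && x[i].equals(y[j])) {
                out.add(x[i]);          // word kept in both sentences
                i++;
                j++;
            } else if (j < y.length && lcs[i][j + 1] > lcs[i + 1][j]) {
                j++;                    // word only in b (an insertion); not reported
            } else {
                out.add("(" + x[i] + ")"); // word only in a: deleted
                i++;
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(markDeleted(
                "I want to delete unnecessary words on this simple sentence.",
                "I want to delete words on this sentence."));
        // prints I want to delete (unnecessary) words on this (simple) sentence.
    }
}
```

The table costs time and memory proportional to the product of the two word counts, which is negligible for single sentences.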




Answer 3:


Everyone else is using really heavy-weight algorithms for what is actually a very simple problem. It could be solved using longest common subsequence, but it's a very constrained version of that. It's not a full diff; it only includes deletes. No need for dynamic programming or anything like that. Here's a 20-line implementation:

private static String deletedWords(String s1, String s2) {
    StringBuilder sb = new StringBuilder();
    String[] words1 = s1.split("\\s+");
    String[] words2 = s2.split("\\s+");
    int i1, i2;
    i1 = i2 = 0;
    while (i1 < words1.length) {
        if (i2 < words2.length && words1[i1].equals(words2[i2])) {
            sb.append(words1[i1]);
            i2++;
        } else {
            sb.append("(" + words1[i1] + ")");
        }
        if (i1 < words1.length - 1) {
            sb.append(" ");
        }
        i1++;
    }
    return sb.toString();
}

When the inputs are the ones in the question, the output matches exactly.

Granted, I understand that for some inputs there are multiple solutions. For example:

a b a
a

could be either a (b) (a) or (a) (b) a. For some versions of this problem, one of these solutions may be more likely to be the "actual" solution than the other, and for those you need a recursive or dynamic programming approach. But let's not make it much more complicated than what Israel Sato originally asked for!
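
To make the ambiguity concrete, here is a small sketch of the same greedy scan run from the last word backwards instead of from the first word forwards. The class and method names are hypothetical and not part of the original answer. On the a b a / a input it picks the other alignment.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class AmbiguousAlignment {
    // Greedy scan like the one in this answer, but matching from the end.
    static String deletedWordsFromEnd(String s1, String s2) {
        String[] words1 = s1.split("\\s+");
        String[] words2 = s2.split("\\s+");
        Deque<String> out = new ArrayDeque<>();
        int i2 = words2.length - 1;
        for (int i1 = words1.length - 1; i1 >= 0; i1--) {
            if (i2 >= 0 && words1[i1].equals(words2[i2])) {
                out.addFirst(words1[i1]);             // word survives in s2
                i2--;
            } else {
                out.addFirst("(" + words1[i1] + ")"); // word was deleted
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(deletedWordsFromEnd("a b a", "a")); // prints (a) (b) a
    }
}
```

For the sentences in the question both scan directions give the same result, because each surviving word can only be matched in one place.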




Answer 4:


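import java.util.ArrayList;
import java.util.List;

// Fragment: the statements below belong inside a method body.
String a = "I want to delete unnecessary words on this simple sentence.";
String b = "I want to delete words on this sentence.";

String[] aWords = a.split(" ");
String[] bWords = b.split(" ");
List<String> missingWords = new ArrayList<String>();

// x is the index of the next word of B that has not been matched yet.
int x = 0;
for (int i = 0; i < aWords.length; i++) {
    String aWord = aWords[i];
    if (x < bWords.length && aWord.equals(bWords[x])) {
        x++;                      // same word in both sentences: advance in B
    } else {
        missingWords.add(aWord);  // word of A with no counterpart in B: deleted
    }
}
// missingWords now holds [unnecessary, simple]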



Answer 5:


This works well, and it also handles updated words. Deleted words are enclosed in parentheses and updated words in square brackets.

import java.util.*;
class Sample{
public static void main(String[] args){
    Scanner sc=new Scanner(System.in);  

    String str1 = sc.nextLine();
    String str2 = sc.nextLine();
    List<String> flist = Arrays.asList(str1.split("\\s+"));
    List<String> slist = Arrays.asList(str2.split("\\s+"));
    List<String> completedString = new ArrayList<String>();
    String result="";
    String updatedString = "";
    String deletedString = "";
    int i=0;
    int startIndex=0;
    int endIndex=0;
    for(String word: slist){
        if(flist.contains(word)){
            endIndex = flist.indexOf(word);
            if(!completedString.contains(word)){
                if(deletedString.isEmpty()){
                    for(int j=startIndex;j<endIndex;j++){
                        deletedString+= flist.get(j)+" ";
                    }
                }
            }
            startIndex=endIndex+1;
            if(!deletedString.isEmpty()){
                result += "("+deletedString.substring(0,deletedString.length()-1)+") ";
                deletedString="";
            }
            if(!updatedString.isEmpty()){
                result += "["+updatedString.substring(0,updatedString.length()-1)+"] ";
                updatedString="";
            }
            result += word+" ";
            completedString.add(word);
            if(i==slist.size()-1){
                endIndex = flist.size();
                for(int j=startIndex;j<endIndex;j++){
                    deletedString+= flist.get(j)+" ";
                }
                startIndex = endIndex+1;
            }
        }
        else{
            if(i == 0){
                boolean boundaryCheck = false;
                for(int j=i+1;j<slist.size();j++){
                    if(flist.contains(slist.get(j))){
                        endIndex=flist.indexOf(slist.get(j));
                        boundaryCheck=true;
                        break;
                    }
                }
                if(!boundaryCheck){
                    endIndex = flist.size();
                }
                if(!completedString.contains(word)){
                    for(int j=startIndex;j<endIndex;j++){
                        deletedString+= flist.get(j)+" ";
                    }
                }
                startIndex = endIndex+1;
            }else if(i == slist.size()-1){
                endIndex = flist.size();
                if(!completedString.contains(word)){
                    for(int j=startIndex;j<endIndex;j++){
                        deletedString+= flist.get(j)+" ";
                    }
                }
                startIndex = endIndex+1;
            }               
            updatedString += word+" ";
            completedString.add(word);
        }
        i++;
    }
    if(!deletedString.isEmpty()){
        result += "("+deletedString.substring(0,deletedString.length()-1)+") ";
    }
    if(!updatedString.isEmpty()){
        result += "["+updatedString.substring(0,updatedString.length()-1)+"] ";
    }
    System.out.println(result);
}

}




Answer 6:


This is basically a diff problem. Take a look at this:

  • diff

and the root algorithm:

  • Longest common subsequence problem

Here's a sample Java implementation:

  • http://introcs.cs.princeton.edu/java/96optimization/Diff.java.html

which compares lines. The only thing you need to do is split by word instead of by line or, alternatively, put each word of both sentences on a separate line.

If you are on Linux, for example, you can see the result of the latter option with the diff program itself before you write any code. Try this:

$ echo "I want to delete unnecessary words on this simple sentence."|tr " " "\n" > 1
$ echo "I want to delete words on this sentence."|tr " " "\n" > 2
$ diff -uN 1 2
--- 1   2012-10-01 19:40:51.998853057 -0400
+++ 2   2012-10-01 19:40:51.998853057 -0400
@@ -2,9 +2,7 @@
 want
 to
 delete
-unnecessary
 words
 on
 this
-simple
 sentence.

The lines prefixed with - are present in sentence A but missing from sentence B (lines added in sentence B that were not in sentence A would be prefixed with + instead). Try it out to see if that fits your problem.

Hope this helps.



Source: https://stackoverflow.com/questions/12682510/how-to-check-for-deleted-words-between-2-sentences-in-java
