Removing consecutive duplicate words from text using regex and displaying the new text

生来不讨喜 2021-01-03 03:33

Hi,

I have the following code:

import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;

/
public  class Reg         


        
3 answers
  • 2021-01-03 04:12

    The code below works fine:

    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DuplicateRemoveEx {

        public static void main(String[] args) {
            // A word followed by one or more repeats of itself,
            // separated by non-word characters.
            String regex = "\\b(\\w+)(\\b\\W+\\1\\b)+";
            Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

            Scanner in = new Scanner(System.in);
            // The first input line holds the number of sentences that follow.
            int numSentences = Integer.parseInt(in.nextLine().trim());
            while (numSentences-- > 0) {
                String input = in.nextLine();
                // replaceAll already handles every match in one pass,
                // keeping only the first word of each run.
                Matcher m = p.matcher(input);
                input = m.replaceAll("$1");
                System.out.println(input);
            }
            in.close();
        }
    }
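
    To try that pattern without typing input on stdin, a small sketch like the following could be used. The class name DuplicateRemoveDemo and the helper collapse are made up for illustration and are not part of the answer. It also shows a side effect of using \W+ as the separator: punctuation between the repeated words is removed together with the duplicate.

    ```java
    import java.util.regex.Pattern;

    public class DuplicateRemoveDemo {
        // The pattern from the answer above, with case-insensitivity passed as a flag.
        private static final Pattern REPEATED =
                Pattern.compile("\\b(\\w+)(\\b\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

        // Hypothetical helper: collapses each run of repeated words to its first occurrence.
        static String collapse(String sentence) {
            return REPEATED.matcher(sentence).replaceAll("$1");
        }

        public static void main(String[] args) {
            // "bye bye" and "world world world" are collapsed; "Goodbye" is left alone.
            System.out.println(collapse("Goodbye bye bye world world world")); // Goodbye bye world
            // The comma sits inside the matched run, so it disappears as well.
            System.out.println(collapse("Hello, hello there"));                // Hello there
        }
    }
    ```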

  • 2021-01-03 04:18

    Below is your code, reworked. I read the text line by line and used Tim's regular expression.

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Scanner;

    /**
     *
     * @author Marius
     */
    public class RegexSimple41 {

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            List<String> manyLines = new ArrayList<String>();
            List<String> noRepeat = new ArrayList<String>();

            // try-with-resources closes the Scanner even if reading fails.
            try (Scanner myfis = new Scanner(new File("D:\\myfis41.txt"))) {
                // nextLine() already splits on line separators, so no extra split is needed.
                while (myfis.hasNextLine()) {
                    String line = myfis.nextLine();
                    if (!line.isEmpty()) {
                        manyLines.add(line);
                    }
                }
            } catch (FileNotFoundException ex) {
                System.out.println(ex);
                return;
            }

            if (manyLines.isEmpty()) {
                return;
            }

            System.out.println("Original text");
            for (String s : manyLines) {
                System.out.println(s);
            }

            // Collapse each run of repeated words to its first occurrence.
            for (String s : manyLines) {
                noRepeat.add(s.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1"));
            }

            System.out.println("Duplicates removed");
            for (String s : noRepeat) {
                System.out.println(s);
            }
        }
    }
    

    Good luck!

  • 2021-01-03 04:19

    First of all, the regex [aA-zZ]* doesn't do what you think it does. It means "match zero or more characters, each of which is an a, a character in the range from ASCII A to ASCII z (which also includes [, \, ], ^, _ and `), or a Z". Because of the *, it also matches the empty string.
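
    A quick way to see this is to test a few strings against both classes. This is only a sketch: the class name CharClassDemo is made up, and [a-zA-Z]+ is an assumption about what the question intended.

    ```java
    public class CharClassDemo {
        static final String LOOSE = "[aA-zZ]*";   // the class discussed above
        static final String STRICT = "[a-zA-Z]+"; // assumed intent: one or more ASCII letters

        public static void main(String[] args) {
            // "", "[", "_" and "^" all match the loose class but not the strict one;
            // "abc1" matches neither because of the digit.
            String[] samples = {"hello", "", "[", "_", "^", "abc1"};
            for (String s : samples) {
                System.out.println("\"" + s + "\" loose=" + s.matches(LOOSE)
                        + " strict=" + s.matches(STRICT));
            }
        }
    }
    ```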

    Assuming that you are only looking for duplicate words that consist solely of ASCII letters, matched case-insensitively, and that you want to keep the first word (which means that you wouldn't want to match "it's it's" or "olé olé!"), you can do that in a single regex operation:

    String result = subject.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1");
    

    which will change

    Hello hello Hello there there past pastures 
    

    into

    Hello there past pastures 
    

    Explanation:

    (?i)     # Mode: case-insensitive
    \b       # Match the start of a word
    ([a-z]+) # Match one ASCII "word", capture it in group 1
    \b       # Match the end of a word
    (?:      # Start of non-capturing group:
     \s+     # Match at least one whitespace character
     \1      # Match the same word as captured before (case-insensitively)
     \b      # and make sure it ends there.
    )+       # Repeat that as often as possible
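
    The commented form above can be kept almost verbatim in Java by compiling with Pattern.COMMENTS, which makes the engine ignore whitespace and # comments inside the pattern. The following is a minimal sketch; the class name VerboseDuplicateRegex and the helper collapse are hypothetical.

    ```java
    import java.util.regex.Pattern;

    public class VerboseDuplicateRegex {
        // Same pattern as the one-liner, spread over several lines with comments.
        // CASE_INSENSITIVE replaces the inline (?i) flag.
        static final Pattern DUPLICATES = Pattern.compile(
              "\\b          # start of a word\n"
            + "([a-z]+)     # one ASCII word, captured in group 1\n"
            + "\\b          # end of that word\n"
            + "(?:          # start of non-capturing group\n"
            + "  \\s+       # at least one whitespace character\n"
            + "  \\1        # the same word as captured before\n"
            + "  \\b        # and it must end there\n"
            + ")+           # repeat as often as possible\n",
            Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);

        // Hypothetical helper: keeps the first word of every run of duplicates.
        static String collapse(String text) {
            return DUPLICATES.matcher(text).replaceAll("$1");
        }

        public static void main(String[] args) {
            System.out.println(collapse("Hello hello Hello there there past pastures"));
            // Neither of these is changed: the apostrophe and the accented
            // letter are not part of [a-z], so no whole-word repeat is found.
            System.out.println(collapse("it's it's"));
            System.out.println(collapse("ol\u00e9 ol\u00e9!"));
        }
    }
    ```

    The first call prints "Hello there past pastures"; the other two inputs come back unchanged.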
    

    See it live on regex101.com.
