Removing consecutive duplicate words from text using regex and displaying the new text


Hi,

I have the following code:

import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;

/
public  class Reg

3 answers

Answer by 北荒, 2021-01-03 04:12

The code below works fine:

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateRemoveEx {

    public static void main(String[] args) {
        // A word (group 1) followed by one or more repeats of the same word,
        // separated by non-word characters; the flag makes the match case-insensitive.
        String regex = "\\b(\\w+)(\\b\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        Scanner in = new Scanner(System.in);
        // The first input line holds the number of sentences that follow.
        int numSentences = Integer.parseInt(in.nextLine());
        while (numSentences-- > 0) {
            String input = in.nextLine();
            Matcher m = p.matcher(input);
            // replaceAll already handles every match, so no find() loop is needed.
            input = m.replaceAll("$1");
            System.out.println(input);
        }
        in.close();
    }
}
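To try the pattern without typing input on stdin, here is a small sketch that applies the same regex to fixed strings. The class name `DuplicateRemoveDemo`, the helper `removeDuplicates` and the sample sentences are made up for illustration. Because the separator is `\W+`, repeats separated by punctuation are also collapsed, and because `\w` also matches digits and underscores, repeated numbers are collapsed too.

```java
import java.util.regex.Pattern;

public class DuplicateRemoveDemo {
    // Same pattern as DuplicateRemoveEx: a word followed by one or more repeats,
    // separated by any run of non-word characters (\W+), ignoring case.
    static final Pattern DUPLICATES =
            Pattern.compile("\\b(\\w+)(\\b\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: replaces every run of repeated words with its first occurrence.
    static String removeDuplicates(String input) {
        return DUPLICATES.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("Goodbye bye bye world world world")); // Goodbye bye world
        System.out.println(removeDuplicates("hello, hello world"));                // hello world
        System.out.println(removeDuplicates("Sam went went to to to his business")); // Sam went to his business
        System.out.println(removeDuplicates("In the the 2 2 cases"));              // In the 2 cases
    }
}
```

Note that "Goodbye bye" is kept: the leading `\b` prevents a match from starting inside "Goodbye", and the whole words differ.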

Answer by 时光取名叫无心, 2021-01-03 04:18

Below is your code, reworked. I split the text into lines and used Tim's regular expression.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/**
 *
 * @author Marius
 */
public class RegexSimple41 {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        List<String> manyLines = new ArrayList<>();
        List<String> noRepeat = new ArrayList<>();

        // try-with-resources closes the Scanner (and the file) automatically.
        try (Scanner myfis = new Scanner(new File("D:\\myfis41.txt"))) {
            // nextLine() already strips the line terminator, so no extra split is needed.
            while (myfis.hasNextLine()) {
                String line = myfis.nextLine();
                if (!line.isEmpty()) {
                    manyLines.add(line);
                }
            }
        } catch (FileNotFoundException ex) {
            System.out.println(ex);
            return;
        }

        if (!manyLines.isEmpty()) {
            System.out.println("Original text");
            for (String s : manyLines) {
                System.out.println(s);
            }

            // Collapse each run of repeated words to its first occurrence.
            for (String s : manyLines) {
                noRepeat.add(s.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1"));
            }

            System.out.println("Remove duplicates");
            for (String s : noRepeat) {
                System.out.println(s);
            }
        }
    }
}
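`String.replaceAll(regex, repl)` is specified to give the same result as `Pattern.compile(regex).matcher(str).replaceAll(repl)`, so calling it inside the loop compiles the same pattern once per line. A possible refinement, sketched below under the assumption that the lines are already in a list, compiles the pattern once and reuses it. The class `RemoveRepeatsSketch`, the helper `removeRepeats` and the sample lines are hypothetical stand-ins for the file contents; `Pattern.CASE_INSENSITIVE` has the same effect as the inline `(?i)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RemoveRepeatsSketch {
    // Compiled once and reused for every line (Pattern instances are immutable).
    static final Pattern REPEATED_WORDS =
            Pattern.compile("\\b([a-z]+)\\b(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: skips empty lines and collapses repeated words in the rest.
    static List<String> removeRepeats(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {
                result.add(REPEATED_WORDS.matcher(line).replaceAll("$1"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the contents of the input file.
        List<String> lines = Arrays.asList("This is is a test test", "", "No repeats here");
        for (String s : removeRepeats(lines)) {
            System.out.println(s);
        }
    }
}
```

For these sample lines the empty line is dropped and the output is "This is a test" followed by "No repeats here".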


Good luck,

Answer by 故里飘歌, 2021-01-03 04:19

First of all, the regex [aA-zZ]* doesn't do what you think it does. It means "match zero or more characters, each of which is an a, any character in the range from ASCII A to ASCII z (which also includes [, ], \ and others), or a Z". It therefore also matches the empty string.
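A quick way to confirm this is to test a few inputs against the class with `Pattern.matches`, which requires the whole input to match. The class name `CharClassCheck` and the helper `fullMatch` are illustrative; `[a-zA-Z]` is shown as the letters-only class that was presumably intended.

```java
import java.util.regex.Pattern;

public class CharClassCheck {
    // Returns true if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // [aA-zZ] is 'a', the range 'A'..'z' (code points 65..122), and 'Z'.
        System.out.println(fullMatch("[aA-zZ]", "_"));  // true: '_' (95) lies between 'Z' and 'a'
        System.out.println(fullMatch("[aA-zZ]", "["));  // true: '[' is code point 91
        System.out.println(fullMatch("[aA-zZ]*", ""));  // true: * also allows zero characters
        System.out.println(fullMatch("[a-zA-Z]", "_")); // false: letters only
    }
}
```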

Assuming that you are only looking for duplicate words that consist solely of ASCII letters, compared case-insensitively, and that you want to keep the first word (which means that "it's it's" or "olé olé!" will not be matched), you can do this in a single regex operation:

String result = subject.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1");


which will change

Hello hello Hello there there past pastures 


into

Hello there past pastures 
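The one-liner can be checked with a small sketch like the following; the class `ReplaceAllExample` and the wrapper method `collapseRepeats` are hypothetical names used only to make the call easy to test.

```java
public class ReplaceAllExample {
    // The single replaceAll call from this answer, wrapped in a method for testing.
    static String collapseRepeats(String subject) {
        return subject.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        // The first occurrence is kept with its original capitalization.
        System.out.println(collapseRepeats("Hello hello Hello there there past pastures"));
        // Prints: Hello there past pastures

        // An apostrophe is neither [a-z] nor whitespace, so this is left unchanged.
        System.out.println(collapseRepeats("it's it's"));
        // Prints: it's it's
    }
}
```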


Explanation:

(?i)     # Mode: case-insensitive
\b       # Match the start of a word
([a-z]+) # Match one ASCII "word", capture it in group 1
\b       # Match the end of a word
(?:      # Start of non-capturing group:
 \s+     # Match at least one whitespace character
 \1      # Match the same word as captured before (case-insensitively)
 \b      # and make sure it ends there.
)+       # Repeat that as often as possible
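The `\b` after `\1` is what keeps "past pastures" intact. The sketch below (class and constant names are illustrative) compares the pattern with and without that boundary: without it, "past" matches the beginning of "pastures" and one word is lost.

```java
public class TrailingBoundaryDemo {
    static final String WITH_BOUNDARY = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
    // Same pattern with the \b after \1 removed, for comparison only.
    static final String WITHOUT_BOUNDARY = "(?i)\\b([a-z]+)\\b(?:\\s+\\1)+";

    public static void main(String[] args) {
        String subject = "past pastures";
        // \1\b requires the repeat to end where the word ends, so nothing changes.
        System.out.println(subject.replaceAll(WITH_BOUNDARY, "$1"));    // past pastures
        // Without it, "past past" (the start of "pastures") is collapsed to "past".
        System.out.println(subject.replaceAll(WITHOUT_BOUNDARY, "$1")); // pastures
    }
}
```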


See it live on regex101.com.