Hi,
I have the following code:
import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;
/
public class Reg
The code below works fine:
import java.util.Scanner;
import java.util.regex.Pattern;

public class DuplicateRemoveEx {
    public static void main(String[] args) {
        // A word followed by one or more repetitions of itself,
        // separated by non-word characters.
        String regex = "\\b(\\w+)(\\b\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Scanner in = new Scanner(System.in);
        // The first input line holds the number of sentences that follow.
        int numSentences = Integer.parseInt(in.nextLine());
        while (numSentences-- > 0) {
            String input = in.nextLine();
            // One replaceAll call collapses every run of repeated words,
            // so no find() loop is needed.
            input = p.matcher(input).replaceAll("$1");
            System.out.println(input);
        }
        in.close();
    }
}
Below is your code. I read the text line by line and used Tim's regular expression.
import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;
/**
 *
 * @author Marius
 */
public class RegexSimple41 {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        ArrayList<String> manyLines = new ArrayList<String>();
        ArrayList<String> noRepeat = new ArrayList<String>();
        // try-with-resources closes the Scanner even if reading fails
        try (Scanner myfis = new Scanner(new File("D:\\myfis41.txt"))) {
            // nextLine() already strips the line separator, so each call yields one line
            while (myfis.hasNextLine()) {
                String line = myfis.nextLine();
                if (!line.isEmpty()) {
                    manyLines.add(line);
                }
            }
            if (!manyLines.isEmpty()) {
                System.out.println("Original text");
                for (String s : manyLines) {
                    System.out.println(s);
                }
            }
            // Collapse repeated words in each line using Tim's regular expression
            for (String s : manyLines) {
                String result = s.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1");
                noRepeat.add(result);
            }
            if (!noRepeat.isEmpty()) {
                System.out.println("Remove duplicates");
                for (String s : noRepeat) {
                    System.out.println(s);
                }
            }
        } catch (Exception ex) {
            System.out.println(ex);
        }
    }
}
Good luck,
First of all, the regex [aA-zZ]* doesn't do what you think it does. It means "Match zero or more a's, or characters in the range between ASCII A and ASCII z (which also includes [, ], \ and others), or Z's". It therefore also matches the empty string.
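To make that concrete, here is a small sketch that checks a few inputs against [aA-zZ]*. The class name CharClassDemo and the helper matchesClass are made up for this illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true if the whole input matches [aA-zZ]*
    static boolean matchesClass(String s) {
        return Pattern.matches("[aA-zZ]*", s);
    }

    public static void main(String[] args) {
        // The empty string matches because * allows zero repetitions.
        System.out.println(matchesClass(""));
        // [ \ ] ^ _ ` all sit between Z and a in ASCII, so the A-z range covers them.
        System.out.println(matchesClass("[\\]^_`"));
        // Plain letters match, as intended.
        System.out.println(matchesClass("Hello"));
        // Digits are outside the A-z range, so this does not match.
        System.out.println(matchesClass("Hello1"));
    }
}
```

If only letters are wanted, [a-zA-Z] (or [a-z] with the case-insensitive flag) is the usual way to write it.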
Assuming that you are only looking for duplicate words that consist solely of ASCII letters, case-insensitively, keeping the first word (which means that you wouldn't want to match "it's it's" or "olé olé!"), then you can do that in a single regex operation:
String result = subject.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1");
which will change
Hello hello Hello there there past pastures
into
Hello there past pastures
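A minimal, self-contained way to try this out is sketched below. The class name DuplicateWordsDemo and the method removeRepeatedWords are placeholders chosen for this example; only the regex and the sample sentence come from the answer itself.

```java
class DuplicateWordsDemo {
    // Hypothetical wrapper around the single replaceAll call shown above.
    static String removeRepeatedWords(String subject) {
        return subject.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        // Repeated words collapse to their first occurrence;
        // "past pastures" is left alone because "past" does not end at a word boundary there.
        System.out.println(removeRepeatedWords("Hello hello Hello there there past pastures"));
        // The apostrophe is not whitespace, so this input stays unchanged.
        System.out.println(removeRepeatedWords("it's it's"));
    }
}
```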
Explanation:
(?i)        # Mode: case-insensitive
\b          # Match the start of a word
([a-z]+)    # Match one ASCII "word", capture it in group 1
\b          # Match the end of a word
(?:         # Start of non-capturing group:
  \s+       #   Match at least one whitespace character
  \1        #   Match the same word as captured before (case-insensitively)
  \b        #   and make sure it ends there.
)+          # Repeat that as often as possible
See it live on regex101.com.
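If you would like to keep an annotated version like this in your Java source, one option is the Pattern.COMMENTS flag, which makes the regex engine ignore whitespace and #-comments inside the pattern. The following is only a sketch: the class name CommentedRegexDemo, the constant REPEATED_WORD and the method collapse are invented for the illustration, and the case-insensitive mode is passed as a flag instead of the inline (?i).

```java
import java.util.regex.Pattern;

class CommentedRegexDemo {
    // Same pattern as in the explanation, written out with inline comments.
    static final Pattern REPEATED_WORD = Pattern.compile(
        "\\b           # start of a word\n" +
        "([a-z]+)      # one ASCII word, captured in group 1\n" +
        "\\b           # end of the word\n" +
        "(?:           # non-capturing group:\n" +
        "  \\s+        #   at least one whitespace character\n" +
        "  \\1         #   the same word again\n" +
        "  \\b         #   which must end here\n" +
        ")+            # repeated as often as possible\n",
        Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);

    // Hypothetical helper: replace each run of repeated words with the first word.
    static String collapse(String s) {
        return REPEATED_WORD.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello hello Hello there there past pastures"));
    }
}
```

Compiling the pattern once and reusing it also avoids recompiling the regex for every line, which String.replaceAll does on each call.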