I am building a natural language processing application in Java, using data from IMDB and Amazon.
I came across a certain dataset which has words like
You can use a regex to find any letter that is followed by the same letter at least two more times. Requiring three or more in a row means legitimate double letters, like the mm in comma, are left alone.
String data="stoooooop partyyyyyy";
System.out.println(data.replaceAll("([a-zA-Z])\\1{2,}", "$1"));
// ([a-zA-Z])  group 1: a single letter
// \\1{2,}     the match from group 1, repeated twice or more
// "$1"        replace with the match from group 1
Output:
stop party
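If you need to apply this to a whole dataset, it may be worth compiling the pattern once and wrapping it in a small helper. The following is only a sketch: the class name RepeatedLetterCollapser and the method collapseRepeats are made-up names for illustration, and the regex is the same one as above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption, not part of any library.
class RepeatedLetterCollapser {

    // A letter (group 1) followed by two or more copies of itself,
    // i.e. a run of three or more identical letters.
    private static final Pattern REPEATED = Pattern.compile("([a-zA-Z])\\1{2,}");

    // Collapses every run of three or more identical letters to a single letter.
    public static String collapseRepeats(String text) {
        return REPEATED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("stoooooop partyyyyyy")); // stop party
        System.out.println(collapseRepeats("comma"));                // comma (only two m's, no match)
        System.out.println(collapseRepeats("coooool"));              // col
    }
}
```

Note the last case: a word that legitimately has a double letter (cool) loses it once the run is stretched to three or more. Replacing with "$1$1" instead keeps two letters, which fixes coooool but turns stoooooop into stoop, so which replacement fits better depends on your data.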