I am building a natural language processing application in Java, using data from IMDB and Amazon.
I came across a certain dataset which has words like
There are no English words that I know of that have more than two consecutive identical letters, so one approach is to collapse any run of three or more identical letters down to two.
This approach would not catch:
"partyy" (only two identical letters in a row, so nothing is collapsed)
"stoop" (plus that's ambiguous! Is that "stop" with an extra "o" or simply "stoop"?)
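For what it's worth, here is a minimal sketch of the run-collapsing idea in Java, assuming a plain regex pass is acceptable for your data. The class name `RepeatedLetterNormalizer` and the method `collapseLongRuns` are made up for illustration; only `java.util.regex` from the standard library is used. It shortens runs of three or more identical letters to two and, as noted above, leaves "partyy" and "stoop" untouched.

```java
import java.util.regex.Pattern;

public class RepeatedLetterNormalizer {
    // A letter followed by two or more copies of itself, i.e. a run of 3+.
    // The match is case-sensitive, so "Ooo" is not treated as one run.
    private static final Pattern LONG_RUN = Pattern.compile("(\\p{L})\\1{2,}");

    // Hypothetical helper: shortens every run of 3+ identical letters to 2.
    static String collapseLongRuns(String text) {
        return LONG_RUN.matcher(text).replaceAll("$1$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseLongRuns("coooool"));  // prints "cool"
        System.out.println(collapseLongRuns("happyyyy")); // prints "happyy"
        System.out.println(collapseLongRuns("partyy"));   // prints "partyy" (not caught)
        System.out.println(collapseLongRuns("stoop"));    // prints "stoop" (left alone)
    }
}
```

Note that "happyyyy" comes out as "happyy", not "happy", so you would probably still need a dictionary lookup or spell checker afterwards to decide between one and two letters.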