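You may wish to use \p{L}\p{M}* instead of [a-zA-Z] so that non-English Unicode letters are matched as well. \p{L} matches any Unicode letter, and \p{M}* also consumes any combining marks (accents and the like) that follow it. The call then becomes:

    replaceAll("(\\p{L}\\p{M}*)(\\1{" + maxAllowedRepetition + ",})", "$1");

or, equivalently, without the second capturing group, which the replacement never uses:

    replaceAll("(\\p{L}\\p{M}*)\\1{" + maxAllowedRepetition + ",}", "$1");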
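A minimal, self-contained sketch of the second pattern in use, assuming replaceAll is called on a String and maxAllowedRepetition is an int. The class and method names (RepeatCollapser, collapseRepeats, collapseAsciiRepeats) are hypothetical, and the [a-zA-Z] variant is included only for comparison. Non-ASCII inputs are written with \u escapes to avoid source-encoding problems.

```java
public class RepeatCollapser {

    // Hypothetical helper. Collapses a letter (plus any combining marks that follow it)
    // to a single occurrence when it is followed by at least maxAllowedRepetition
    // further copies of itself.
    static String collapseRepeats(String input, int maxAllowedRepetition) {
        // \p{L}  : any Unicode letter (Latin, Cyrillic, Greek, CJK, ...)
        // \p{M}* : zero or more combining marks, e.g. U+0301 COMBINING ACUTE ACCENT
        return input.replaceAll("(\\p{L}\\p{M}*)\\1{" + maxAllowedRepetition + ",}", "$1");
    }

    // Hypothetical ASCII-only variant for comparison: matches only A-Z and a-z.
    static String collapseAsciiRepeats(String input, int maxAllowedRepetition) {
        return input.replaceAll("([a-zA-Z])\\1{" + maxAllowedRepetition + ",}", "$1");
    }

    public static void main(String[] args) {
        // Plain ASCII: both patterns behave the same.
        System.out.println(collapseRepeats("heeeello", 2));      // hello
        System.out.println(collapseAsciiRepeats("heeeello", 2)); // hello

        // Cyrillic "privet" with the vowel U+0435 tripled.
        String cyrillic = "\u043f\u0440\u0438\u0432\u0435\u0435\u0435\u0442";
        String cyrillicFixed = "\u043f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(collapseRepeats(cyrillic, 2).equals(cyrillicFixed));      // true
        System.out.println(collapseAsciiRepeats(cyrillic, 2).equals(cyrillicFixed)); // false

        // Decomposed "e + COMBINING ACUTE ACCENT" repeated three times:
        // \p{M}* keeps each accent attached to its base letter.
        String decomposed = "e\u0301e\u0301e\u0301";
        System.out.println(collapseRepeats(decomposed, 2).equals("e\u0301"));      // true
        System.out.println(collapseAsciiRepeats(decomposed, 2).equals("e\u0301")); // false
    }
}
```

Note that \\1{n,} requires n copies in addition to the captured one, so only runs of maxAllowedRepetition + 1 or more identical letters are collapsed; "book" with maxAllowedRepetition = 2 is left unchanged.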