Replace multiple consecutive occurrences of a character with a single occurrence

被撕碎了的回忆 2021-01-07 14:47

I am building a natural language processing application in Java, using data from IMDB and Amazon.

I came across a certain dataset which has words like

5 answers
  •  抹茶落季
    2021-01-07 15:30

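    You may wish to use \p{L}\p{M}* instead of [a-zA-Z] so that non-English Unicode letters, together with any combining marks that follow them, are matched as well. The call then looks like this:

    replaceAll("(\\p{L}\\p{M}*)(\\1{" + maxAllowedRepetition + ",})", "$1");

    or, without the second capturing group, which is not needed because only $1 is used in the replacement:

    replaceAll("(\\p{L}\\p{M}*)\\1{" + maxAllowedRepetition + ",}", "$1");

    In both forms the pattern matches one letter followed by at least maxAllowedRepetition further copies of it, so any run longer than maxAllowedRepetition is replaced with a single occurrence, while shorter runs are left untouched.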
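
To make the behaviour of that pattern concrete, here is a minimal sketch that wraps the second form in a helper. The class name RepeatCollapser, the method name collapse, and the sample strings are made up for illustration and are not part of the original answer; only the regex and the replaceAll call come from it.

```java
final class RepeatCollapser {

    // Collapses every run of the same letter that is longer than
    // maxAllowedRepetition into a single occurrence of that letter.
    // Assumption: maxAllowedRepetition is non-negative; a negative
    // value would produce an invalid regex quantifier.
    static String collapse(String text, int maxAllowedRepetition) {
        // (\p{L}\p{M}*) captures one letter plus any combining marks;
        // \1{n,} then requires at least n further copies of that capture.
        String regex = "(\\p{L}\\p{M}*)\\1{" + maxAllowedRepetition + ",}";
        return text.replaceAll(regex, "$1");
    }

    public static void main(String[] args) {
        // Runs of three or more letters shrink to one; "oo" in "good" is kept.
        System.out.println(collapse("waaaay tooo good", 2)); // prints "way to good"
    }
}
```

Note that the run is reduced to one letter, not to maxAllowedRepetition letters, so a word that legitimately contains a double letter loses it if the stretched run covers it (for example, "goooood" becomes "god" with a limit of 2). The backreference is also case-sensitive by default, so "Aaa" is not treated as a single run.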
