Replace multiple consecutive occurrences of a character with a single occurrence

被撕碎了的回忆 2021-01-07 14:47

I am building a natural language processing application in Java, using data from IMDB and Amazon.

I came across a certain dataset which has words like

5 answers
  •  被撕碎了的回忆
    2021-01-07 15:22

    You can use a regex that matches a letter followed by the same letter at least two more times, i.e. three or more in a row (since we don't want to touch correctly doubled letters like the m in comma).

    String data="stoooooop partyyyyyy";
    System.out.println(data.replaceAll("([a-zA-Z])\\1{2,}", "$1"));
    //                                       |      |         |
    //                                   group 1   match    replace with 
    //                                             from     match from group 1
    //                                             group 1
    //                                             repeated 
    //                                           twice or more
    

    Output:

    stop party
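
For anyone who wants to try this outside a one-liner, here is a minimal, self-contained sketch built on the same pattern. The class and method names (`RepeatCollapser`, `collapseToOne`, `collapseToTwo`) are made up for illustration, and the `collapseToTwo` variant is an assumption that is not part of the answer above. It is included only because collapsing to a single letter also shortens words whose correct spelling has a double letter.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; only the regex itself comes from the answer above.
class RepeatCollapser {
    // A letter (group 1) followed by the same letter two or more times,
    // i.e. a run of three or more identical ASCII letters.
    private static final Pattern RUN_OF_THREE_OR_MORE =
            Pattern.compile("([a-zA-Z])\\1{2,}");

    // Collapses each run of 3+ identical letters to a single letter.
    static String collapseToOne(String text) {
        return RUN_OF_THREE_OR_MORE.matcher(text).replaceAll("$1");
    }

    // Assumed variant: collapses each run of 3+ identical letters to two letters.
    static String collapseToTwo(String text) {
        return RUN_OF_THREE_OR_MORE.matcher(text).replaceAll("$1$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseToOne("stoooooop partyyyyyy")); // stop party
        System.out.println(collapseToOne("comma"));                // comma (only two m's, no match)
        System.out.println(collapseToOne("goooood"));              // god
        System.out.println(collapseToTwo("goooood"));              // good
    }
}
```

Neither variant can tell whether the original word had one letter or two, so for real text a dictionary lookup after collapsing would probably still be needed. The pattern is also case-sensitive, so a run like `Ooo` is not treated as three identical letters.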
    
