Replace multiple consecutive occurrences of a character with a single occurrence

被撕碎了的回忆 2021-01-07 14:47

I am building a natural language processing application in Java, using data from IMDB and Amazon.

I came across a certain dataset which has words like

5 answers
  •  醉梦人生
    2021-01-07 15:18

    There are no English words that I know of that have more than two consecutive identical letters.

    1. Iterate over all words
    2. If the word has more than two consecutive identical letters, then:
      • Remove all but two of the duplicate letters, and see if a valid word is formed.
      • Otherwise, remove all but one duplicate letter, and see if a valid word is formed.
      • Otherwise, fail.
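
    The steps above can be sketched in Java roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name `RepeatedLetterNormalizer`, the method `normalize`, and the tiny `DICTIONARY` set are all hypothetical stand-ins, and a real application would load a proper word list. It also assumes that words without a run of three or more identical letters are passed through unchanged.

    ```java
    import java.util.Locale;
    import java.util.Optional;
    import java.util.Set;
    import java.util.regex.Pattern;

    public class RepeatedLetterNormalizer {
        // Matches any letter that occurs three or more times in a row.
        private static final Pattern RUN_OF_THREE_PLUS = Pattern.compile("(\\p{L})\\1{2,}");

        // Hypothetical stand-in dictionary; replace with a real word list.
        private static final Set<String> DICTIONARY = Set.of("good", "so", "love", "stop", "stoop");

        public static Optional<String> normalize(String word, Set<String> dictionary) {
            String lower = word.toLowerCase(Locale.ROOT);

            // Assumption: words with no run of 3+ letters are returned untouched.
            if (!RUN_OF_THREE_PLUS.matcher(lower).find()) {
                return Optional.of(lower);
            }

            // First try keeping two of the repeated letters.
            String keepTwo = RUN_OF_THREE_PLUS.matcher(lower).replaceAll("$1$1");
            if (dictionary.contains(keepTwo)) {
                return Optional.of(keepTwo);
            }

            // Otherwise try keeping a single letter.
            String keepOne = RUN_OF_THREE_PLUS.matcher(lower).replaceAll("$1");
            if (dictionary.contains(keepOne)) {
                return Optional.of(keepOne);
            }

            // Neither candidate is a known word: fail.
            return Optional.empty();
        }

        public static void main(String[] args) {
            for (String w : new String[] {"goooood", "sooooo", "looove", "partyy", "xyzzzz"}) {
                System.out.println(w + " -> " + normalize(w, DICTIONARY).orElse("(no match)"));
            }
        }
    }
    ```

    Note that this sketch shortens every run in a word the same way, so a word with several runs that need different lengths would not be recovered.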

    This approach would not catch:

    • "partyy" (only two consecutive identical letters, so the check in step 2 never triggers)

    • "stoop" (which is also ambiguous: is it "stop" with an extra "o", or simply "stoop"?)
