Replace multiple consecutive occurrences of a character with a single occurrence

被撕碎了的回忆

I am building a natural language processing application in Java, using data from IMDB and Amazon.

I came across a certain dataset which has words like

5 Answers

醉梦人生

2021-01-07 15:18
There are no English words that I know of that have more than two consecutive identical letters.
1. Iterate over all words
2. If the word has more than two consecutive identical letters, then:
  - Remove all but two of the duplicate letters, and see if a valid word is formed.
  - Otherwise, remove all but one duplicate letter, and see if a valid word is formed.
  - Otherwise, fail.
This approach would not catch:
- "partyy" (only two consecutive identical letters, so step 2 never applies)
- "stoop" (plus that's ambiguous! Is that "stop" with an extra "o", or simply "stoop"?)
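
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the class name RepeatedLetterFixer, the method fix, and the tiny in-memory dictionary are all hypothetical, and a real application would load a proper word list. Every long run in a word is shortened the same way, which is a simplification.

```java
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the dictionary-based approach.
class RepeatedLetterFixer {
    // A letter followed by two or more copies of itself (a run of three or more).
    private static final Pattern LONG_RUN = Pattern.compile("([a-zA-Z])\\1{2,}");

    // Returns the corrected word, or Optional.empty() if no candidate is in the dictionary.
    static Optional<String> fix(String word, Set<String> dictionary) {
        if (!LONG_RUN.matcher(word).find()) {
            return Optional.of(word); // no run of three or more: left untouched
        }
        // First try keeping two letters of each long run.
        String keepTwo = LONG_RUN.matcher(word).replaceAll("$1$1");
        if (dictionary.contains(keepTwo)) {
            return Optional.of(keepTwo);
        }
        // Otherwise try keeping a single letter.
        String keepOne = LONG_RUN.matcher(word).replaceAll("$1");
        if (dictionary.contains(keepOne)) {
            return Optional.of(keepOne);
        }
        return Optional.empty(); // neither candidate is a known word
    }

    public static void main(String[] args) {
        // Assumed toy dictionary, for illustration only.
        Set<String> dictionary = Set.of("stop", "party", "good");
        for (String w : new String[] {"stoooooop", "goooood", "partyyyyyy", "partyy", "xyzzzz"}) {
            System.out.println(w + " -> " + fix(w, dictionary).orElse("(no match)"));
        }
    }
}
```

With this toy dictionary, "partyy" comes back unchanged, which is the limitation noted above. If "stoop" were also in the dictionary, "stoooooop" would resolve to "stoop" rather than "stop", because the two-letter candidate is tried first.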

被撕碎了的回忆

2021-01-07 15:22

You can use a regex to find any letter that is followed by at least two more copies of itself (so that legitimate double letters, like the m in comma, are left alone).

String data="stoooooop partyyyyyy";
System.out.println(data.replaceAll("([a-zA-Z])\\1{2,}", "$1"));
//                                       |      |         |
//                                   group 1   match    replace with 
//                                             from     match from group 1
//                                             group 1
//                                             repeated 
//                                           twice or more

Output:

stop party
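
To make the effect of the {2,} quantifier concrete, here is a small comparison sketch. The class and method names (RunThresholdDemo, collapseLongRuns, collapseAllRuns) are made up for this illustration; only String.replaceAll and the regex syntax are standard Java.

```java
// Hypothetical demo class comparing two repetition thresholds.
class RunThresholdDemo {
    // Collapses runs of three or more identical letters (same regex as above).
    static String collapseLongRuns(String text) {
        return text.replaceAll("([a-zA-Z])\\1{2,}", "$1");
    }

    // Collapses every run of two or more identical letters.
    static String collapseAllRuns(String text) {
        return text.replaceAll("([a-zA-Z])\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseLongRuns("comma"));  // comma (double m kept)
        System.out.println(collapseAllRuns("comma"));   // coma  (double m lost)
        System.out.println(collapseLongRuns("partyy")); // partyy (run of two is not touched)
    }
}
```

The trade-off is visible in the last line: keeping legitimate double letters also means a stray doubled letter such as the one in "partyy" is not corrected.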

没有蜡笔的小新

2021-01-07 15:28

You can use this snippet; it's quite a fast implementation. Note that it collapses every run of identical characters, including legitimate double letters.

public static String removeConsecutiveChars(String str) {
    if (str == null) {
        return null;
    }

    int strLen = str.length();
    if (strLen <= 1) {
        return str; // nothing to collapse
    }

    char[] strChar = str.toCharArray();
    char temp = strChar[0]; // character of the run currently being scanned

    StringBuilder stringBuilder = new StringBuilder(strLen);
    for (int i = 1; i < strLen; i++) {
        char val = strChar[i];
        // A different character ends the current run: emit it once.
        if (val != temp) {
            stringBuilder.append(temp);
            temp = val;
        }
    }
    stringBuilder.append(temp); // emit the final run

    return stringBuilder.toString();
}

心在旅途

2021-01-07 15:28

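Try using a loop:

    String word = "Stoooppppd";
    StringBuilder res = new StringBuilder();
    char previous = word.charAt(0); // assumes word is not empty
    res.append(previous);
    for (int i = 1; i < word.length(); i++) {
        char ch = word.charAt(i);
        // Append only when the character differs from the one before it.
        if (ch != previous) {
            res.append(ch);
        }
        previous = ch;
    }
    System.out.println(res); // prints "Stopd"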

抹茶落季

2021-01-07 15:30

You may wish to use \p{L}\p{M}* instead of [a-zA-Z] so that non-English Unicode letters (a letter plus any combining marks) are matched as well. The call then looks like this:

replaceAll("(\\p{L}\\p{M}*)(\\1{" + maxAllowedRepetition + ",})", "$1");

or, without the second capturing group, which is not needed:

replaceAll("(\\p{L}\\p{M}*)\\1{" + maxAllowedRepetition + ",}", "$1");
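
A minimal sketch of the Unicode-aware variant, wrapped in a method. The class name UnicodeRepeats, the method collapse, and the parameter name minExtraRepeats are assumptions made for this example. One point to watch: because the quantifier applies to the repeats that follow the first letter, a value of n collapses runs of n + 1 or more letters.

```java
// Hypothetical wrapper around the Unicode-aware regex.
class UnicodeRepeats {
    // Collapses a letter (plus its combining marks) followed by at least
    // minExtraRepeats further copies of itself into a single occurrence.
    static String collapse(String text, int minExtraRepeats) {
        if (minExtraRepeats < 1) {
            throw new IllegalArgumentException("minExtraRepeats must be at least 1");
        }
        String regex = "(\\p{L}\\p{M}*)\\1{" + minExtraRepeats + ",}";
        return text.replaceAll(regex, "$1");
    }

    public static void main(String[] args) {
        // The accented letter is written as a Unicode escape (U+00E9) to keep the source ASCII-only.
        System.out.println(collapse("caf\u00e9\u00e9\u00e9", 2)); // three accented letters become one
        System.out.println(collapse("stoooooop", 2));             // stop
        System.out.println(collapse("comma", 2));                 // comma
    }
}
```

If the same pattern is applied to many strings, compiling it once with java.util.regex.Pattern.compile and reusing the Pattern avoids recompiling the regex on every call.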
