Is there a way to get rid of accents and convert a whole string to regular letters?

爱一瞬间的悲伤 2020-11-22 04:58

Is there a better way to get rid of accents and turn those letters into regular ones, other than using the String.replaceAll() method and replacing the letters one by one?

12 answers
  •  情话喂你
    2020-11-22 05:07

    @David Conrad's solution is the fastest one I tried that uses the Normalizer, but it does have a bug: it strips characters that are not accents. Chinese characters and letters like æ, for example, are all stripped. The characters we actually want to strip are non-spacing marks, which take up no extra width in the final string. These zero-width characters end up combined with some other character. If you can see one isolated as a character, for example like this `, my guess is that it's combined with the space character.

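    import java.text.Normalizer;
    
    public static String flattenToAscii(String string) {
        // NFD splits each accented letter into a base letter followed by its combining marks
        String norm = Normalizer.normalize(string, Normalizer.Form.NFD);
        // size the buffer from the normalized text, which can be longer than the input
        char[] out = new char[norm.length()];
    
        int j = 0;
        for (int i = 0, n = norm.length(); i < n; ++i) {
            char c = norm.charAt(i);
            int type = Character.getType(c);
    
            //Log.d(TAG,""+c);
            //by Ricardo, modified the character check for accents, ref: http://stackoverflow.com/a/5697575/689223
            if (type != Character.NON_SPACING_MARK){
                out[j] = c;
                j++;
            }
        }
        //Log.d(TAG,"normalized string:"+norm+"/"+new String(out));
        // copy only the j characters that were written, so the result has no trailing '\0' padding
        return new String(out, 0, j);
    }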
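
For comparison, here is a minimal sketch of the same non-spacing-mark check written as a regex. It assumes that `\p{Mn}` in `java.util.regex` matches the same general category as `Character.NON_SPACING_MARK`; the class name `AccentStripDemo` and the method name `stripMarks` are made up for illustration and are not part of the answer above.

```java
import java.text.Normalizer;

// Hypothetical demo class, for illustration only.
final class AccentStripDemo {

    // Decompose with NFD, then delete every non-spacing mark (Unicode category Mn).
    static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}+", "");
    }

    public static void main(String[] args) {
        // Escapes keep the source file independent of its encoding.
        System.out.println(stripMarks("caf\u00E9"));          // e with acute accent; prints "cafe"
        System.out.println(stripMarks("\u00C5ngstr\u00F6m")); // A with ring, o with diaeresis; prints "Angstrom"
        System.out.println(stripMarks("\u00E6"));             // ae ligature, left unchanged
        System.out.println(stripMarks("\u4E2D\u6587"));       // two Chinese characters, left unchanged
    }
}
```

The regex version is shorter but probably slower than the character loop, since it compiles a pattern on every call; hoisting the pattern into a `static final Pattern` would be the usual way around that if it matters.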
    
