Is there a way to get rid of accents and convert a whole string to regular letters?

爱一瞬间的悲伤 2020-11-22 04:58

Is there a better way for getting rid of accents and making those letters regular apart from using String.replaceAll() method and replacing letters one by one?

12 Answers
  •  误落风尘
    2020-11-22 05:18

    Use java.text.Normalizer to handle this for you.

    string = Normalizer.normalize(string, Normalizer.Form.NFD);
    // or Normalizer.Form.NFKD for a more "compatible" decomposition
    

    This separates each accent mark from its base character, so "é" becomes a plain "e" followed by a combining acute accent. You can then throw out every character that is not plain ASCII, which removes the accent marks:

    string = string.replaceAll("[^\\p{ASCII}]", "");
    
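    As a rough sketch of the two steps above, the class below shows that NFD makes the string longer (each accent becomes its own char) and that the ASCII filter then drops those extra chars. The class name AsciiFold and the method toAscii are made up for this illustration; non-ASCII characters are written as \u escapes so the file compiles regardless of source encoding.

    ```java
    import java.text.Normalizer;

    // Hypothetical helper for illustration; not part of the JDK.
    final class AsciiFold {
        // Decompose to NFD, then drop every character outside the ASCII range.
        static String toAscii(String input) {
            String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
            return decomposed.replaceAll("[^\\p{ASCII}]", "");
        }

        public static void main(String[] args) {
            String word = "\u00e9t\u00e9"; // e-acute, t, e-acute
            String nfd = Normalizer.normalize(word, Normalizer.Form.NFD);
            // Each e-acute splits into 'e' plus U+0301, so 3 chars become 5.
            System.out.println(word.length() + " -> " + nfd.length()); // 3 -> 5
            System.out.println(toAscii(word)); // ete
        }
    }
    ```

    Note that this filter also deletes any non-ASCII character that has no ASCII base letter, such as Cyrillic or Greek letters.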

    If the text may contain non-ASCII characters that should survive (for example Greek or Cyrillic letters), remove only the combining marks instead:

    string = string.replaceAll("\\p{M}", "");
    

    In the regex, \\p{M} (lowercase p) matches any Unicode combining mark, i.e. each separated accent, while \\P{M} (uppercase P) matches everything else, including the base letters.
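
    A minimal sketch of the mark-stripping variant, assuming you want to keep letters from other scripts. MarkStripper and stripAccents are hypothetical names chosen for this example; the pattern is precompiled only to avoid recompiling it on every call.

    ```java
    import java.text.Normalizer;
    import java.util.regex.Pattern;

    // Hypothetical helper for illustration; not part of the JDK.
    final class MarkStripper {
        // \p{M} matches any Unicode combining mark.
        private static final Pattern MARKS = Pattern.compile("\\p{M}+");

        // Decompose to NFD, then remove only the combining marks.
        static String stripAccents(String input) {
            String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
            return MARKS.matcher(decomposed).replaceAll("");
        }

        public static void main(String[] args) {
            // "Creme Brulee" written with accented letters as \u escapes.
            System.out.println(stripAccents("Cr\u00e8me Br\u00fbl\u00e9e")); // Creme Brulee
            // The Cyrillic letter U+0416 has no decomposition and is kept.
            System.out.println(stripAccents("\u00e9\u0416").length()); // 2
        }
    }
    ```

    Letters that have no canonical decomposition, such as "ø" (U+00F8) or "ł" (U+0142), pass through unchanged with this approach, so they would still need an explicit replacement if you want plain ASCII output.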

    Thanks to GarretWilson for the pointer and regular-expressions.info for the great Unicode guide.
