How to filter a Java String to get only alphabet characters?

北荒 2021-01-12 12:24

I'm generating an XML file to make payments, and I have a constraint on users' full names: that parameter accepts only alphabetic characters (a-z, A-Z) plus whitespace to separate the names.

2 Answers
  • 2021-01-12 12:52

    You can use this removeAccents method, followed by a replaceAll with [^A-Za-z ]:

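    import java.text.Normalizer;
    import java.text.Normalizer.Form;

    // Decomposes accented characters (NFD), then strips the combining marks.
    public static String removeAccents(String text) {
      return text == null ? null :
        Normalizer.normalize(text, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }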
    

    The Normalizer decomposes each accented character into a base character followed by one or more combining diacritical marks (some languages stack several marks on one letter). For example, á, é and í all carry the same mark: U+0301, the combining acute accent.

    The \p{InCombiningDiacriticalMarks}+ regular expression will match all such diacritic codes and we will replace them with an empty string.
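
    To make the decomposition visible, here is a minimal sketch that prints the code points produced by NFD. The class name DecompositionDemo and the helpers nfdCodePoints and stripMarks are hypothetical names used only for illustration; the Normalizer call and the regular expression are the ones described above.

```java
import java.text.Normalizer;

public class DecompositionDemo {
    // Hypothetical helper: lists the code points of the NFD form of the input.
    static String nfdCodePoints(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        decomposed.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Hypothetical helper: same idea as removeAccents, without the null check.
    static String stripMarks(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        System.out.println(nfdCodePoints("\u00E1"));  // U+0061 U+0301 (a + acute)
        System.out.println(nfdCodePoints("\u00E9"));  // U+0065 U+0301 (e + acute)
        System.out.println(stripMarks("L\u00F3pez")); // Lopez
    }
}
```

    Both letters end in U+0301, which is why a single pattern for the combining-marks block is enough to strip the accent from either.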

    And in the caller:

    String original = "Carmen López-Delina Santos";
    String res = removeAccents(original).replaceAll("[^A-Za-z ]", "");
    System.out.println(res);
    


  • 2021-01-12 12:54

    You can first use a Normalizer and then remove the undesired characters:

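    import java.text.Normalizer;

    String input = "Carmen López-Delina Santos";
    // NFD splits "ó" into "o" plus a combining accent; the regex below then drops the accent.
    String withoutAccent = Normalizer.normalize(input, Normalizer.Form.NFD);
    String output = withoutAccent.replaceAll("[^a-zA-Z ]", "");
    System.out.println(output); // prints Carmen LopezDelina Santos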
    

    Note that this does not work for every non-ASCII letter in every language. A letter that has no decomposition into an ASCII base letter plus combining marks is simply deleted by the replaceAll. One such example is the Turkish dotless ı (U+0131).

    The alternative in that situation is probably to list all the possible letters and their replacements explicitly.
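
    As a rough sketch of that alternative, a small lookup table can be applied before normalizing. The class NameSanitizer, the method toAsciiLetters, and the contents of the table are all assumptions made for illustration: the table is deliberately incomplete, and the replacements you choose are a policy decision for your data. Map.of requires Java 9 or later.

```java
import java.text.Normalizer;
import java.util.Map;

public class NameSanitizer {
    // Hypothetical, incomplete table of letters that NFD does not decompose.
    private static final Map<Character, String> REPLACEMENTS = Map.of(
            '\u0131', "i",   // Turkish dotless i
            '\u00F8', "o",   // o with stroke
            '\u00D8', "O",
            '\u0142', "l",   // l with stroke
            '\u0141', "L",
            '\u00DF', "ss",  // sharp s
            '\u00E6', "ae",
            '\u00C6', "AE"
    );

    static String toAsciiLetters(String text) {
        if (text == null) {
            return null;
        }
        // Step 1: apply the explicit replacements.
        StringBuilder mapped = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                mapped.append(replacement);
            } else {
                mapped.append(c);
            }
        }
        // Step 2: decompose what is left and keep only ASCII letters and spaces.
        return Normalizer.normalize(mapped, Normalizer.Form.NFD)
                .replaceAll("[^A-Za-z ]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAsciiLetters("Carmen L\u00F3pez-Delina Santos")); // Carmen LopezDelina Santos
        System.out.println(toAsciiLetters("I\u015F\u0131k"));                  // Isik
        System.out.println(toAsciiLetters("Stra\u00DFe"));                     // Strasse
    }
}
```

    Letters missing from the table are still dropped by the final replaceAll, so the table has to cover every alphabet you expect to see in the names.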
