How to filter a Java String to get only alphabet characters?

I'm generating an XML file to make payments, and I have a constraint on users' full names. That parameter only accepts alphabetic characters (a-zA-Z) plus whitespace to separate names.

2 Answers

北海茫月

2021-01-12 12:52
You can use this removeAccents method, followed by a replaceAll with [^A-Za-z ]:
```
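import java.text.Normalizer;
import java.text.Normalizer.Form;

// Decompose accented letters (NFD), then strip the combining marks.
public static String removeAccents(String text) {
  return text == null ? null :
    Normalizer.normalize(text, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}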
```
The Normalizer decomposes each accented character into a base character followed by one or more combining diacritical marks (some languages stack several marks on one letter). For example, á, é and í all decompose to their base letter plus the same mark: U+0301, the combining acute accent.

The \p{InCombiningDiacriticalMarks}+ regular expression matches runs of characters in the Combining Diacritical Marks block (U+0300 to U+036F), and we replace them with an empty string.
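To see the decomposition described above, here is a minimal sketch that prints the code points of a string after NFD normalization. The class name DecompositionDemo and the helper nfdCodePoints are made up for this illustration; only Normalizer comes from the JDK.

```java
import java.text.Normalizer;

public class DecompositionDemo {

    // Hypothetical helper: returns the code points of the NFD form of the
    // input as space-separated "U+XXXX" values.
    static String nfdCodePoints(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        decomposed.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00F3 is "o with acute"; it splits into the letter o and U+0301.
        System.out.println(nfdCodePoints("\u00F3")); // prints U+006F U+0301
        // U+00E1 is "a with acute"; same combining mark, different base letter.
        System.out.println(nfdCodePoints("\u00E1")); // prints U+0061 U+0301
    }
}
```

Because the accent ends up as a separate code point, removing everything in the combining marks block leaves only the base letter.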

And in the caller:
```
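String original = "Carmen López-Delina Santos";
// Strip accents first, then drop everything that is not an ASCII letter or a space.
String res = removeAccents(original).replaceAll("[^A-Za-z ]", "");
System.out.println(res); // prints Carmen LopezDelina Santos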
```
See IDEONE demo

走了就别回头了

2021-01-12 12:54
You can first use a Normalizer and then remove the undesired characters:
```
String input = "Carmen López-Delina Santos";
String withoutAccent = Normalizer.normalize(input, Normalizer.Form.NFD);
String output = withoutAccent.replaceAll("[^a-zA-Z ]", "");
System.out.println(output); //prints Carmen LopezDelina Santos
```
Note that this does not work for every non-ASCII letter: a letter that has no canonical decomposition into a base letter plus combining marks is left unchanged by the Normalizer and is then deleted by the replaceAll. One such example is the Turkish dotless ı (U+0131).

The alternative in that situation is probably to list all the possible letters and their replacement...
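A minimal sketch of that alternative, assuming a small hand-written replacement table applied before normalization. The class name NameSanitizer, the method toAsciiLetters, and the table contents are hypothetical and deliberately incomplete; a real table would need to cover whichever letters your users' names actually contain.

```java
import java.text.Normalizer;
import java.util.Map;

public class NameSanitizer {

    // Hypothetical, incomplete table of letters that NFD does not decompose.
    // Map.of requires Java 9 or later.
    private static final Map<Character, String> REPLACEMENTS = Map.of(
        '\u0131', "i",   // Turkish dotless i
        '\u00F8', "o",   // o with stroke
        '\u00D8', "O",   // O with stroke
        '\u0142', "l",   // l with stroke
        '\u0141', "L",   // L with stroke
        '\u00DF', "ss"   // German sharp s
    );

    static String toAsciiLetters(String input) {
        if (input == null) {
            return null;
        }
        // Step 1: substitute the letters listed in the table.
        StringBuilder mapped = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                mapped.append(replacement);
            } else {
                mapped.append(c);
            }
        }
        // Step 2: decompose the remaining accented letters, then keep only
        // ASCII letters and spaces (this also drops the combining marks).
        String decomposed = Normalizer.normalize(mapped, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^a-zA-Z ]", "");
    }

    public static void main(String[] args) {
        // "I\u015F\u0131k \u00D8ster" contains s-cedilla, dotless i and O with stroke.
        System.out.println(toAsciiLetters("I\u015F\u0131k \u00D8ster")); // prints Isik Oster
    }
}
```

The table runs first so that mapped letters survive the final replaceAll; anything not in the table and not decomposable is still deleted.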