I'm generating an XML file to make payments and I have a constraint on users' full names. That parameter only accepts alphabetic characters (a-zA-Z) plus whitespace to separate names.
You can use this removeAccents method, followed by a replaceAll with [^A-Za-z ]:
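import java.text.Normalizer;
import java.text.Normalizer.Form;

// Strips combining diacritical marks after canonical decomposition (NFD).
public static String removeAccents(String text) {
    return text == null ? null :
        Normalizer.normalize(text, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}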
The Normalizer decomposes the original characters into a combination of a base character and a diacritic sign (this could be multiple signs in different languages). á, é and í have the same sign: U+0301, the combining acute accent. The \p{InCombiningDiacriticalMarks}+ regular expression will match all such diacritic codes and we will replace them with an empty string.
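To see the decomposition for yourself, here is a small sketch that prints the code points produced by NFD. The class and method names (DecompositionDemo, codePointsAfterNfd) are made up for illustration; only Normalizer comes from the JDK. The accented letters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.text.Normalizer;

class DecompositionDemo {
    // Hypothetical helper: lists the code points of the NFD form of the input.
    static String codePointsAfterNfd(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        decomposed.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00E1 = á, \u00E9 = é, \u00ED = í
        for (String s : new String[] {"\u00E1", "\u00E9", "\u00ED"}) {
            System.out.println(s + " -> " + codePointsAfterNfd(s));
        }
    }
}
```

Each of the three letters comes out as its plain base letter (U+0061, U+0065, U+0069) followed by U+0301, which is the part the regular expression removes.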
And in the caller:
String original = "Carmen López-Delina Santos";
String res = removeAccents(original).replaceAll("[^A-Za-z ]", "");
System.out.println(res); // prints Carmen LopezDelina Santos (the hyphen is removed too)
You can first use a Normalizer and then remove the undesired characters:
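// requires: import java.text.Normalizer;
String input = "Carmen López-Delina Santos";
// NFD splits each accented letter into a base letter plus combining marks
String withoutAccent = Normalizer.normalize(input, Normalizer.Form.NFD);
// drop everything that is not an ASCII letter or a space (marks and hyphen included)
String output = withoutAccent.replaceAll("[^a-zA-Z ]", "");
System.out.println(output); // prints Carmen LopezDelina Santos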
Note that this may not work for every non-ASCII letter in every language: a letter that does not decompose into an ASCII base letter plus combining marks is simply deleted. One such example is the Turkish dotless ı (U+0131), which has no decomposition.
The alternative in that situation is probably to list all the possible letters and their replacements...
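A rough sketch of that alternative, assuming Java 9+ for Map.of: apply an explicit replacement table first, then normalize and filter as before. The class name NameSanitizer, the method sanitize and the table contents are my own illustration. The table is deliberately incomplete, and the chosen replacements (for example ß to "ss") are an assumption you should check against whatever your payment format expects.

```java
import java.text.Normalizer;
import java.util.Map;

class NameSanitizer {
    // Hypothetical, incomplete table of letters that NFD leaves untouched.
    private static final Map<Character, String> REPLACEMENTS = Map.of(
        '\u0131', "i",   // dotless i
        '\u00F8', "o",   // o with stroke
        '\u00D8', "O",
        '\u0142', "l",   // l with stroke
        '\u0141', "L",
        '\u00DF', "ss",  // sharp s
        '\u00E6', "ae",  // ae ligature
        '\u00C6', "AE"
    );

    static String sanitize(String text) {
        if (text == null) {
            return null;
        }
        // Step 1: substitute the letters listed in the table.
        StringBuilder mapped = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                mapped.append(replacement);
            } else {
                mapped.append(c);
            }
        }
        // Step 2: decompose accents, then keep only ASCII letters and spaces.
        return Normalizer.normalize(mapped, Normalizer.Form.NFD)
                .replaceAll("[^A-Za-z ]", "");
    }
}
```

With this, sanitize("Aydın Søren") keeps the i and the o instead of dropping them, while accented letters such as ó are still handled by the normalization step.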