I am trying to find the format "abc, def g", which is the name format "lastname, firstname middlename". I think the best-suited method is regex, but I do not have any idea i
^([a-zA-Z]+)\s*,\s*([a-zA-Z]+)\s+([a-zA-Z]+)$
I think this is what you are looking for. Just read the capture groups to get the parts you need. See the demo:
http://regex101.com/r/hQ1rP0/6
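For use from Java, the anchored pattern above could be wrapped roughly as follows. This is only a sketch: the class name `AnchoredNameMatcher` and the `parse` method are hypothetical names chosen for illustration, and returning `null` for non-matching input is an assumption about how you want to handle bad data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class AnchoredNameMatcher {
    // The anchored pattern from the answer above, compiled once and reused.
    private static final Pattern NAME =
            Pattern.compile("^([a-zA-Z]+)\\s*,\\s*([a-zA-Z]+)\\s+([a-zA-Z]+)$");

    /** Returns {last, first, middle}, or null when the input does not fit the format. */
    static String[] parse(String input) {
        Matcher m = NAME.matcher(input);
        if (!m.matches()) {
            return null; // assumption: the caller checks for null
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("abc, def g");
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]);
    }
}
```

Because the pattern is anchored with ^ and $, an input without a comma (such as "abc def g") is rejected as a whole rather than partially matched.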
I think this one will also work, and it is a bit shorter than yours:
([A-Z][a-z]*)(?:,\s*)?
Or you can split the string using this regex:
(,?\s+)
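A rough Java sketch of how these two shorter patterns might be applied (the class and method names are made up for illustration). The first pattern matches one capitalised word per find() call, so it has to be applied in a loop, and it finds nothing in an all-lowercase input such as "abc, def g". The split pattern needs whitespace after the comma; the capture-group parentheses are dropped here because String.split does not need them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class ShortPatternDemo {
    // Collects every capitalised word, skipping an optional comma after it.
    static List<String> findWords(String input) {
        Matcher m = Pattern.compile("([A-Z][a-z]*)(?:,\\s*)?").matcher(input);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    // Splits on whitespace that may be preceded by a comma.
    static String[] splitName(String input) {
        return input.split(",?\\s+");
    }

    public static void main(String[] args) {
        System.out.println(findWords("Last, First Middle"));
        System.out.println(String.join("|", splitName("Last, First Middle")));
    }
}
```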
I would try to avoid a complicated regex; I would use String.substring() and indexOf(). That is, something like
String name = "Last, First Middle";
int comma = name.indexOf(',');           // end of the last name
int lastSpace = name.lastIndexOf(' ');   // start of the middle name
String lastName = name.substring(0, comma);
// trim() tolerates zero or several spaces after the comma
String firstName = name.substring(comma + 1, lastSpace).trim();
String middleName = name.substring(lastSpace + 1);
System.out.printf("first='%s' middle='%s' last='%s'%n", firstName,
        middleName, lastName);
Output is
first='First' middle='Middle' last='Last'
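The snippet assumes well-formed input; with no comma, indexOf() returns -1 and substring() throws. A slightly more defensive sketch of the same indexOf/substring idea might look like this. The class name `SubstringNameParser` and the choice of IllegalArgumentException are assumptions made for illustration, and only plain spaces (not tabs) are treated as separators.

```java
// Hypothetical helper class, not part of any library.
class SubstringNameParser {
    /** Returns {last, first, middle}; throws IllegalArgumentException on malformed input. */
    static String[] parse(String name) {
        int comma = name.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("missing comma: " + name);
        }
        String last = name.substring(0, comma).trim();
        String rest = name.substring(comma + 1).trim();  // "First Middle"
        int lastSpace = rest.lastIndexOf(' ');
        if (last.isEmpty() || lastSpace < 0) {
            throw new IllegalArgumentException("expected 'Last, First Middle': " + name);
        }
        String first = rest.substring(0, lastSpace).trim();
        String middle = rest.substring(lastSpace + 1);
        return new String[] { last, first, middle };
    }

    public static void main(String[] args) {
        String[] p = parse("Last, First Middle");
        System.out.printf("first='%s' middle='%s' last='%s'%n", p[1], p[2], p[0]);
    }
}
```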
Your sample input is "lastname, firstname middlename". With that, you can use the following regexp to extract lastname, firstname and middlename. It allows multiple whitespace characters between the parts and both uppercase and lowercase letters in the names; all three parts are mandatory:
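import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "Lastname, firstname middlename";
String regexp = "([A-Za-z]+),\\s+([A-Za-z]+)\\s+([A-Za-z]+)";
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(input);
// group() throws IllegalStateException unless a match succeeded, so check find() first
if (matcher.find()) {
    System.out.println("Lastname  : " + matcher.group(1));
    System.out.println("Firstname : " + matcher.group(2));
    System.out.println("Middlename: " + matcher.group(3));
} else {
    System.out.println("No match for: " + input);
}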
Short summary:
([A-Za-z]+)   First capture group: matches one or more letters to extract the last name
,\\s+         The capture group is followed by a comma and one or more whitespace characters
([A-Za-z]+)   Second capture group: matches one or more letters to extract the first name
\\s+          The capture group is followed by one or more whitespace characters
([A-Za-z]+)   Third capture group: matches one or more letters to extract the middle name
This only works if your names contain only unaccented Latin letters (A-Z, a-z). You should probably use a more open match for the characters:
String input = "Müller, firstname middlename";
String regexp = "(.+),\\s+(.+)\\s+(.+)";
This matches any character for lastname, firstname and middlename.
If the whitespace is optional (only the first occurrence, after the comma, can be optional; otherwise we cannot distinguish between firstname and middlename), then use * instead of + there:
String input = "Müller,firstname middlename";
String regexp = "(.+),\\s*(.+)\\s+(.+)";
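One thing to keep in mind with `.+` is that it really does match anything, including digits, and with several spaces between the names the extra spaces end up inside the second group. A narrower sketch, assuming names consist of Unicode letters, apostrophes and hyphens only (that character set is an assumption, not something the question states), could use the `\p{L}` class instead. The class and method names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, not part of any library.
class OpenNameMatcher {
    // The "match anything" pattern from the answer above.
    private static final Pattern LOOSE = Pattern.compile("(.+),\\s*(.+)\\s+(.+)");

    // Assumed alternative: Unicode letters plus apostrophe and hyphen only.
    private static final Pattern LETTERS =
            Pattern.compile("([\\p{L}'-]+),\\s*([\\p{L}'-]+)\\s+([\\p{L}'-]+)");

    static String[] parseLoose(String input) {
        return groups(LOOSE, input);
    }

    static String[] parseLetters(String input) {
        return groups(LETTERS, input);
    }

    // Returns the three captured groups, or null if the whole input does not match.
    private static String[] groups(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }
}
```

With this sketch, parseLoose accepts "123, 4 5" while parseLetters rejects it, and both accept "Müller,firstname middlename".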
As @Elliott mentions, there are other possibilities, like using String.split() or String.indexOf() with String.substring(). Regular expressions are often more flexible, but harder to maintain, especially for complex expressions.
In either case, implement unit tests with as many different inputs (including invalid ones) as possible, so that you can verify that your algorithm is still valid after you modify it.
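As a rough idea of what such tests could look like without pulling in a test framework, here is a small table of inputs checked against the letters-only regexp from above. The class name `NameRegexChecks`, the helper methods, and the particular test inputs are all made up for illustration; in a real project you would more likely write these as JUnit test methods.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical self-checking class, not part of any library.
class NameRegexChecks {
    private static final Pattern NAME =
            Pattern.compile("([A-Za-z]+),\\s+([A-Za-z]+)\\s+([A-Za-z]+)");

    // Returns {last, first, middle}, or null when the whole input does not match.
    static String[] parse(String input) {
        Matcher m = NAME.matcher(input);
        return m.matches() ? new String[] { m.group(1), m.group(2), m.group(3) } : null;
    }

    // Passing no expected parts means "this input must be rejected".
    static boolean check(String input, String... expected) {
        String[] actual = parse(input);
        if (expected.length == 0) {
            return actual == null;
        }
        return actual != null && Arrays.equals(actual, expected);
    }

    static int countFailures() {
        int failures = 0;
        if (!check("Lastname, firstname middlename", "Lastname", "firstname", "middlename")) failures++;
        if (!check("Lastname,   firstname    middlename", "Lastname", "firstname", "middlename")) failures++;
        if (!check("Lastname firstname middlename")) failures++;  // invalid: no comma
        if (!check("Lastname, firstname")) failures++;            // invalid: no middle name
        if (!check("Lastname, first2 middlename")) failures++;    // invalid: digit in name
        if (!check("")) failures++;                               // invalid: empty input
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures());
    }
}
```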
As an alternative to matching the lastname, firstname middlename directly, you could use String.split and provide a regexp that matches the separators instead. For instance:
// requires: import java.util.Arrays;
static String[] lastFirstMiddle(String input) {
    // split on any run of commas and/or whitespace
    String[] result = input.split("[,\\s]+");
    System.out.println(Arrays.asList(result));
    return result;
}
I tested this with inputs
"Müller, firstname middlename"
"Müller,firstname middlename"
"O'Gara, Ronan Ramón"
Note: this approach fails with surnames that contain spaces, for instance "van der Heuvel", "de Valera", "mac Piarais" or "bin Laden". Then again, the OP's original specification does not seem to allow spaces in the surname, or in the other names. (I work with a "Mary Kate": that is her first name, not a first and a middle name.) There's an interesting page about personal names at http://www.w3.org/International/questions/qa-personal-names
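If surnames with spaces do need to be supported, one possible variation is to split at the comma first and only then split the given names on whitespace. The sketch below assumes the comma is always present and that everything before it is the surname; the class name `CommaFirstSplit` is hypothetical. It does nothing for the "Mary Kate" case, which cannot be told apart from a first and middle name by the format alone.

```java
// Hypothetical helper class, not part of any library.
class CommaFirstSplit {
    /** Returns the surname at index 0, followed by each given name. */
    static String[] parse(String input) {
        // Limit 2: split at the first comma only, keeping the surname intact.
        String[] halves = input.split(",", 2);
        if (halves.length < 2) {
            throw new IllegalArgumentException("missing comma: " + input);
        }
        String surname = halves[0].trim();
        String[] given = halves[1].trim().split("\\s+");
        String[] result = new String[given.length + 1];
        result[0] = surname;
        System.arraycopy(given, 0, result, 1, given.length);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", parse("van der Heuvel, Jan Pieter")));
    }
}
```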