I'm looking for the most reliable way to return the first and last name of a person given the full name. So far, the best I could think of is the following:
As is, you're requiring a last name -- which, of course, your first example doesn't have.
Use clustered (non-capturing) grouping, (?:...), and a 0-or-1 count, ?, for the middle and last names as a whole to allow them to be optional:
'~\b(\p{L}+)\b (?: .+\b(\p{L}+)\b )?~iux' # x for spacing, u assumes UTF-8 input
This should allow the first name to be captured whether middle/last names are given or not.
$name = preg_replace('~\b(\p{L}+)\b(?:.+\b(\p{L}+)\b)?~iu', '$1 $2', $name);
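To see what the optional group changes, here is a small sketch that uses preg_match instead of preg_replace so the two captures can be inspected separately. The helper name split_name_regex is made up for illustration, and the u modifier assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer).
// Returns [first, last]; last is '' when only one name is given.
function split_name_regex(string $name): array
{
    // u: treat pattern and subject as UTF-8 (an assumption about the input)
    // x: ignore literal whitespace inside the pattern
    $pattern = '~\b(\p{L}+)\b (?: .+ \b(\p{L}+)\b )?~ux';

    if (!preg_match($pattern, $name, $m)) {
        return ['', ''];
    }

    // When the optional group does not take part in the match,
    // preg_match leaves index 2 unset, so fall back to ''.
    return [$m[1], $m[2] ?? ''];
}
```

Because .+ is greedy, the second capture ends up being the last run of letters in the string, so any middle names are skipped.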
This might not be what you want to hear, but I don't think this problem is suited to a regular expression since names are not regular. I don't think they are even context-sensitive or context-free. If anything, they are unrestricted (I would have to sit down and think that through more than I did before I say that for sure, though) and no regular expression engine can parse an unrestricted grammar.
Depending on how clean your data is, I think you are going to have a tough time finding a single regex that does what you want. What different formats do you expect the names to be in? I've had to write similar code and there can be a lot of variations:
- first last
- last, first
- first middle last
- last, first middle
And then you have things like suffixes (Junior, Senior, III, etc.) and prefixes (Mr., Mrs., etc.), as well as combined names (e.g. John and Mary Smith). As others have already mentioned, you also have to deal with multi-part last names (e.g. Victor de la Hoya).
I found I had to deal with all of those possibilities before I could reliably pull out the first and last names.
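As a rough illustration of handling a few of those variations without a single regex, here is a minimal sketch. The function name parse_name and the prefix/suffix lists are assumptions made up for this example and are far from complete. It assumes PHP 7.4 or later, and it does not attempt combined names or multi-part last names.

```php
<?php
// Hypothetical helper covering only some of the formats listed above.
function parse_name(string $name): array
{
    // Illustrative, incomplete lists; real data will need more entries.
    $prefixes = ['mr', 'mrs', 'ms', 'dr'];
    $suffixes = ['jr', 'junior', 'sr', 'senior', 'ii', 'iii', 'iv'];

    // Collapse runs of whitespace and trim the ends.
    $name = trim(preg_replace('/\s+/', ' ', $name));

    // "last, first [middle]" becomes "first [middle] last".
    // Caveat: "John Smith, Jr." would be misread here; not handled.
    if (strpos($name, ',') !== false) {
        [$last, $rest] = array_map('trim', explode(',', $name, 2));
        $name = trim($rest . ' ' . $last);
    }

    $parts = $name === '' ? [] : explode(' ', $name);
    $norm = fn(string $p): string => strtolower(rtrim($p, '.'));

    // Drop a leading prefix and a trailing suffix, if present.
    if (count($parts) > 1 && in_array($norm($parts[0]), $prefixes, true)) {
        array_shift($parts);
    }
    if (count($parts) > 1 && in_array($norm(end($parts)), $suffixes, true)) {
        array_pop($parts);
    }

    $first = $parts[0] ?? '';
    $last = count($parts) > 1 ? $parts[count($parts) - 1] : '';

    return ['first' => $first, 'last' => $last];
}
```

With this sketch, "Victor de la Hoya" would still come back with only "Hoya" as the last name, which is exactly the kind of case that needs its own rule.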
I think your best option is to simply treat everything after the first name as the surname, i.e.:
William Henry Gates
Forename: William
Surname: Henry Gates
It's the safest mechanism, as not everyone will enter their middle name anyway. You can't simply extract William, ignore Henry, and extract Gates, because for all you know Henry is part of the surname.
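A minimal sketch of that approach in PHP, assuming names are separated by whitespace. The function name split_forename_surname is hypothetical; the third argument to explode caps the result at two pieces, so everything after the first space stays together.

```php
<?php
// Hypothetical helper: the first token is the forename,
// everything after it is treated as the surname.
function split_forename_surname(string $name): array
{
    // Collapse runs of whitespace and trim the ends first.
    $name = trim(preg_replace('/\s+/', ' ', $name));

    // Limit of 2: split only on the first space.
    $parts = explode(' ', $name, 2);

    return [
        'forename' => $parts[0],
        'surname'  => $parts[1] ?? '',
    ];
}
```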
Instead of a regex you might find it easier to do something like:
$parts = explode(" ", $name);
$first = $parts[0];
$last = "";
if (count($parts) > 1) {
    $last = $parts[count($parts) - 1];
}
You might want to replace multiple consecutive bits of whitespace with a single space first, so you don't get empty bits, and get rid of trailing/leading whitespace:
// ereg_replace() was removed in PHP 7; preg_replace() is the replacement.
$name = preg_replace('/[ \t\r\n]+/', ' ', trim($name));
Here is a simple non-regex way:
$name = explode(" ", $name);
$first_name = reset($name);
$last_name = end($name);
$result = $first_name . ' ' . $last_name;