SQL: parse the first, middle and last name from a fullname field

后端未结

关注

 23  1466

粉色の甜心

How do I parse the first, middle, and last name out of a fullname field with SQL?

I need to try to match up on names that are not a direct match on full name. I\'d

相关标签:

23条回答

孤城傲影

2020-11-27 11:21
I'm not sure about SQL server, but in postgres you could do something like this:
```
SELECT 
  SUBSTRING(fullname, '(\\w+)') as firstname,
  SUBSTRING(fullname, '\\w+\\s(\\w+)\\s\\w+') as middle,
  COALESCE(SUBSTRING(fullname, '\\w+\\s\\w+\\s(\\w+)'), SUBSTRING(fullname, '\\w+\\s(\\w+)')) as lastname
FROM 
public.person
```
The regex expressions could probably be a bit more concise; but you get the point. This does by the way not work for persons having two double names (in the Netherlands we have this a lot 'Jan van der Ploeg') so I'd be very careful with the results.
0 讨论(0)
发布评论:

提交评论
- 加载中...
小蘑菇

2020-11-27 11:21

The biggest problem I ran into doing this was cases like "Bob R. Smith, Jr.". The algorithm I used is posted at http://www.blackbeltcoder.com/Articles/strings/splitting-a-name-into-first-and-last-names. My code is in C# but you could port it if you must have in SQL.

0 讨论(0)
发布评论:

提交评论
- 加载中...
没有蜡笔的小新

2020-11-27 11:22

Like #1 said, it's not trivial. Hyphenated last names, initials, double names, inverse name sequence and a variety of other anomalies can ruin your carefully crafted function.

You could use a 3rd party library (plug/disclaimer - I worked on this product):

http://www.melissadata.com/nameobject/nameobject.htm

0 讨论(0)
发布评论:

提交评论
- 加载中...
渐次进展

2020-11-27 11:24

I once made a 500 character regular expression to parse first, last and middle names from an arbitrary string. Even with that honking regex, it only got around 97% accuracy due to the complete inconsistency of the input. Still, better than nothing.

0 讨论(0)
发布评论:

提交评论
- 加载中...

不要未来只要你来

2020-11-27 11:24

Here's a stored procedure that will put the first word found into First Name, the last word into Last Name and everything in between into Middle Name.

create procedure [dbo].[import_ParseName]
(            
    @FullName nvarchar(max),
    @FirstName nvarchar(255) output,
    @MiddleName nvarchar(255) output,
    @LastName nvarchar(255)  output
)
as
begin

set @FirstName = ''
set @MiddleName = ''
set @LastName = ''  
set @FullName = ltrim(rtrim(@FullName))

declare @ReverseFullName nvarchar(max)
set @ReverseFullName = reverse(@FullName)

declare @lengthOfFullName int
declare @endOfFirstName int
declare @beginningOfLastName int

set @lengthOfFullName = len(@FullName)
set @endOfFirstName = charindex(' ', @FullName)
set @beginningOfLastName = @lengthOfFullName - charindex(' ', @ReverseFullName) + 1

set @FirstName = case when @endOfFirstName <> 0 
                      then substring(@FullName, 1, @endOfFirstName - 1) 
                      else ''
                 end

set @MiddleName = case when (@endOfFirstName <> 0 and @beginningOfLastName <> 0 and @beginningOfLastName > @endOfFirstName)
                       then ltrim(rtrim(substring(@FullName, @endOfFirstName , @beginningOfLastName - @endOfFirstName))) 
                       else ''
                  end

set @LastName = case when @beginningOfLastName <> 0 
                     then substring(@FullName, @beginningOfLastName + 1 , @lengthOfFullName - @beginningOfLastName)
                     else ''
                end

return

end

And here's me calling it.

DECLARE @FirstName nvarchar(255),
        @MiddleName nvarchar(255),
        @LastName nvarchar(255)

EXEC    [dbo].[import_ParseName]
        @FullName = N'Scott The Other Scott Kowalczyk',
        @FirstName = @FirstName OUTPUT,
        @MiddleName = @MiddleName OUTPUT,
        @LastName = @LastName OUTPUT

print   @FirstName 
print   @MiddleName
print   @LastName 

output:

Scott
The Other Scott
Kowalczyk

0 讨论(0)

天命终不由人

2020-11-27 11:24
We of course all understand that there's no perfect way to solve this problem, but some solutions can get you farther than others.

In particular, it's pretty easy to go beyond simple whitespace-splitters if you just have some lists of common prefixes (Mr, Dr, Mrs, etc.), infixes (von, de, del, etc.), suffixes (Jr, III, Sr, etc.) and so on. It's also helpful if you have some lists of common first names (in various languages/cultures, if your names are diverse) so that you can guess whether a word in the middle is likely to be part of the last name or not.

BibTeX also implements some heuristics that get you part of the way there; they're encapsulated in the Text::BibTeX::Name perl module. Here's a quick code sample that does a reasonable job.
```
use Text::BibTeX;
use Text::BibTeX::Name;
$name = "Dr. Mario Luis de Luigi Jr.";
$name =~ s/^\s*([dm]rs?.?|miss)\s+//i;
$dr=$1;
$n=Text::BibTeX::Name->new($name);
print join("\t", $dr, map "@{[ $n->part($_) ]}", qw(first von last jr)), "\n";
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 3 4 下一页