SQL: parse the first, middle and last name from a fullname field

粉色の甜心 2020-11-27 10:47

How do I parse the first, middle, and last name out of a fullname field with SQL?

I need to try to match up on names that are not a direct match on full name. I'd

23 answers
  • 2020-11-27 11:21

    I'm not sure about SQL Server, but in PostgreSQL you could do something like this:

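    -- Assumes standard_conforming_strings = on (the default since PostgreSQL 9.1),
    -- where a backslash in a plain string literal is taken literally.
    -- With standard_conforming_strings = off, double each backslash ('\\w').
    SELECT
      SUBSTRING(fullname, '(\w+)') AS firstname,
      SUBSTRING(fullname, '\w+\s(\w+)\s\w+') AS middle,
      COALESCE(SUBSTRING(fullname, '\w+\s\w+\s(\w+)'), SUBSTRING(fullname, '\w+\s(\w+)')) AS lastname
    FROM
      public.person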
    

    The regular expressions could probably be more concise, but you get the point. Note that this does not work for people with multi-part surnames (we have a lot of these in the Netherlands, e.g. 'Jan van der Ploeg'), so I'd be very careful with the results.
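
    To make the multi-part surname caveat concrete, here is a minimal Python sketch of the same three patterns from the query above. It is a port for illustration only: Python's re module stands in for the PostgreSQL regex engine, and split_name and _group are hypothetical helper names, not part of the query.

```python
import re

# Python ports of the three patterns used in the query (illustration only;
# Python's re module is assumed to behave like PostgreSQL for these patterns).
FIRST = re.compile(r"(\w+)")
MIDDLE = re.compile(r"\w+\s(\w+)\s\w+")
LAST_OF_THREE = re.compile(r"\w+\s\w+\s(\w+)")
LAST_OF_TWO = re.compile(r"\w+\s(\w+)")


def _group(pattern, text):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


def split_name(fullname):
    """Mimic the SELECT: returns (firstname, middle, lastname)."""
    # COALESCE: use the three-word pattern first, then fall back to two words.
    last = _group(LAST_OF_THREE, fullname) or _group(LAST_OF_TWO, fullname)
    return _group(FIRST, fullname), _group(MIDDLE, fullname), last
```

    For 'John Quincy Adams' this gives ('John', 'Quincy', 'Adams'), but for 'Jan van der Ploeg' it gives ('Jan', 'van', 'der'): the actual surname is dropped entirely.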

  • 2020-11-27 11:21

    The biggest problem I ran into doing this was cases like "Bob R. Smith, Jr.". The algorithm I used is posted at http://www.blackbeltcoder.com/Articles/strings/splitting-a-name-into-first-and-last-names. My code is in C#, but you could port it if you must have it in SQL.

  • 2020-11-27 11:22

    Like #1 said, it's not trivial. Hyphenated last names, initials, double names, reversed name order and a variety of other anomalies can ruin your carefully crafted function.

    You could use a 3rd party library (plug/disclaimer - I worked on this product):

    http://www.melissadata.com/nameobject/nameobject.htm

  • 2020-11-27 11:24

    I once made a 500-character regular expression to parse first, last and middle names from an arbitrary string. Even with that honking regex, it only got around 97% accuracy due to the complete inconsistency of the input. Still, better than nothing.

  • Here's a stored procedure that will put the first word found into First Name, the last word into Last Name and everything in between into Middle Name.

    create procedure [dbo].[import_ParseName]
    (            
        @FullName nvarchar(max),
        @FirstName nvarchar(255) output,
        @MiddleName nvarchar(255) output,
        @LastName nvarchar(255)  output
    )
    as
    begin
    
    set @FirstName = ''
    set @MiddleName = ''
    set @LastName = ''  
    set @FullName = ltrim(rtrim(@FullName))
    
    declare @ReverseFullName nvarchar(max)
    set @ReverseFullName = reverse(@FullName)
    
    declare @lengthOfFullName int
    declare @endOfFirstName int
    declare @beginningOfLastName int
    
    set @lengthOfFullName = len(@FullName)
    set @endOfFirstName = charindex(' ', @FullName)
    set @beginningOfLastName = @lengthOfFullName - charindex(' ', @ReverseFullName) + 1
    
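    -- First name: everything before the first space.
    -- With no space at all, the whole string is treated as the first name.
    set @FirstName = case when @endOfFirstName <> 0 
                          then substring(@FullName, 1, @endOfFirstName - 1) 
                          else isnull(@FullName, '')
                     end
    
    -- Middle name: everything between the first and the last space.
    set @MiddleName = case when (@endOfFirstName <> 0 and @beginningOfLastName <> 0 and @beginningOfLastName > @endOfFirstName)
                           then ltrim(rtrim(substring(@FullName, @endOfFirstName , @beginningOfLastName - @endOfFirstName))) 
                           else ''
                      end
    
    -- Last name: everything after the last space. Skipped when there is no space,
    -- because substring() would then be called with a negative length and fail.
    set @LastName = case when @endOfFirstName <> 0 
                         then substring(@FullName, @beginningOfLastName + 1 , @lengthOfFullName - @beginningOfLastName)
                         else ''
                    end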
    
    return
    
    end 
    

    And here's me calling it.

    DECLARE @FirstName nvarchar(255),
            @MiddleName nvarchar(255),
            @LastName nvarchar(255)
    
    EXEC    [dbo].[import_ParseName]
            @FullName = N'Scott The Other Scott Kowalczyk',
            @FirstName = @FirstName OUTPUT,
            @MiddleName = @MiddleName OUTPUT,
            @LastName = @LastName OUTPUT
    
    print   @FirstName 
    print   @MiddleName
    print   @LastName 
    
    output:
    
    Scott
    The Other Scott
    Kowalczyk
    
  • 2020-11-27 11:24

    We of course all understand that there's no perfect way to solve this problem, but some solutions can get you farther than others.

    In particular, it's pretty easy to go beyond simple whitespace-splitters if you just have some lists of common prefixes (Mr, Dr, Mrs, etc.), infixes (von, de, del, etc.), suffixes (Jr, III, Sr, etc.) and so on. It's also helpful if you have some lists of common first names (in various languages/cultures, if your names are diverse) so that you can guess whether a word in the middle is likely to be part of the last name or not.

    BibTeX also implements some heuristics that get you part of the way there; they're encapsulated in the Text::BibTeX::Name Perl module. Here's a quick code sample that does a reasonable job.

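    use Text::BibTeX;
    use Text::BibTeX::Name;
    my $name = "Dr. Mario Luis de Luigi Jr.";
    # Strip a leading title (Dr, Drs, Mr, Mrs, Miss, optional period) and keep it;
    # $dr stays empty when the name has no title.
    my $dr = $name =~ s/^\s*([dm]rs?\.?|miss)\s+//i ? $1 : '';
    my $n = Text::BibTeX::Name->new($name);
    # Print the title, then the first, von, last and jr parts, tab-separated.
    print join("\t", $dr, map "@{[ $n->part($_) ]}", qw(first von last jr)), "\n";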
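
    The list-based idea described above can also be sketched without BibTeX. The following Python is a minimal illustration under stated assumptions: the three word lists are tiny, hypothetical samples (real data needs much larger ones), parse_name and _key are made-up names, and the first-name list heuristic is left out.

```python
# Hypothetical, deliberately tiny word lists; real data needs far larger ones.
PREFIXES = {"mr", "mrs", "ms", "miss", "dr"}
INFIXES = {"von", "van", "der", "de", "del", "la"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def _key(token):
    """Normalise a token for list lookups: drop punctuation, lower-case."""
    return token.strip(".,").lower()


def parse_name(fullname):
    """Split a name into prefix, first, middle, last and suffix parts."""
    tokens = fullname.replace(",", " ").split()
    prefix, suffix = [], []
    # Peel titles off the front and generational suffixes off the back.
    while tokens and _key(tokens[0]) in PREFIXES:
        prefix.append(tokens.pop(0))
    while tokens and _key(tokens[-1]) in SUFFIXES:
        suffix.insert(0, tokens.pop())
    first = tokens.pop(0) if tokens else ""
    # The last name is the final word plus any infixes directly before it.
    last = [tokens.pop()] if tokens else []
    while tokens and _key(tokens[-1]) in INFIXES:
        last.insert(0, tokens.pop())
    return {
        "prefix": " ".join(prefix),
        "first": first,
        "middle": " ".join(tokens),  # whatever is left over
        "last": " ".join(last),
        "suffix": " ".join(suffix),
    }
```

    With these lists, "Dr. Mario Luis de Luigi Jr." comes out as prefix "Dr.", first "Mario", middle "Luis", last "de Luigi", suffix "Jr.", and "Jan van der Ploeg" keeps "van der Ploeg" together as the last name. Anything not covered by the lists still falls back to plain whitespace splitting.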
    