SQL: parse the first, middle and last name from a fullname field

粉色の甜心 2020-11-27 10:47

How do I parse the first, middle, and last name out of a fullname field with SQL?

I need to try to match up names that are not a direct match on full name. I'd …

23 answers
  • 2020-11-27 11:29

    Reverse the problem: add columns to hold the individual pieces, and combine them to get the full name.

    The reason this is the best answer is that there is no guaranteed way to figure out what a person has registered as their first name and what as their middle name.

    For instance, how would you split this?

    Jan Olav Olsen Heggelien
    

    This name, while fictitious, is a legal name in Norway, and could (but would not have to) be split like this:

    First name: Jan Olav
    Middle name: Olsen
    Last name: Heggelien
    

    or, like this:

    First name: Jan Olav
    Last name: Olsen Heggelien
    

    or, like this:

    First name: Jan
    Middle name: Olav
    Last name: Olsen Heggelien
    

    I would imagine similar cases can be found in most languages.
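To make the ambiguity concrete, this Python sketch (the helper `possible_splits` is hypothetical) lists every way to cut a name into first/middle/last at word boundaries, assuming the first and last names are non-empty and the middle name may be empty. For the four-word example it yields six candidates, including all three splits listed above, and nothing in the string itself says which one is right.

```python
from typing import List, Tuple


def possible_splits(fullname: str) -> List[Tuple[str, str, str]]:
    """Every (first, middle, last) cut of a name at word boundaries.

    Assumes first and last are non-empty; the middle part may be empty.
    """
    words = fullname.split()
    splits = []
    for i in range(1, len(words)):        # first name = words[:i]
        for j in range(i, len(words)):    # middle = words[i:j], last = words[j:]
            splits.append((" ".join(words[:i]),
                           " ".join(words[i:j]),
                           " ".join(words[j:])))
    return splits


candidates = possible_splits("Jan Olav Olsen Heggelien")
for first, middle, last in candidates:
    print(f"first={first!r}, middle={middle!r}, last={last!r}")
print(len(candidates), "candidates")
```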

    So instead of trying to interpret data that does not contain enough information to get it right, store the correct interpretation and combine the parts to get the full name.
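A minimal sketch of that design, using Python's built-in `sqlite3` module so it runs anywhere; the `person` table and its column names are hypothetical. The parts are stored as entered, and the full name is only ever built by concatenation, so a missing middle name needs no guessing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: each part is stored as entered, never derived from a full name.
con.execute("""CREATE TABLE person (
                   first_name  TEXT NOT NULL,
                   middle_name TEXT,              -- NULL when there is none
                   last_name   TEXT NOT NULL)""")
con.executemany("INSERT INTO person VALUES (?, ?, ?)",
                [("Jan Olav", "Olsen", "Heggelien"),
                 ("Kari", None, "Nordmann")])

# Build the full name on the way out: ' ' || NULL is NULL, so COALESCE
# turns a missing middle name into an empty string.
full_names = [row[0] for row in con.execute(
    "SELECT first_name || COALESCE(' ' || middle_name, '') || ' ' || last_name"
    " FROM person ORDER BY rowid")]
print(full_names)
con.close()
```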

  • 2020-11-27 11:29

    This works when the string has the form FirstName MiddleName LastName, separated by spaces:

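    SELECT DISTINCT
        NAMES,
        -- text before the first space (the whole value if there is no space)
        CASE WHEN CHARINDEX(' ', NAMES) = 0 THEN NAMES
             ELSE LEFT(NAMES, CHARINDEX(' ', NAMES) - 1) END AS FirstName,
        -- text between the first and the last space, cut by position; REPLACE would
        -- also strip repeats of the first/last name (e.g. "Ann" inside "Annabel")
        CASE WHEN CHARINDEX(' ', NAMES) > 0
              AND LEN(NAMES) - CHARINDEX(' ', NAMES) - CHARINDEX(' ', REVERSE(NAMES)) > 0
             THEN LTRIM(RTRIM(SUBSTRING(NAMES, CHARINDEX(' ', NAMES) + 1,
                  LEN(NAMES) - CHARINDEX(' ', NAMES) - CHARINDEX(' ', REVERSE(NAMES)))))
             ELSE '' END AS MiddleName,
        -- text after the last space (empty for a single word)
        CASE WHEN CHARINDEX(' ', NAMES) = 0 THEN ''
             ELSE REVERSE(LEFT(REVERSE(NAMES), CHARINDEX(' ', REVERSE(NAMES)) - 1)) END AS LastName
    FROM TABLENAME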
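The same first-word / last-word / everything-in-between rule, sketched in Python (the function name is hypothetical) for checking what the query should return on sample data. Cutting by position, rather than deleting the first and last names with `REPLACE`, keeps a middle name such as "Annabel" intact in "Ann Annabel Smith".

```python
def split_first_middle_last(name: str):
    """First word, last word, and everything between them as the middle name."""
    name = name.strip()
    first_space = name.find(" ")
    if first_space == -1:                 # single word: no middle or last name
        return name, "", ""
    last_space = name.rfind(" ")
    return (name[:first_space],
            name[first_space + 1:last_space].strip(),
            name[last_space + 1:])


print(split_first_middle_last("Ann Annabel Smith"))
print(split_first_middle_last("Jan Olav Olsen Heggelien"))
```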
    
  • 2020-11-27 11:30

    This query puts the first word in FirstName, the second in MiddleName and the rest in LastName:

    SELECT name
        -- first word (whole value if there is no space); RTRIM drops the space SUBSTRING keeps
        ,LTRIM(RTRIM(SUBSTRING(name, 1, ISNULL(NULLIF(CHARINDEX(' ', name), 0), 1000)))) AS FirstName
        -- text from the first space up to the second space; empty unless a second space exists
        ,LTRIM(SUBSTRING(name, CHARINDEX(' ', name), CASE
                    WHEN (CHARINDEX(' ', name, CHARINDEX(' ', name) + 1) - CHARINDEX(' ', name)) <= 0
                        THEN 0
                    ELSE CHARINDEX(' ', name, CHARINDEX(' ', name) + 1) - CHARINDEX(' ', name)
                    END)) AS MiddleName
        -- text from the second space (or the first, for two-word names) to the end
        ,LTRIM(SUBSTRING(name, ISNULL(NULLIF(CHARINDEX(' ', name, CHARINDEX(' ', name) + 1), 0), CHARINDEX(' ', name)), CASE
                    WHEN CHARINDEX(' ', name) = 0
                        THEN 0
                    ELSE LEN(name)
                    END)) AS LastName
    FROM yourtableName
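Reading the CASE/CHARINDEX logic, this query puts the first word in FirstName, the second in MiddleName and everything from the third word on in LastName, with an empty MiddleName for two-word names. A Python sketch of that reading (hypothetical function name, assuming single spaces between words) makes it easy to compare with the previous answer: for "Jan Olav Olsen Heggelien" this rule gives the last name "Olsen Heggelien", whereas the first-word/last-word rule gives the middle name "Olav Olsen", which is exactly the ambiguity described in the first answer.

```python
def split_like_charindex_query(name: str):
    """Word 1 -> first, word 2 -> middle, words 3+ -> last; two words -> first + last."""
    words = name.split()
    if len(words) <= 1:
        return (words[0] if words else ""), "", ""
    if len(words) == 2:
        return words[0], "", words[1]
    return words[0], words[1], " ".join(words[2:])


print(split_like_charindex_query("Jan Olav Olsen Heggelien"))
```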
    
  • 2020-11-27 11:30

    Try this query in Athena for strings that contain exactly one space (e.g. a first name and middle name combination):

    -- TRIM removes the leading space that SUBSTR keeps along with the last word
    SELECT name, TRIM(REVERSE(SUBSTR(REVERSE(name), 1, STRPOS(REVERSE(name), ' ')))) AS middle_name FROM name_table

    If you expect two or more spaces, the query can be extended along the same lines.
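Because `SUBSTR` keeps everything up to and including the space that `STRPOS` found, the reverse-and-search expression returns the last word with a leading space. The Python mirror below (function names hypothetical) makes that off-by-one visible, and `nth_word` sketches one way to handle names with more spaces by picking a word by position. Presto/Athena also provides `split_part(string, delimiter, index)`, which returns the index-th field directly (NULL when the index is past the end); `nth_word` differs in returning an empty string and in collapsing repeated spaces.

```python
def after_last_space(name: str) -> str:
    """Python mirror of REVERSE(SUBSTR(REVERSE(name), 1, STRPOS(REVERSE(name), ' ')))."""
    rev = name[::-1]
    strpos = rev.find(" ") + 1        # STRPOS is 1-based and returns 0 if not found
    return rev[:strpos][::-1]         # SUBSTR(rev, 1, strpos) keeps the space itself


def nth_word(name: str, n: int) -> str:
    """1-based n-th space-separated word, or '' when there is none."""
    words = name.split()
    return words[n - 1] if 0 < n <= len(words) else ""


print(repr(after_last_space("Anna Maria")))       # ' Maria'
print(repr(after_last_space("Anna")))             # ''
print(nth_word("Jan Olav Olsen Heggelien", 2))    # Olav
```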

  • 2020-11-27 11:31

    If you are trying to parse apart a human name in PHP, I recommend Keith Beckman's nameparse.php script.

    Here is a copy in case the site goes down:

    <?php
    /*
    Name:   nameparse.php
    Version: 0.2a
    Date:   030507
    First:  030407
    License:    GNU General Public License v2
    Bugs:   If one of the words in the middle name is Ben (or St., for that matter),
            or any other possible last-name prefix, the name MUST be entered in
            last-name-first format. If the last-name parsing routines get ahold
            of any prefix, they tie up the rest of the name up to the suffix. i.e.:
    
            William Ben Carey   would yield 'Ben Carey' as the last name, while,
            Carey, William Ben  would yield 'Carey' as last and 'Ben' as middle.
    
            This is a problem inherent in the prefix-parsing routines algorithm,
            and probably will not be fixed. It's not my fault that there's some
            odd overlap between various languages. Just don't name your kids
            'Something Ben Something', and you should be alright.
    
    */
    
    function    norm_str($string) {
        return  trim(strtolower(
            str_replace('.','',$string)));
        }
    
    function    in_array_norm($needle,$haystack) {
        return  in_array(norm_str($needle),$haystack);
        }
    
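    function    parse_name($fullname) {
        // define every key up front so the !$out[...] checks work without undefined
        // variable/index warnings (unparseable input now returns empty fields, not null)
        $out            =   array('title' => '', 'first' => '', 'middle' => '',
                                  'last' => '', 'suffix' => '');
        $titles         =   array('dr','miss','mr','mrs','ms','judge');
        $prefices       =   array('ben','bin','da','dal','de','del','der','e',
                                'la','le','san','st','ste','van','vel','von');
        $suffices       =   array('esq','esquire','jr','sr','2','ii','iii','iv');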
    
        $pieces         =   explode(',',preg_replace('/\s+/',' ',trim($fullname)));
        $n_pieces       =   count($pieces);
    
        switch($n_pieces) {
            case    1:  // array(title first middles last suffix)
                $subp   =   explode(' ',trim($pieces[0]));
                $n_subp =   count($subp);
                for($i = 0; $i < $n_subp; $i++) {
                    $curr               =   trim($subp[$i]);
                    $next               =   isset($subp[$i+1]) ? trim($subp[$i+1]) : '';
    
                    if($i == 0 && in_array_norm($curr,$titles)) {
                        $out['title']   =   $curr;
                        continue;
                        }
    
                    if(!$out['first']) {
                        $out['first']   =   $curr;
                        continue;
                        }
    
                    if($i == $n_subp-2 && $next && in_array_norm($next,$suffices)) {
                        if($out['last']) {
                            $out['last']    .=  " $curr";
                            }
                        else {
                            $out['last']    =   $curr;
                            }
                        $out['suffix']      =   $next;
                        break;
                        }
    
                    if($i == $n_subp-1) {
                        if($out['last']) {
                            $out['last']    .=  " $curr";
                            }
                        else {
                            $out['last']    =   $curr;
                            }
                        continue;
                        }
    
                    if(in_array_norm($curr,$prefices)) {
                        if($out['last']) {
                            $out['last']    .=  " $curr";
                            }
                        else {
                            $out['last']    =   $curr;
                            }
                        continue;
                        }
    
                    if($next == 'y' || $next == 'Y') {
                        if($out['last']) {
                            $out['last']    .=  " $curr";
                            }
                        else {
                            $out['last']    =   $curr;
                            }
                        continue;
                        }
    
                    if($out['last']) {
                        $out['last']    .=  " $curr";
                        continue;
                        }
    
                    if($out['middle']) {
                        $out['middle']      .=  " $curr";
                        }
                    else {
                        $out['middle']      =   $curr;
                        }
                    }
                break;
            case    2:
                    switch(in_array_norm($pieces[1],$suffices)) {
                        case    TRUE: // array(title first middles last,suffix)
                            $subp   =   explode(' ',trim($pieces[0]));
                            $n_subp =   count($subp);
                            for($i = 0; $i < $n_subp; $i++) {
                                $curr               =   trim($subp[$i]);
                                $next               =   isset($subp[$i+1]) ? trim($subp[$i+1]) : '';
    
                                if($i == 0 && in_array_norm($curr,$titles)) {
                                    $out['title']   =   $curr;
                                    continue;
                                    }
    
                                if(!$out['first']) {
                                    $out['first']   =   $curr;
                                    continue;
                                    }
    
                                if($i == $n_subp-1) {
                                    if($out['last']) {
                                        $out['last']    .=  " $curr";
                                        }
                                    else {
                                        $out['last']    =   $curr;
                                        }
                                    continue;
                                    }
    
                                if(in_array_norm($curr,$prefices)) {
                                    if($out['last']) {
                                        $out['last']    .=  " $curr";
                                        }
                                    else {
                                        $out['last']    =   $curr;
                                        }
                                    continue;
                                    }
    
                                if($next == 'y' || $next == 'Y') {
                                    if($out['last']) {
                                        $out['last']    .=  " $curr";
                                        }
                                    else {
                                        $out['last']    =   $curr;
                                        }
                                    continue;
                                    }
    
                                if($out['last']) {
                                    $out['last']    .=  " $curr";
                                    continue;
                                    }
    
                                if($out['middle']) {
                                    $out['middle']      .=  " $curr";
                                    }
                                else {
                                    $out['middle']      =   $curr;
                                    }
                                }                       
                            $out['suffix']  =   trim($pieces[1]);
                            break;
                        case    FALSE: // array(last,title first middles suffix)
                            $subp   =   explode(' ',trim($pieces[1]));
                            $n_subp =   count($subp);
                            for($i = 0; $i < $n_subp; $i++) {
                                $curr               =   trim($subp[$i]);
                                $next               =   isset($subp[$i+1]) ? trim($subp[$i+1]) : '';
    
                                if($i == 0 && in_array_norm($curr,$titles)) {
                                    $out['title']   =   $curr;
                                    continue;
                                    }
    
                                if(!$out['first']) {
                                    $out['first']   =   $curr;
                                    continue;
                                    }
    
                                if($i == $n_subp-2 && $next &&
                                    in_array_norm($next,$suffices)) {
                                    if($out['middle']) {
                                        $out['middle']  .=  " $curr";
                                        }
                                    else {
                                        $out['middle']  =   $curr;
                                        }
                                    $out['suffix']      =   $next;
                                    break;
                                    }
    
                                if($i == $n_subp-1 && in_array_norm($curr,$suffices)) {
                                    $out['suffix']      =   $curr;
                                    continue;
                                    }
    
                                if($out['middle']) {
                                    $out['middle']      .=  " $curr";
                                    }
                                else {
                                    $out['middle']      =   $curr;
                                    }
                                }
                            $out['last']    =   $pieces[0];
                            break;
                        }
                unset($pieces);
                break;
            case    3:  // array(last,title first middles,suffix)
                $subp   =   explode(' ',trim($pieces[1]));
                $n_subp =   count($subp);
                for($i = 0; $i < $n_subp; $i++) {
                    $curr               =   trim($subp[$i]);
                    $next               =   isset($subp[$i+1]) ? trim($subp[$i+1]) : '';
                    if($i == 0 && in_array_norm($curr,$titles)) {
                        $out['title']   =   $curr;
                        continue;
                        }
    
                    if(!$out['first']) {
                        $out['first']   =   $curr;
                        continue;
                        }
    
                    if($out['middle']) {
                        $out['middle']      .=  " $curr";
                        }
                    else {
                        $out['middle']      =   $curr;
                        }
                    }
    
                $out['last']                =   trim($pieces[0]);
                $out['suffix']              =   trim($pieces[2]);
                break;
            default:    // unparseable
                unset($pieces);
                break;
            }
    
        return $out;
        }
    
    
    ?>
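For readers outside PHP, here is a rough Python port of the no-comma branch (`case 1`) of `parse_name()`; `parse_simple_name` is a hypothetical name, the comma-separated "Last, First" branches are omitted, and the word lists are copied from the script. It deliberately keeps the quirk described in the script's Bugs note: a prefix word such as "Ben" after the first name starts the last name.

```python
TITLES = {"dr", "miss", "mr", "mrs", "ms", "judge"}
PREFIXES = {"ben", "bin", "da", "dal", "de", "del", "der", "e", "la", "le",
            "san", "st", "ste", "van", "vel", "von"}
SUFFIXES = {"esq", "esquire", "jr", "sr", "2", "ii", "iii", "iv"}


def norm(word: str) -> str:
    """Same normalisation as norm_str(): drop dots, trim, lowercase."""
    return word.replace(".", "").strip().lower()


def parse_simple_name(fullname: str) -> dict:
    """Rough port of the no-comma branch (case 1) of parse_name()."""
    out = {"title": "", "first": "", "middle": "", "last": "", "suffix": ""}
    words = fullname.split()              # also collapses repeated whitespace
    n = len(words)
    for i, curr in enumerate(words):
        nxt = words[i + 1] if i + 1 < n else ""
        if i == 0 and norm(curr) in TITLES:
            out["title"] = curr
            continue
        if not out["first"]:
            out["first"] = curr
            continue
        if i == n - 2 and nxt and norm(nxt) in SUFFIXES:
            out["last"] = (out["last"] + " " + curr).strip()
            out["suffix"] = nxt
            break
        # the final word, a prefix such as "van", a word before "y", or anything
        # after the last name has started all go into the last name
        if i == n - 1 or norm(curr) in PREFIXES or nxt in ("y", "Y") or out["last"]:
            out["last"] = (out["last"] + " " + curr).strip()
            continue
        out["middle"] = (out["middle"] + " " + curr).strip()
    return out


print(parse_simple_name("Dr. William Ben Carey"))
print(parse_simple_name("John Paul Smith Jr."))
```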
    
  • 2020-11-27 11:33

    Subject to the caveats already raised about spaces in names and other anomalies, the following code will handle at least 98% of names. (Note: the SQL is messy because the database I use has no regex support.)

    Warning: messy SQL follows:

    create table parsname (fullname char(50), name1 char(30), name2 char(30), name3 char(30), name4 char(40));
    insert into parsname (fullname) select fullname from ImportTable;
    update parsname set name1 = substring(fullname, 1, locate(' ', fullname)),
     fullname = ltrim(substring(fullname, locate(' ', fullname), length(fullname)))
     where locate(' ', rtrim(fullname)) > 0;
    update parsname set name2 = substring(fullname, 1, locate(' ', fullname)),
     fullname = ltrim(substring(fullname, locate(' ', fullname), length(fullname)))
     where locate(' ', rtrim(fullname)) > 0;
    update parsname set name3 = substring(fullname, 1, locate(' ', fullname)),
     fullname = ltrim(substring(fullname, locate(' ', fullname), length(fullname)))
     where locate(' ', rtrim(fullname)) > 0;
    update parsname set name4 = substring(fullname, 1, locate(' ', fullname)),
     fullname = ltrim(substring(fullname, locate(' ', fullname), length(fullname)))
     where locate(' ', rtrim(fullname)) > 0;
    -- fullname now contains the last word in the string.
    select fullname as FirstName, '' as MiddleName, '' as LastName from parsname where fullname is not null and name1 is null and name2 is null
    union all
    select name1 as FirstName, name2 as MiddleName, fullname as LastName from parsname where name1 is not null and name3 is null;
    

    The code works by creating a working table (parsname) and tokenizing the full name on spaces. Any names that end up with values in name3 or name4 do not fit the first/middle/last pattern and will need to be handled differently.
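The same tokenizing passes can be tried with Python's built-in `sqlite3`. This is a sketch, not the author's code: SQLite spells `locate`/`substring` as `instr`/`substr`, the columns are `TEXT` rather than `CHAR`, the function name is hypothetical, and the `- 1` drops the trailing space that the original leaves on each peeled-off word. As in the original, a four-word name is left out of the result because name3 is set.

```python
import sqlite3


def tokenize_names(fullnames):
    """Peel one leading word into name1..name4 per UPDATE pass, then classify."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE parsname (fullname TEXT, name1 TEXT, name2 TEXT,"
                " name3 TEXT, name4 TEXT)")
    con.executemany("INSERT INTO parsname (fullname) VALUES (?)",
                    [(n,) for n in fullnames])
    for col in ("name1", "name2", "name3", "name4"):
        # SQLite evaluates both SET expressions before assigning, so each one
        # sees the same (pre-update) fullname.
        con.execute(f"""UPDATE parsname
                           SET {col} = substr(fullname, 1, instr(fullname, ' ') - 1),
                               fullname = ltrim(substr(fullname, instr(fullname, ' ')))
                         WHERE instr(rtrim(fullname), ' ') > 0""")
    # fullname now holds the last word; rows with name3 set need manual handling.
    rows = con.execute("""
        SELECT fullname, '', '' FROM parsname
         WHERE fullname IS NOT NULL AND name1 IS NULL
        UNION ALL
        SELECT name1, name2, fullname FROM parsname
         WHERE name1 IS NOT NULL AND name3 IS NULL""").fetchall()
    con.close()
    return rows


print(tokenize_names(["John Paul Smith", "Cher", "John Smith",
                      "Jan Olav Olsen Heggelien"]))
```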
