SQL: parse the first, middle and last name from a fullname field


粉色の甜心

How do I parse the first, middle, and last name out of a fullname field with SQL?

I need to try to match up on names that are not a direct match on full name. I'd

23 answers

后悔当初

2020-11-27 11:29
Reverse the problem: add columns to hold the individual pieces and combine them to get the full name.

This is the best answer because there is no guaranteed way to work out which part of a full name a person has registered as their first name and which as their middle name.

For instance, how would you split this?
```
Jan Olav Olsen Heggelien
```
This, while fictitious, is a legal name in Norway, and could, but would not have to, be split like this:
```
First name: Jan Olav
Middle name: Olsen
Last name: Heggelien
```
or, like this:
```
First name: Jan Olav
Last name: Olsen Heggelien
```
or, like this:
```
First name: Jan
Middle name: Olav
Last name: Olsen Heggelien
```
I would imagine similar occurrences can be found in most languages.

So instead of trying to interpret data that does not carry enough information to get it right, store the correct interpretation and combine the parts to get the full name.
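
As a minimal sketch of that idea, here is one way to store the pieces separately and build the full name on demand. It uses Python's built-in sqlite3 module only so that it runs anywhere; the `person` table and its column names are assumptions made up for illustration, not something from the question.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        first_name  TEXT NOT NULL,
        middle_name TEXT,            -- NULL when there is no middle name
        last_name   TEXT NOT NULL
    )
    """
)

# The three possible interpretations of the same Norwegian name.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("Jan Olav", "Olsen", "Heggelien"),
        ("Jan Olav", None, "Olsen Heggelien"),
        ("Jan", "Olav", "Olsen Heggelien"),
    ],
)

# The full name is derived, never stored. ' ' || NULL is NULL in SQLite,
# so COALESCE drops the separator when the middle name is missing.
full_names = [
    row[0]
    for row in conn.execute(
        """
        SELECT first_name
               || COALESCE(' ' || middle_name, '')
               || ' ' || last_name
        FROM person
        ORDER BY rowid
        """
    )
]
print(full_names)
```

All three rows produce the same full name, which is exactly why the split cannot be recovered from the full name alone.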

名媛妹妹

2020-11-27 11:29

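This works when the string has the form FirstName MiddleName LastName, separated by spaces:

```sql
-- Assumes every NAMES value contains at least one space; without one, CHARINDEX returns 0 and the query fails.
SELECT DISTINCT
    NAMES,
    -- Everything before the first space
    SUBSTRING(NAMES, 1, CHARINDEX(' ', NAMES) - 1) AS FirstName,
    -- Everything between the first and the last space, located by position
    LTRIM(RTRIM(SUBSTRING(NAMES, CHARINDEX(' ', NAMES),
        LEN(NAMES) - CHARINDEX(' ', NAMES) - CHARINDEX(' ', REVERSE(NAMES)) + 2))) AS MiddleName,
    -- Everything after the last space
    REVERSE(LEFT(REVERSE(NAMES), CHARINDEX(' ', REVERSE(NAMES)) - 1)) AS LastName
FROM TABLENAME
```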
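
To see what a split at the first and last space gives for different inputs, here is a small Python sketch of the same idea. It is not T-SQL, and the function name `split_on_outer_spaces` is made up for illustration; returning the whole string as the first name when there is no space is my own assumption about a sensible fallback.

```python
def split_on_outer_spaces(full_name):
    """Split a name at its first and last space.

    Returns a (first, middle, last) tuple; parts that are absent are ''.
    """
    name = full_name.strip()
    first_space = name.find(" ")
    if first_space == -1:
        # No space at all: assume the single word is a first name.
        return name, "", ""
    last_space = name.rfind(" ")
    first = name[:first_space]
    # Empty when the first and last space are the same character.
    middle = name[first_space:last_space].strip()
    last = name[last_space + 1:]
    return first, middle, last


for sample in ("John Quincy Adams", "John Adams", "Cher", "Jan Olav Olsen Heggelien"):
    print(sample, "->", split_on_outer_spaces(sample))
```

Note that with four words everything between the outer spaces lands in the middle name, which may or may not be what the person intended.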


悲哀的现实

2020-11-27 11:30

This query works fine.

```sql
SELECT name
    -- Everything up to the first space (the whole string when there is none)
    ,RTRIM(LTRIM(SUBSTRING(name, 1, ISNULL(NULLIF(CHARINDEX(' ', name), 0), 1000)))) AS FirstName
    -- Text between the first and second space; empty when there is no second space
    ,LTRIM(SUBSTRING(name, CHARINDEX(' ', name), CASE 
                WHEN (CHARINDEX(' ', name, CHARINDEX(' ', name) + 1) - CHARINDEX(' ', name)) <= 0
                    THEN 0
                ELSE CHARINDEX(' ', name, CHARINDEX(' ', name) + 1) - CHARINDEX(' ', name)
                END)) AS MiddleName
    -- Everything after the second space, or after the first when there are only two words
    ,LTRIM(SUBSTRING(name, ISNULL(NULLIF(CHARINDEX(' ', name, CHARINDEX(' ', name) + 1), 0), CHARINDEX(' ', name)), CASE 
                WHEN CHARINDEX(' ', name) = 0
                    THEN 0
                ELSE LEN(name)
                END)) AS LastName
FROM yourtableName
```


春和景丽

2020-11-27 11:30

Try this query in Athena on a string that contains exactly one space (e.g. a first name and middle name combination):

```sql
-- TRIM drops the separator space that SUBSTR keeps at the start of the result.
SELECT name,
       TRIM(REVERSE(SUBSTR(REVERSE(name), 1, STRPOS(REVERSE(name), ' ')))) AS middle_name
FROM name_table
```

If you expect to have two or more spaces, you can easily extend the above query.


萌比男神i

2020-11-27 11:31

If you are trying to parse apart a human name in PHP, I recommend Keith Beckman's nameparse.php script.

Copy in case site goes down:

<?php
/*
Name:   nameparse.php
Version: 0.2a
Date:   030507
First:  030407
License:    GNU General Public License v2
Bugs:   If one of the words in the middle name is Ben (or St., for that matter),
        or any other possible last-name prefix, the name MUST be entered in
        last-name-first format. If the last-name parsing routines get ahold
        of any prefix, they tie up the rest of the name up to the suffix. i.e.:

        William Ben Carey   would yield 'Ben Carey' as the last name, while,
        Carey, William Ben  would yield 'Carey' as last and 'Ben' as middle.

        This is a problem inherent in the prefix-parsing routines algorithm,
        and probably will not be fixed. It's not my fault that there's some
        odd overlap between various languages. Just don't name your kids
        'Something Ben Something', and you should be alright.

*/

function    norm_str($string) {
    return  trim(strtolower(
        str_replace('.','',$string)));
    }

function    in_array_norm($needle,$haystack) {
    return  in_array(norm_str($needle),$haystack);
    }

function    parse_name($fullname) {
    $titles         =   array('dr','miss','mr','mrs','ms','judge');
    $prefices       =   array('ben','bin','da','dal','de','del','der','de','e',
                            'la','le','san','st','ste','van','vel','von');
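    $suffices       =   array('esq','esquire','jr','sr','2','ii','iii','iv');

    // Every key starts empty so the checks below never read an undefined index;
    // an unparseable name therefore yields all-empty fields.
    $out            =   array('title' => '', 'first' => '', 'middle' => '',
                            'last' => '', 'suffix' => '');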

    $pieces         =   explode(',',preg_replace('/\s+/',' ',trim($fullname)));
    $n_pieces       =   count($pieces);

    switch($n_pieces) {
        case    1:  // array(title first middles last suffix)
            $subp   =   explode(' ',trim($pieces[0]));
            $n_subp =   count($subp);
            for($i = 0; $i < $n_subp; $i++) {
                $curr               =   trim($subp[$i]);
                $next               =   trim($subp[$i+1] ?? '');

                if($i == 0 && in_array_norm($curr,$titles)) {
                    $out['title']   =   $curr;
                    continue;
                    }

                if(!$out['first']) {
                    $out['first']   =   $curr;
                    continue;
                    }

                if($i == $n_subp-2 && $next && in_array_norm($next,$suffices)) {
                    if($out['last']) {
                        $out['last']    .=  " $curr";
                        }
                    else {
                        $out['last']    =   $curr;
                        }
                    $out['suffix']      =   $next;
                    break;
                    }

                if($i == $n_subp-1) {
                    if($out['last']) {
                        $out['last']    .=  " $curr";
                        }
                    else {
                        $out['last']    =   $curr;
                        }
                    continue;
                    }

                if(in_array_norm($curr,$prefices)) {
                    if($out['last']) {
                        $out['last']    .=  " $curr";
                        }
                    else {
                        $out['last']    =   $curr;
                        }
                    continue;
                    }

                if($next == 'y' || $next == 'Y') {
                    if($out['last']) {
                        $out['last']    .=  " $curr";
                        }
                    else {
                        $out['last']    =   $curr;
                        }
                    continue;
                    }

                if($out['last']) {
                    $out['last']    .=  " $curr";
                    continue;
                    }

                if($out['middle']) {
                    $out['middle']      .=  " $curr";
                    }
                else {
                    $out['middle']      =   $curr;
                    }
                }
            break;
        case    2:
                switch(in_array_norm($pieces[1],$suffices)) {
                    case    TRUE: // array(title first middles last,suffix)
                        $subp   =   explode(' ',trim($pieces[0]));
                        $n_subp =   count($subp);
                        for($i = 0; $i < $n_subp; $i++) {
                            $curr               =   trim($subp[$i]);
                            $next               =   trim($subp[$i+1] ?? '');

                            if($i == 0 && in_array_norm($curr,$titles)) {
                                $out['title']   =   $curr;
                                continue;
                                }

                            if(!$out['first']) {
                                $out['first']   =   $curr;
                                continue;
                                }

                            if($i == $n_subp-1) {
                                if($out['last']) {
                                    $out['last']    .=  " $curr";
                                    }
                                else {
                                    $out['last']    =   $curr;
                                    }
                                continue;
                                }

                            if(in_array_norm($curr,$prefices)) {
                                if($out['last']) {
                                    $out['last']    .=  " $curr";
                                    }
                                else {
                                    $out['last']    =   $curr;
                                    }
                                continue;
                                }

                            if($next == 'y' || $next == 'Y') {
                                if($out['last']) {
                                    $out['last']    .=  " $curr";
                                    }
                                else {
                                    $out['last']    =   $curr;
                                    }
                                continue;
                                }

                            if($out['last']) {
                                $out['last']    .=  " $curr";
                                continue;
                                }

                            if($out['middle']) {
                                $out['middle']      .=  " $curr";
                                }
                            else {
                                $out['middle']      =   $curr;
                                }
                            }                       
                        $out['suffix']  =   trim($pieces[1]);
                        break;
                    case    FALSE: // array(last,title first middles suffix)
                        $subp   =   explode(' ',trim($pieces[1]));
                        $n_subp =   count($subp);
                        for($i = 0; $i < $n_subp; $i++) {
                            $curr               =   trim($subp[$i]);
                            $next               =   trim($subp[$i+1] ?? '');

                            if($i == 0 && in_array_norm($curr,$titles)) {
                                $out['title']   =   $curr;
                                continue;
                                }

                            if(!$out['first']) {
                                $out['first']   =   $curr;
                                continue;
                                }

                        if($i == $n_subp-2 && $next &&
                            in_array_norm($next,$suffices)) {
                            if($out['middle']) {
                                $out['middle']  .=  " $curr";
                                }
                            else {
                                $out['middle']  =   $curr;
                                }
                            $out['suffix']      =   $next;
                            break;
                            }

                        if($i == $n_subp-1 && in_array_norm($curr,$suffices)) {
                            $out['suffix']      =   $curr;
                            continue;
                            }

                        if($out['middle']) {
                            $out['middle']      .=  " $curr";
                            }
                        else {
                            $out['middle']      =   $curr;
                            }
                        }
                        $out['last']    =   $pieces[0];
                        break;
                    }
            unset($pieces);
            break;
        case    3:  // array(last,title first middles,suffix)
            $subp   =   explode(' ',trim($pieces[1]));
            $n_subp =   count($subp);
            for($i = 0; $i < $n_subp; $i++) {
                $curr               =   trim($subp[$i]);
                $next               =   trim($subp[$i+1] ?? '');
                if($i == 0 && in_array_norm($curr,$titles)) {
                    $out['title']   =   $curr;
                    continue;
                    }

                if(!$out['first']) {
                    $out['first']   =   $curr;
                    continue;
                    }

                if($out['middle']) {
                    $out['middle']      .=  " $curr";
                    }
                else {
                    $out['middle']      =   $curr;
                    }
                }

            $out['last']                =   trim($pieces[0]);
            $out['suffix']              =   trim($pieces[2]);
            break;
        default:    // unparseable
            unset($pieces);
            break;
        }

    return $out;
    }


?>


别那么骄傲

2020-11-27 11:33

Subject to the caveats that have already been raised regarding spaces in names and other anomalies, the following code will at least handle 98% of names. (Note: messy SQL because I don't have a regex option in the database I use.)

Warning: messy SQL follows.

```sql
create table parsname (fullname char(50), name1 char(30), name2 char(30), name3 char(30), name4 char(40));
insert into parsname (fullname) select fullname from ImportTable;
-- Each update peels the leading word off fullname into the next nameN column.
update parsname set name1 = substring(fullname, 1, locate(' ', fullname)),
 fullname = ltrim(substring(fullname, locate(' ', fullname), length(fullname)))
 where locate(' ', rtrim(fullname)) > 0;
update parsname set name2 = substring(fullname, 1, locate(' ', fullname)),
 fullname = ltrim(substring(fullname, locate(' ', fullname), length(fullname)))
 where locate(' ', rtrim(fullname)) > 0;
update parsname set name3 = substring(fullname, 1, locate(' ', fullname)),
 fullname = ltrim(substring(fullname, locate(' ', fullname), length(fullname)))
 where locate(' ', rtrim(fullname)) > 0;
update parsname set name4 = substring(fullname, 1, locate(' ', fullname)),
 fullname = ltrim(substring(fullname, locate(' ', fullname), length(fullname)))
 where locate(' ', rtrim(fullname)) > 0;
-- fullname now contains the last word in the string.
select fullname as FirstName, '' as MiddleName, '' as LastName from parsname where fullname is not null and name1 is null and name2 is null
union all
select name1 as FirstName, name2 as MiddleName, fullname as LastName from parsname where name1 is not null and name3 is null;
```

The code works by creating a working table (parsname) and tokenizing the fullname by spaces, one word per update. Any names that end up with values in name3 or name4 are non-conforming and will need to be dealt with differently.
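
Here is a rough, runnable sketch of the same one-word-per-pass tokenizing, driven from Python's built-in sqlite3 module. SQLite has no `locate()`, so `instr()` stands in for it, the columns are plain TEXT rather than fixed-width char, and the sample names are made up; treat it as an illustration of the approach rather than a port of the exact statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parsname "
    "(fullname TEXT, name1 TEXT, name2 TEXT, name3 TEXT, name4 TEXT)"
)
conn.executemany(
    "INSERT INTO parsname (fullname) VALUES (?)",
    [("Cher",), ("John Adams",), ("John Quincy Adams",),
     ("Jan Olav Olsen Heggelien",)],
)

# Each pass moves the leading word of fullname into the next nameN column.
# SQLite evaluates both right-hand sides against the row's old values.
# The column names come from this fixed tuple, never from user input.
for column in ("name1", "name2", "name3", "name4"):
    conn.execute(
        f"""
        UPDATE parsname
        SET {column} = substr(fullname, 1, instr(fullname, ' ') - 1),
            fullname = ltrim(substr(fullname, instr(fullname, ' ') + 1))
        WHERE instr(fullname, ' ') > 0
        """
    )

# fullname now holds the last word. One to three words are "conforming".
parsed = conn.execute(
    """
    SELECT fullname, '', '' FROM parsname WHERE name1 IS NULL
    UNION ALL
    SELECT name1, COALESCE(name2, ''), fullname
    FROM parsname WHERE name1 IS NOT NULL AND name3 IS NULL
    """
).fetchall()

# Four or more words spill into name3/name4 and need manual handling.
needs_review = conn.execute(
    "SELECT name1, name2, name3, name4, fullname "
    "FROM parsname WHERE name3 IS NOT NULL"
).fetchall()

print(sorted(parsed))
print(needs_review)
```

The four-word Norwegian name from the earlier answer ends up in `needs_review`, which is the non-conforming case the answer describes.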
