I have a bunch of human names. They are all "Western" names and I only need American conventions/abbreviations (e.g., Mr. instead of Sr. for señor). Unfortunately, the pe
Since you're limited to Western-style names, I think a few rules will get you most of the way there:
Delete any leading words that appear in the list { mr mrs miss ms rev dr prof } and any more you can think of. Using a table of title "scores" (e.g. [mr=1, mrs=1, rev=2, dr=3, prof=4]; order them however you want), record the highest-scoring title that was deleted.

Delete any trailing words that appear in the list { jr phd } or are Roman numerals of value roughly 50 or less (an anchored /^[XVI]+$/ is probably a good enough regex).

It will never be possible to guarantee that a name like "John Baxter Smith" is parsed correctly, since not all double-barrelled surnames use hyphens. Is "Baxter Smith" the surname? Or is "Baxter" a middle name? I think it's safe to assume that middle names are more common than double-barrelled-but-unhyphenated surnames, so it's better to default to reporting the last word as the surname. You might also want to compile a list of common double-barrelled surnames and check against it, however.
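
For what it's worth, here is a minimal Python sketch of the rules above. The function name `parse_name`, the returned dictionary keys, the score for miss/ms, and the single `DOUBLE_BARRELLED` entry are all my own assumptions for illustration; extend the tables to suit your data. It treats Roman numerals as upper-case only, so a surname like "Vi" is not mistaken for a suffix.

```python
import re

# Illustrative title scores; order them however you want.
TITLE_SCORES = {"mr": 1, "mrs": 1, "miss": 1, "ms": 1, "rev": 2, "dr": 3, "prof": 4}
SUFFIXES = {"jr", "phd"}
# Upper-case Roman numerals built from X, V and I only (values up to 39).
ROMAN = re.compile(r"[XVI]+")
# Hypothetical lookup of unhyphenated double-barrelled surnames, lower-cased.
DOUBLE_BARRELLED = {("lloyd", "webber")}


def _strip_punct(word):
    """Drop periods and commas so 'Dr.' and 'Ph.D.,' compare cleanly."""
    return word.replace(".", "").replace(",", "")


def parse_name(raw):
    words = raw.split()
    result = {"title": None, "given": "", "middle": "", "surname": "", "suffixes": []}
    if not words:
        return result

    # Rule 1: delete leading titles, remembering the highest-scoring one.
    while len(words) > 1 and _strip_punct(words[0]).lower() in TITLE_SCORES:
        title = _strip_punct(words.pop(0)).lower()
        best = result["title"]
        if best is None or TITLE_SCORES[title] > TITLE_SCORES[best]:
            result["title"] = title

    # Rule 2: delete trailing suffixes and Roman numerals.
    while len(words) > 1:
        last = _strip_punct(words[-1])
        if last.lower() in SUFFIXES or ROMAN.fullmatch(last):
            result["suffixes"].insert(0, words.pop().rstrip(","))
        else:
            break

    # A comma before a suffix ("Smith, Jr.") is left on the last word; drop it.
    words[-1] = words[-1].rstrip(",")

    # Rule 3: default to the last word as the surname, unless the last two
    # words are a known double-barrelled surname and a given name remains.
    last_two = tuple(w.lower() for w in words[-2:])
    n_surname = 2 if len(words) >= 3 and last_two in DOUBLE_BARRELLED else 1
    rest = words[:-n_surname]
    result["surname"] = " ".join(words[-n_surname:])
    result["given"] = rest[0] if rest else ""
    result["middle"] = " ".join(rest[1:])
    return result
```

With these tables, `parse_name("Prof. Dr. John Baxter Smith III")` reports "prof" as the title, "Baxter" as a middle name and "Smith" as the surname, which is the default discussed above. It will still get genuinely ambiguous names wrong unless the surname is in your lookup list.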