Identifying a person's name vs. a dictionary word

再見小時候 2021-02-08 02:39

Is there some way to recognize that a word is likely to be/is not likely to be a person's name?

So if I see the word "understanding" I would get a probability of 0.01

3 answers
  •  谎友^ (OP)
    2021-02-08 02:58

    My quick hack would be this:

    Get the Census Bureau's list of names in order of popularity; it's freely available. Give each name a normalized popularity score (1.0 = most popular, 0.0 = least).
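
One way to do the normalization is by rank. This is only a sketch: the function name `normalized_popularity` is made up for illustration, and it assumes the input is already sorted from most to least popular. Scaling by raw counts instead of rank would work just as well.

```python
def normalized_popularity(items_by_popularity):
    """Map each item to a score in [0.0, 1.0] based on its rank.

    Assumes the input list is sorted from most to least popular.
    The first item gets 1.0 and the last gets 0.0.
    """
    n = len(items_by_popularity)
    if n == 1:
        # A single entry is trivially the most popular one.
        return {items_by_popularity[0].lower(): 1.0}
    return {
        item.lower(): 1.0 - i / (n - 1)
        for i, item in enumerate(items_by_popularity)
    }


# Illustrative input only, not real census data.
scores = normalized_popularity(["James", "Mary", "Zed"])
```

The same helper can be reused for the dictionary frequency list in the next step, since both lists just need a 1.0 to 0.0 score.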

    Then get an open-source dictionary and do some research to pull together a frequency score for every word. You can find a frequency list at Wiktionary. Assign every word a popularity score, 1.0 to 0.0. Conveniently, if you can't find a word on the frequency list, you can assume it's a pretty uncommon word.

    Look the word up on both lists. If it's on just one or the other, you're done: it's a name if it's only on the name list, and an ordinary word if it's only in the dictionary. If it's on both, use a formula to compute a weighted probability, something like (Name Popularity) / (Name Popularity + Other Popularity). If it's not on either list, it's probably a name.
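
The lookup logic above could be sketched roughly like this. The function name `name_probability`, the `unknown_prior` value of 0.9 for words found on neither list, and the sample scores are all assumptions for illustration; the real scores would come from the census and frequency lists.

```python
def name_probability(word, name_scores, word_scores, unknown_prior=0.9):
    """Estimate how likely `word` is to be a person's name.

    name_scores / word_scores map lowercase strings to popularity
    scores in [0.0, 1.0]. unknown_prior is an assumed value for words
    on neither list ("probably a name").
    """
    key = word.lower()
    in_names = key in name_scores
    in_words = key in word_scores

    if in_names and in_words:
        n, w = name_scores[key], word_scores[key]
        total = n + w
        # Both scores can be 0.0 for the least popular entries;
        # fall back to an even split to avoid dividing by zero.
        return n / total if total > 0 else 0.5
    if in_names:
        return 1.0  # only on the name list
    if in_words:
        return 0.0  # only in the dictionary
    return unknown_prior  # on neither list


# Made-up scores for illustration only.
name_scores = {"mark": 0.75, "grace": 0.5}
word_scores = {"mark": 0.25, "understanding": 0.5}

p_mark = name_probability("Mark", name_scores, word_scores)
p_word = name_probability("understanding", name_scores, word_scores)
p_unknown = name_probability("Xyzzyk", name_scores, word_scores)
```

If a hard 0.0 or 1.0 for the single-list cases is too blunt, you could clamp the result to something like 0.01 and 0.99 instead.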
