How can I extract an address from raw text using NLTK in Python?

梦如初夏 2021-02-07 19:30

I have this text

'''Hi, Mr. Sam D. Richards lives here, 44 West 22nd Street, New York, NY 12345. Can you contact him now? If you ne

3 Answers
  •  难免孤独
    2021-02-07 20:17

    Definitely regular expressions :)

    Something like

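    import re
    
    txt = ...  # the raw text from the question
    # Raw string, so any backslash added to the pattern later is not treated as a Python escape
    regexp = r"[0-9]{1,3} .+, .+, [A-Z]{2} [0-9]{5}"
    address = re.findall(regexp, txt)
    
    # address == ['44 West 22nd Street, New York, NY 12345']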
    

    Explanation:

    [0-9]{1,3}: 1 to 3 digits, the house number

    (space): a space between the number and the street name

    .+: the street name, one or more of any character (greedy)

    ", " (comma and space): separates the street from the city

    .+: the city, one or more of any character (greedy)

    ", " (comma and space): separates the city from the state

    [A-Z]{2}: exactly 2 uppercase letters from A to Z, the state abbreviation

    (space): a space between the state and the ZIP code

    [0-9]{5}: exactly 5 digits, the ZIP code

    re.findall(pattern, string) returns a list of all non-overlapping matches found in the string.
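
One thing to watch for: because .+ is greedy, two addresses on the same line can get merged into a single match. Below is a minimal sketch of a stricter variant, run on the part of the question's text that is visible above. The name `extract_addresses`, the `[^,]+` pieces, and the `\b` word boundaries are my own assumptions for illustration, not part of the pattern above, and the second sample sentence is made up.

```python
import re

# Only the visible portion of the text from the question.
txt = ("Hi, Mr. Sam D. Richards lives here, 44 West 22nd Street, "
       "New York, NY 12345. Can you contact him now?")

# Same shape as the pattern above: number, street, city, state, ZIP.
# Assumption: [^,]+ replaces .+ so street and city cannot run across a comma,
# and \b keeps the match from starting or ending in the middle of a number.
ADDRESS_RE = re.compile(r"\b[0-9]{1,3} [^,]+, [^,]+, [A-Z]{2} [0-9]{5}\b")


def extract_addresses(text):
    """Return every substring shaped like 'number street, city, ST 12345'."""
    return ADDRESS_RE.findall(text)


addresses = extract_addresses(txt)
print(addresses)  # ['44 West 22nd Street, New York, NY 12345']

# Hypothetical sentence with two addresses on one line; each is returned separately.
two = "Office: 7 Main St, Springfield, IL 62704; home: 123 Elm Road, Austin, TX 73301."
print(extract_addresses(two))
```

This still only handles US-style addresses written in exactly this layout (1 to 3 digit house number, two-letter state, 5-digit ZIP), so treat it as a starting point rather than a general address parser.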
