I have this text:
'''Hi, Mr. Sam D. Richards lives here, 44 West 22nd Street, New York, NY 12345. Can you contact him now? If you ne
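Check out libpostal, a library dedicated to parsing and normalizing street addresses.
It cannot extract an address from raw text, but it may help with related tasks, such as splitting an already isolated address into its components.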
Definitely regular expressions :)
Something like:
import re
txt = ...
# raw string, so the pattern is passed to the regex engine unchanged
regexp = r"[0-9]{1,3} .+, .+, [A-Z]{2} [0-9]{5}"
address = re.findall(regexp, txt)
# address == ['44 West 22nd Street, New York, NY 12345']
Explanation:
[0-9]{1,3}: 1 to 3 digits, the house number
(space): a space between the number and the street name
.+: the street name, one or more of any character
,(space): a comma and a space before the city
.+: the city, one or more of any character
,(space): a comma and a space before the state
[A-Z]{2}: exactly 2 uppercase letters from A to Z, the state
(space): a space between the state and the ZIP code
[0-9]{5}: exactly 5 digits, the ZIP code
re.findall(pattern, string) returns a list of all non-overlapping matches found in the string.
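The same idea can be sketched with named groups, so each part of the address comes back separately. This is only an illustration under the same assumption as above (addresses written as number, street, city, two-letter state, 5-digit ZIP); the names ADDRESS_RE, extract_addresses and zip_code are made up for this example, and [^,\n]+ is swapped in for .+ so that each part stops at the next comma.

```python
import re

# Assumption: addresses look like "<number> <street>, <city>, <ST> <ZIP>".
# ADDRESS_RE and extract_addresses are illustrative names, not from any library.
ADDRESS_RE = re.compile(
    r"""
    (?P<number>[0-9]{1,3})[ ]    # house number: 1 to 3 digits, then a space
    (?P<street>[^,\n]+),[ ]      # street: everything up to the next comma
    (?P<city>[^,\n]+),[ ]        # city: everything up to the next comma
    (?P<state>[A-Z]{2})[ ]       # state: exactly 2 uppercase letters
    (?P<zip_code>[0-9]{5})       # ZIP code: exactly 5 digits
    """,
    re.VERBOSE,
)


def extract_addresses(text):
    """Return one dict of named parts per address found in text."""
    return [m.groupdict() for m in ADDRESS_RE.finditer(text)]


# The complete part of the text from the question.
sample = ("Hi, Mr. Sam D. Richards lives here, 44 West 22nd Street, "
          "New York, NY 12345. Can you contact him now?")
found = extract_addresses(sample)
print(found)
```

The reason for [^,\n]+ here: .+ is greedy, so if one line holds more than one address, the original pattern can match from the first house number all the way to the last ZIP code. Stopping each part at the next comma keeps the matches separate. Neither pattern handles house numbers longer than 3 digits, unit numbers, or ZIP+4 codes.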
Pyap works well, not just for this particular example but also for other addresses contained in free text.
import pyap
text = ...
# returns a list of the addresses pyap detected in the text
addresses = pyap.parse(text, country='US')