Is there a library for parsing US addresses?

伪装坚强ぢ 2021-01-30 11:54

I have a list of US addresses that I need to break into city, state, zip code, etc.

Example address: "16100 Sand Canyon Avenue, Suite 380 Irvine, CA 92618"

Do

7 answers
  • 2021-01-30 12:06

    That pyparsing library looks very interesting and seems to do a nice job with a variety of examples. I think it's a more readable alternative to raw regular expressions (which aren't really a good solution for this problem).
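    To illustrate why raw regular expressions only go so far, here is a minimal sketch using just the standard library. The pattern and the `split_tail` helper are assumptions made for illustration, not part of any library mentioned in this thread. It pulls the trailing state and ZIP code off the example address from the question, but it has no way of knowing where the unit ends and the city begins.

```python
import re

# Assumed pattern: ", ST 12345" or ", ST 12345-6789" at the very end of the string.
TAIL = re.compile(r",\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$")


def split_tail(address):
    """Return (rest, state, zip) or None if the address has no 'ST ZIP' tail."""
    match = TAIL.search(address)
    if match is None:
        return None
    return address[:match.start()], match.group("state"), match.group("zip")


rest, state, zip_code = split_tail("16100 Sand Canyon Avenue, Suite 380 Irvine, CA 92618")
print(state, zip_code)  # CA 92618
print(rest)             # 16100 Sand Canyon Avenue, Suite 380 Irvine
```

    The state and ZIP come out cleanly, but "Suite 380 Irvine" is left as one chunk: nothing in the string marks the city boundary, which is the part a grammar or a trained parser has to handle.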

    Be aware that a parsing-only solution means you will, at some point, be standardizing addresses that aren't valid; they'll just appear valid. If knowing whether an address is in fact real (and perhaps deliverable) is important to your application, then you should be using a USPS-certified service that uses Delivery Point Validation (DPV). I am a developer for SmartyStreets, which provides just such a service, along with SDKs that make integration easy (here's a succinct sample).

    The responses come back standardized according to USPS Publication 28. The API is free for low-usage users.

  • 2021-01-30 12:06

    Carefully check your dataset to ensure that this problem hasn't already been handled for you.

    I spent a fair amount of time first creating a taxonomy of probable street name endings and using regexp conditionals to try to pluck the street number out of the full address strings, and it turned out that the attribute table for my shapefiles had already split out these components.

    Before you go forward with parsing address strings, which is always a bit of a chore due to the inevitably strange variations (some parcel addresses are for landlocked parcels and have weird addresses, etc.), make sure your dataset hasn't already done this for you!
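    A quick way to run that check is to scan the column names before writing any parsing code. The sketch below is only an illustration: the hint substrings, the `address_like_columns` helper, and the sample header are all made up, and it reads a CSV export rather than a shapefile attribute table.

```python
import csv
import io

# Hypothetical substrings that suggest a column already holds an address component.
HINTS = ("city", "state", "zip", "street", "st_name", "st_num")


def address_like_columns(header):
    """Return the column names that contain any of the hint substrings."""
    return [name for name in header if any(hint in name.lower() for hint in HINTS)]


# Made-up sample standing in for an exported attribute table.
sample = io.StringIO(
    "PARCEL_ID,SITE_ADDR,ST_NUM,ST_NAME,CITY,ZIP\n"
    "1,16100 Sand Canyon Avenue,16100,Sand Canyon Avenue,Irvine,92618\n"
)
header = next(csv.reader(sample))
print(address_like_columns(header))  # ['ST_NUM', 'ST_NAME', 'CITY', 'ZIP']
```

    If a list like this comes back non-empty, it is worth inspecting those columns before parsing the full address string.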

  • 2021-01-30 12:09

    Check out this Python package: https://github.com/SwoopSearch/pyaddress

    It also allows flexibility if you know enough details about the addresses to be parsed.

  • 2021-01-30 12:20

    I know this is an old post, but someone might find it useful: https://usaddress.readthedocs.io/en/latest/

    >>> import usaddress
    >>> usaddress.parse('Robie House, 5757 South Woodlawn Avenue, Chicago, IL 60637')
    [('Robie', 'BuildingName'),
    ('House,', 'BuildingName'),
    ('5757', 'AddressNumber'),
    ('South', 'StreetNamePreDirectional'),
    ('Woodlawn', 'StreetName'),
    ('Avenue,', 'StreetNamePostType'),
    ('Chicago,', 'PlaceName'),
    ('IL', 'StateName'),
    ('60637', 'ZipCode')]
    

    Or:

    >>> import usaddress
    >>> usaddress.tag('Robie House, 5757 South Woodlawn Avenue, Chicago, IL 60637')
    (OrderedDict([
       ('BuildingName', 'Robie House'),
       ('AddressNumber', '5757'),
       ('StreetNamePreDirectional', 'South'),
       ('StreetName', 'Woodlawn'),
       ('StreetNamePostType', 'Avenue'),
       ('PlaceName', 'Chicago'),
       ('StateName', 'IL'),
       ('ZipCode', '60637')]),
    'Street Address')
    
    >>> usaddress.tag('State & Lake, Chicago')
    (OrderedDict([
       ('StreetName', 'State'),
       ('IntersectionSeparator', '&'),
       ('SecondStreetName', 'Lake'),
       ('PlaceName', 'Chicago')]),
    'Intersection')
    
    >>> usaddress.tag('P.O. Box 123, Chicago, IL')
    (OrderedDict([
       ('USPSBoxType', 'P.O. Box'),
       ('USPSBoxID', '123'),
       ('PlaceName', 'Chicago'),
       ('StateName', 'IL')]),
    'PO Box')
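    To get from the token/label pairs that `usaddress.parse` returns to the city, state, and zip code the question asks for, the pairs can be grouped by label. The sketch below does not import usaddress; it reuses the pairs from the first example above as literal data, and `group_labels` is a hypothetical helper, not part of the library.

```python
from collections import OrderedDict


def group_labels(pairs):
    """Join all tokens that share a label, stripping commas from token ends."""
    grouped = OrderedDict()
    for token, label in pairs:
        grouped.setdefault(label, []).append(token.strip(","))
    return OrderedDict((label, " ".join(tokens)) for label, tokens in grouped.items())


# Token/label pairs copied from the usaddress.parse example above.
parsed = [
    ("Robie", "BuildingName"),
    ("House,", "BuildingName"),
    ("5757", "AddressNumber"),
    ("South", "StreetNamePreDirectional"),
    ("Woodlawn", "StreetName"),
    ("Avenue,", "StreetNamePostType"),
    ("Chicago,", "PlaceName"),
    ("IL", "StateName"),
    ("60637", "ZipCode"),
]

fields = group_labels(parsed)
print(fields["PlaceName"], fields["StateName"], fields["ZipCode"])  # Chicago IL 60637
```

    In practice `usaddress.tag` already does this kind of merging, so a helper like this is mainly useful when you want to control how the tokens are combined.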
    
  • 2021-01-30 12:23

    Pyparsing has a bunch of functionality for parsing street addresses. Check out an example here: http://pyparsing.wikispaces.com/file/view/streetAddressParser.py

  • 2021-01-30 12:29

    There is a powerful open-source library, libpostal, that fits this use case very nicely, with bindings for several programming languages. Libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of the project is to understand location-based strings in every language, everywhere.

    I have created a simple Docker image with the Python binding, pypostal, that you can spin up and try out easily: pypostal-docker
