How to parse freeform street/postal address out of text, and into components

感动是毒 2020-11-22 13:40

We do business largely in the United States and are trying to improve user experience by combining all the address fields into a single text area. But there are a few proble

9 Answers
  •  死守一世寂寞
    2020-11-22 14:24

    For US address parsing, I prefer the usaddress package, which can be installed with pip. Note that it handles US addresses only.

    python3 -m pip install usaddress
    

    Documentation
    PyPi

    This worked well for me for US addresses.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    
    # address_parser.py
    import sys
    from usaddress import tag
    from json import dumps
    
    if __name__ == '__main__':
        tag_mapping = {
            'Recipient': 'recipient',
            'AddressNumber': 'addressStreet',
            'AddressNumberPrefix': 'addressStreet',
            'AddressNumberSuffix': 'addressStreet',
            'StreetName': 'addressStreet',
            'StreetNamePreDirectional': 'addressStreet',
            'StreetNamePreModifier': 'addressStreet',
            'StreetNamePreType': 'addressStreet',
            'StreetNamePostDirectional': 'addressStreet',
            'StreetNamePostModifier': 'addressStreet',
            'StreetNamePostType': 'addressStreet',
            'CornerOf': 'addressStreet',
            'IntersectionSeparator': 'addressStreet',
            'LandmarkName': 'addressStreet',
            'USPSBoxGroupID': 'addressStreet',
            'USPSBoxGroupType': 'addressStreet',
            'USPSBoxID': 'addressStreet',
            'USPSBoxType': 'addressStreet',
            'BuildingName': 'addressStreet',
            'OccupancyType': 'addressStreet',
            'OccupancyIdentifier': 'addressStreet',
            'SubaddressIdentifier': 'addressStreet',
            'SubaddressType': 'addressStreet',
            'PlaceName': 'addressCity',
            'StateName': 'addressState',
            'ZipCode': 'addressPostalCode',
        }
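        # Join all CLI arguments so the address can be passed unquoted.
        raw_address = ' '.join(sys.argv[1:])
        try:
            address, _ = tag(raw_address, tag_mapping=tag_mapping)
        except Exception:
            # tag() raises (e.g. usaddress.RepeatedLabelError) when it cannot
            # label the input consistently; log the full input for review.
            with open('failed_address.txt', 'a') as fp:
                fp.write(raw_address + '\n')
            print(dumps({}))
        else:
            print(dumps(dict(address)))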
    

    Running address_parser.py:

     python3 address_parser.py 9757 East Arcadia Ave. Saugus MA 01906
     {"addressStreet": "9757 East Arcadia Ave.", "addressCity": "Saugus", "addressState": "MA", "addressPostalCode": "01906"}
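
To illustrate why so many fine-grained labels are mapped onto a single `addressStreet` key, here is a rough plain-Python sketch of the merging idea: adjacent tokens whose labels map to the same field are joined into one string. This is not usaddress's actual implementation; `merge_tokens`, `PARSED`, and `TAG_MAPPING` are hypothetical names, and the token/label pairs are hand-written to resemble parser output rather than produced by the library.

```python
# Hand-written (token, label) pairs, shaped like per-token parser output.
# Assumption: these labels are illustrative, not real library output.
PARSED = [
    ("9757", "AddressNumber"),
    ("East", "StreetNamePreDirectional"),
    ("Arcadia", "StreetName"),
    ("Ave.", "StreetNamePostType"),
    ("Saugus", "PlaceName"),
    ("MA", "StateName"),
    ("01906", "ZipCode"),
]

# A subset of the mapping used in address_parser.py above.
TAG_MAPPING = {
    "AddressNumber": "addressStreet",
    "StreetNamePreDirectional": "addressStreet",
    "StreetName": "addressStreet",
    "StreetNamePostType": "addressStreet",
    "PlaceName": "addressCity",
    "StateName": "addressState",
    "ZipCode": "addressPostalCode",
}


def merge_tokens(pairs, mapping):
    """Join adjacent tokens that map to the same output field."""
    merged = {}
    last_field = None
    for token, label in pairs:
        # Labels missing from the mapping keep their original name.
        field = mapping.get(label, label)
        if field == last_field:
            merged[field].append(token)
        elif field not in merged:
            merged[field] = [token]
        else:
            # The same field showing up in two separate places is ambiguous.
            raise ValueError(f"field {field!r} appears in two separate places")
        last_field = field
    return {field: " ".join(tokens) for field, tokens in merged.items()}


result = merge_tokens(PARSED, TAG_MAPPING)
print(result)
```

Because all street-level labels share one target key, the number, directional, name, and street type collapse into a single street line, while city, state, and ZIP stay separate. An input where the same field reappears after a different one is rejected in this sketch, which is the kind of case the script above logs to `failed_address.txt`.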
    
