What are ways to match street addresses in SQL Server?

时光说笑 2021-01-01 03:44

We have a column for street addresses:

123 Maple Rd.
321 1st Ave.
etc...

Is there any way to match these addresses t

8 Answers
  • 2021-01-01 03:57

    Stripping out data is a bad idea. Many towns have dozens of variations on the same street name: Oak Street, Oak Road, Oak Lane, Oak Circle, Oak Court, Oak Avenue, and so on. As other answers mention, converting to the canonical USPS abbreviation is a better approach.

  • 2021-01-01 04:03

    You could try SOUNDEX to see if that gets you close. http://msdn.microsoft.com/en-us/library/aa259235%28SQL.80%29.aspx
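To give a feel for what a phonetic code buys you, here is a minimal Python sketch of the classic American Soundex rules. It is an illustration only: the function name `soundex` is made up for this example, and the output is not guaranteed to match SQL Server's `SOUNDEX` in every case.

```python
def soundex(word: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    # Consonant groups that share a digit.
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


# A misspelled street name still lands on the same code.
print(soundex("Maple"), soundex("Mapel"))
```

Because the code keeps only the first letter and three digits, it is probably more useful applied to the street-name word alone, with the house number compared exactly, than to a whole address line.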

  • Address matching and deduplication is a messy business. Other posters are correct when they say that the addresses first need to be standardized to the format of the local postal authority (the USPS, for example, for US addresses). Once the addresses are in a standard format, the rest is easy.

    There are several third-party services that will flag duplicates in a list for you. Doing this solely with a SQL subquery will not account for differences in address formats and standards. The USPS (for US addresses) has guidelines for standardizing them, but only a handful of vendors are certified to perform such operations.

    So I would recommend exporting the table to a CSV file, for instance, and submitting it to a capable list processor. One such is SmartyStreets' Bulk Address Validation Tool, which will process the list automatically in a few seconds to a few minutes. It flags duplicate rows with a new field called "Duplicate" and a value of Y.

    Try standardizing and validating a couple of addresses here to get an idea of what the output will look like.

    Full Disclosure: I work for SmartyStreets

  • 2021-01-01 04:06

    Fuzzy Lookups and Groupings Provide Powerful Data Cleansing Capabilities

  • 2021-01-01 04:09

    To do proper street address matching, you need to get your addresses into a standardized form. Have a look at the USPS postal standards here (I'm assuming you're dealing with US addresses). It is by no means an easy process if you want to be able to deal with ALL types of US mail addresses. There is software available from companies like QAS and Satori Software that can do the standardization for you. You'll need to export your addresses, run them through the software, and then load the database with the updated addresses. There are also third-party vendors that will perform the address standardization for you. It may be overkill for what you are trying to do, but it's the best way to do it. If the addresses in your database are standardized, you'll have a better chance of matching them (especially if you can standardize the input as well).

  • 2021-01-01 04:10

    Rather than stripping out the things that can be variable, try to convert them to a "canonical form" that can be compared.

    For example, replace 'rd' or 'rd.' with 'road' and 'st' or 'st.' with 'street' before comparing.
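As a rough illustration of the canonical-form idea, here is a minimal Python sketch. The `SUFFIXES` table and the `canonicalize` function are hypothetical names invented for this example, and the table is deliberately tiny; a real one would follow the full USPS suffix list.

```python
import re

# Hypothetical, deliberately small suffix table (an assumption for this sketch).
SUFFIXES = {"rd": "road", "st": "street", "ave": "avenue"}


def canonicalize(address: str) -> str:
    """Lowercase, drop periods and commas, and expand a trailing street suffix."""
    tokens = re.sub(r"[.,]", " ", address.lower()).split()
    # Only the last token is expanded, so "St" in "St James Rd" is left alone.
    if tokens and tokens[-1] in SUFFIXES:
        tokens[-1] = SUFFIXES[tokens[-1]]
    return " ".join(tokens)


# Variants of the same address compare equal once canonicalized.
print(canonicalize("123 Maple Rd.") == canonicalize("123 MAPLE ROAD"))
# Different suffixes stay distinct, which stripping them out would not preserve.
print(canonicalize("10 Oak St") == canonicalize("10 Oak Rd"))
```

This sketch assumes the suffix is the last word of the line, so addresses with a trailing unit (an apartment or suite number, say) would need extra handling. In SQL Server the same result could be stored in a separate column and compared there.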
