Normalize an Address

你说的曾经没有我的故事 提交于 2019-12-05 21:27:09
Chris Walton

To normalise the schema you have; add a table Address-ZipCode table, with foreign keys Address ID and Zip Code; and primary key Address Id - identical to that in the Address table. Then include the Zip codes by using a Left Join between address and the new table. The new table will only be populated when an address has a zipcode.

However, I would suggest that if you are trying to accommodate international addresses, the schema you have is likely to be inadequate - you will need multiple address lines and more levels of category than shown in your diagram. Categories missed include country, sub-region, town, and possibly others.

My answer here (which is extremely long) shows what is needed to deal with international addresses (and other things) comprehensively. This is massive overkill unless you are dealing with millions of addresses in each of multiple countries.

I haved struggled with this for several different apps over the years. How you set this up depends upon your needs. I work in affordable housing, and one of the things we need to do is relate different geographical components (City, county, state, etc) to various REGIONS as defined by HU (Housing and Urban Development in USA).

What I ended up with looks a little like THIS:

tblState:
    StateID
    StateCode (AL, AK, AR . . . etc)
    StateName (Alabama, Alaska, Arkansas,  . . . etc)

tblCounty
    CountyID
    HUDRegionID FK to tblHUDRegion
    StateID FK to tbleState
    CountyName (Pierce County, WA; Lane County, OR)
NOTE: I recognize I could normalize even further and create a table of count names, many-to-many related to States ON stateID, but there's a limit, man!)

tblCity
    CityID
    CountyID
    CityName

tblZIPCOde
    ZIPCodeID
    CityID

tblHUDRegion
    HUDRegionID
    HUDRegionCode
    HUDRegionName

In my case, HUD regions are defined at the county level (one HUD region includes one or more counties (or "County-Towns" in some cases). Each HUD region actually has a Unique Identifier defined ny HUD (The HUD CBSA_Sub), which I use as the "HUD-region_code". Also important to note is that HUD regions can include counties in one or more states. Therefore, the HUD region identifier is related to county, but only indirectly to state, THROUGH each county. For example, the HUD "Portland/Vancouver/Beaverton" HUD MSA includes counties (and cities) in both Oregon and Washington states.

In YOUR case, you will need to define one more top-layer, tblCountry. Further, you may need to adapt the concenpt of "county" and "state" a bit to accomodate other countries ("Province" and whatever they use for subdivisions greater than City, but less than state. "Region" may work in this case as well - I believe many european coutnries use "regions").

A country has one or more States (or equivelent). A state has one or more Counties (or equivelent). A County Has one or more Cities. And Cities tend to have at least one postal code.

Regions such as the HUD region, in my case, tend to be defined as aggregations at one of these levels.

In many cases, outside of this HUD-driven model I had to develop (it is often the case one needs to ascertain which HUD MSA one is working by ZIP, or by County. In all cases, it is not safe to assume that HUD regions are contained within a apecific state.

Also important to note is that the USPS periodically change the ZIP codes for certain areas.

Normalizing, or standardizing addresses, is a huge problem for most entities that need accurate, regularly-formatted addresses. (I work in the address validation industry - for SmartyStreets - so I've dealt with this a lot.) Because of the complexity of different delivery endpoints, address changes, updates to the components of an address, and many other things, it's best to recruit a certified service to take care of that for you.

Assuming you're working with US addresses, there are APIs or list processing services you can hook into pretty easily to get the data you need. For example, if you're having issues with a NULLable ZipCode FK, then you might as well append the ZIP code to every address (if it can't find one, then why keep it because it is a bad address anyway).

One such service is LiveAddress, which handles API requests, or you could process an existing list/table of addresses with our CASS-Certified Scrubbing. Either way, I'll be happy to personally help you create a working solution...

Depending on what country you are in the rules for postal codes can get pretty dicey. You are pretty safe to assume that a postal code has one official city name, but both the U.S. and Canada allow for alternative city names for a postal code. I know this for a fact because I develop postal address validation software for North America. The non-official names are often recognized by the postal authorities and you typically have to permit their use.

Therefore, if you want to be able to use non-official names you need m:n between city and postal code. I would question why you want a code table of postal codes in any case. It is better for address storage to treat these as independent attributes rather than trying to normalize them.

If you somehow think that you will be able to use some data in your database to go backwards from postal code to city name or forwards from city name to postal code, then you are setting yourself up for disappointment! There are USPS and Canada Post recognized software solutions for doing address validation and if you spend any time actually looking into it, you'll find that the problem domain of address validation is much more complicated than you think it is. If address accuracy is important to your app (and it should be in most cases) then go buy 3rd party tools to do your address validation and store your addresses in a single table with as many columns as makes sense to you.

zip codes has_many addresses / address belongs_to zip_code. Do you need to normalize? Most apps it's best to just have a zip_code column in the addresses table. Maintaining all the zipcodes for international addresses is an uphill battle.

Also you are duplicating region_id in address and city. You might need to explain what region is in your app but this looks like it only needs to be in city.

Worldwide, 119 of 190 countries use postal codes. Notable countries which do not use them include Ireland and Panama.[1]

Besides supporting that fact, it would be a very annoying system which insisted on having a postal code. It should also allow postal code unknown.

In the U.S., each "city" has at least one zip code, so the relationship is correct. I know this from developing a zip code database for about a year.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!