Given the task of storing international geographic addresses in a relational table, what is the most flexible schema? Should every part of the address be broken out into the
Here's an anecdote for anyone who stumbles on this question:
I speak as a person who has lived and worked on a lot of continents (Europe, Asia, North America). In my experience, and the experience of the people I work with, it has been much easier for us to use systems that do the following:
Systems built like this, I find, make my life easiest. Particularly when I'm sending mail to a postal system about which your firm has virtually no functional internal knowledge.
If your firm does have internal knowledge about particular postal systems, use my selection in Point 3 to inform which view you display to me. A lot of people know what the US postal system expects on packaging; if I select US in Point 3, feel free to make the view look appropriate for a US address. If I select a country about which your firm knows nothing--display a generic three lines and let me do the rest; don't force me to use ASCII.
And let's be real here--building a complete, encyclopedic database of all global postal systems ( public and private ) is a herculean task at best, if not an impossible one. There are, for example, postal systems in which only the local, last-mile carrier really knows where an address is located. Sometimes being able to pass notes to that carrier on the packaging is extremely useful. And mapping the local knowledge of every edge case carrier into your database is indeed an impossible task.
Just ask Gödel. ( And then ask yourself if you're attempting to use an axiomatic system to model a universe of discourse, give or take some sort of arithmetic like set theory or relational algebra. )
I will summarize my thoughts from my blog post - A lesson in address storage (on archive.org).
On my current project [I work for a logistics company] we're storing international addresses. I've done research on addresses all over the world in the design of this portion of the database. There's a lot of different formats. In the Western world we tend to use a fairly uniform format - a few differences but they're mostly:
This appears to cover most countries but the ordering of the fields may be displayed differently. You can find a list of display formats at http://www.bitboost.com/ref/international-address-formats.html#Formats
For instance, in many countries, the postal code falls before the city name and the street number falls after the street name. In Canada, U.S. and the U.K. the street number precedes the street name and the postal code (or ZIP) comes after the city name.
In answer to your question about separation of the addresses into different countries, I wouldn't suggest it, it will just make life harder in other areas - for instance reporting. The format I've provided covers all the addresses in our logistics database which covers USA, Canada, Mexico and the UK without any problems. It also covers all of our European, Chinese, Japanese and Malaysian addresses. I can't speak for other countries but I haven't yet had to store an address from a country that these fields won't support.
I don't suggest going with the Address1, Address2, Address3 format suggested by others and seen in many databases because parsing address information out of an alphanumeric string isn't as simple as it might first seem - especially if data isn't entered correctly, due to misinformation, typo, misspelling etc. If you separate your fields you can use distance algorithms to check for likely meaning, use probability to check street name against postal code and street number or to check province and city against street name etc. Try doing any of that when you've got a string denoting your whole street address. It's not a trivial matter by any stretch of the imagination.
QA on an address database is a headache, period. The easiest way to simplify your life in this area is to make sure all the fields hold only a single piece of information that can be automatically verified as correct at entry time. Probability, distance algorithms and regular expressions can check for validity of entry and provide feedback to the user as to what their mistake was and suggest suitable corrections.
One caveat to be aware of is roads with names that are also street types - if you're covering Canada you need to be aware of "Avenue Road" in Toronto which will trip you up big time if you're using the Address1, 2, 3 format. This likely occurs in other places too, although I'm not aware of them - this single instance was enough for me to scream WTF?!
I know this is an extremely old topic that is already answered, but I thought that I'd throw my two cents in as well. It all depends on what your project goals and how you expect your target users to enter addresses. Ben's suggestion will allow you to parse addresses accurately, but on the other hand could make for a longer (and possibly more frustrating) user data entry process. Stephen Wrighton's suggestion is simpler, and could be easier for users to enter addresses as a result.
I've also seen some models that simply had an "Address" column that would capture a typical street number, type, street name, unit / apartment number, etc. all in one column, while keeping City, Country, Region, etc. within other columns. Similar to Stephen's model, except Address1, Address2, and Address3 all consolidated into one column.
My opinion is that the most flexible models tend to be those that are least restrictive, depending on your interpretation of flexible.