How should international geographical addresses be stored in a relational database?

Given the task of storing international geographic addresses in a relational table, what is the most flexible schema? Should every part of the address be broken out into the

9 Answers

一整个雨季

2020-11-30 17:11
Here's an anecdote for anyone who stumbles on this question:

I speak as a person who has lived and worked on a lot of continents (Europe, Asia, North America). In my experience, and the experience of the people I work with, it has been much easier for us to use systems that do the following:
1. Provide three lines into which I will type one address. Pass these three lines on to your local postal service as I type them, verbatim. Let me use any character set I want; use UTF-8 or something better.
2. If your system has business requirements that need me to specify particular information (such as zip code, prefecture, state, etc.), ask for that separately. By business requirements, I mean things like analytics; these bits of information should not be shared with your local postal service (unless I also happened to write the same information into one of the three lines from Point 1, above).
3. Have a dropdown that asks me to specify the categorical location of the address I provided in the lines of Point 1 above, perhaps Country.
4. If you must parse the information I provide in the lines of Point 1, use my answer to Point 3 to select a regex. Run that regex against the information in Point 1 to parse it. Try to fill the user interface elements of Point 2 using the output from your regex. If I correct that autofilled information, use the fact that I changed it to improve your regex. Similarly, as much as possible, give me an opportunity to review and correct the output of your regex: nobody knows better than me what I intended to communicate.
Systems built like this, I find, make my life easiest. Particularly when I'm sending mail to a postal system about which your firm has virtually no functional internal knowledge.
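Points 3 and 4 can be sketched roughly as follows. This is only an illustration: the `POSTAL_PATTERNS` table, the function name `suggest_postal_code`, and the two simplified patterns are assumptions made up for this example, not complete validators for either country. The free-text lines are never modified; the regex only produces a suggestion that the user can review and correct.

```python
import re

# Hypothetical per-country patterns, keyed by the country chosen in the dropdown.
# They are deliberately simplified and only try to find a postal code.
POSTAL_PATTERNS = {
    "US": re.compile(r"\b(\d{5})(?:-\d{4})?\b"),
    "CA": re.compile(r"\b([A-Z]\d[A-Z]) ?(\d[A-Z]\d)\b"),
}


def suggest_postal_code(lines, country):
    """Return a postal code guess to prefill the form, or None.

    `lines` are the free-text address lines exactly as the user typed them.
    """
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        # Country we know nothing about: keep the lines verbatim, suggest nothing.
        return None
    # Look at the last line first, since the code usually sits near the end.
    for line in reversed(lines):
        match = pattern.search(line.upper())
        if match:
            return match.group(0)
    return None


print(suggest_postal_code(["1600 Pennsylvania Ave NW", "Washington, DC 20500"], "US"))
```

If the user overwrites the suggested value, that correction is the signal to revisit the pattern for that country.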

If your firm does have internal knowledge about particular postal systems, use my selection in Point 3 to inform which view you display to me. A lot of people know what the US postal system expects on packaging; if I select US in Point 3, feel free to make the view look appropriate for a US address. If I select a country about which your firm knows nothing--display a generic three lines and let me do the rest; don't force me to use ASCII.

And let's be real here: building a complete, encyclopedic database of all global postal systems (public and private) is a herculean task at best, if not an impossible one. There are, for example, postal systems in which only the local, last-mile carrier really knows where an address is located. Sometimes being able to pass notes to that carrier on the packaging is extremely useful. And mapping the local knowledge of every edge case carrier into your database is indeed an impossible task.

Just ask Gödel. (And then ask yourself if you're attempting to use an axiomatic system to model a universe of discourse, give or take some sort of arithmetic like set theory or relational algebra.)

长发绾君心

2020-11-30 17:17
I will summarize my thoughts from my blog post - A lesson in address storage (on archive.org).

On my current project [I work for a logistics company] we're storing international addresses. I've researched addresses all over the world while designing this portion of the database. There are a lot of different formats. In the Western world we tend to use a fairly uniform format, with a few differences, but the fields are mostly:
- Street Number - Numeric
- House or Building Name - [VarChar - in the UK some houses/buildings are identified by name, not by number]
- Street Number Suffix [VarChar, although in most cases, Char(1) would suffice]
  - A, B etc
- Street Name [VarChar]
- Street Type [VarChar or Int if you have a StreetTypes table]
  - So far, I've found 262 unique types in the English-speaking world. There are likely more, and don't forget other languages, e.g. Strasse, Rue etc.
- Street Direction [VarChar(2)]
  - N, E, S, W, NE, SE, NW, SW
- Address Type [VarChar or Int if you have an AddressTypes table]
  - PO Box
  - Apartment
  - Building
  - Floor
  - Office
  - Suite
  - etc...
- Address Type Identifier [VarChar]
  - e.g. Box Number, Apartment Number, Floor Number. Remember that apartment numbers and offices sometimes have alphanumeric info, like 1A
- Local Municipality [VarChar or Int if you have a Municipalities table]
  - For instance, if your hamlet/village appears in the address before the town.
- City/Town [VarChar or Int if you have a Cities table]
- Governing District [VarChar or Int if you have a Districts table]
  - State (U.S.)
  - Province (Canada)
  - Federal District (Mexico)
  - County (U.K.)
  - etc...
- Postal Area [VarChar]
  - Zip (U.S.)
  - Postal Code (Canada, Mexico)
  - Postcode (U.K.)
- Country [VarChar or Int if you have a Countries table]
This appears to cover most countries but the ordering of the fields may be displayed differently. You can find a list of display formats at http://www.bitboost.com/ref/international-address-formats.html#Formats
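As a rough sketch, the field list above could map onto a single table like the one below. The column names are my own guesses, SQLite's `TEXT` stands in for VarChar, and the lookup-table variants (StreetTypes, Cities and so on) are left out to keep it short. It uses Python's built-in `sqlite3` module so it can be run as-is.

```python
import sqlite3

# One column per field from the list above; all names here are assumptions.
SCHEMA = """
CREATE TABLE address (
    id                      INTEGER PRIMARY KEY,
    street_number           INTEGER,
    building_name           TEXT,
    street_number_suffix    TEXT,
    street_name             TEXT,
    street_type             TEXT,
    street_direction        TEXT,
    address_type            TEXT,
    address_type_identifier TEXT,
    local_municipality      TEXT,
    city                    TEXT,
    governing_district      TEXT,
    postal_area             TEXT,
    country                 TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Fields that do not apply to a given address are simply left NULL.
conn.execute(
    "INSERT INTO address (street_number, street_name, street_type, street_direction,"
    " city, governing_district, postal_area, country)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1600, "Pennsylvania", "Avenue", "NW", "Washington", "DC", "20500", "US"),
)

row = conn.execute("SELECT street_name, postal_area FROM address").fetchone()
print(row)
```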

For instance, in many countries, the postal code falls before the city name and the street number falls after the street name. In Canada, U.S. and the U.K. the street number precedes the street name and the postal code (or ZIP) comes after the city name.
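Since the stored fields stay the same and only the display order changes, the ordering can live in a small per-country template table. The sketch below is an assumption-laden illustration: `DISPLAY_FORMATS`, `format_address` and the field names are invented for this example, and Germany (DE) is used as one example of a country that puts the street name before the number and the postal code before the city.

```python
# Hypothetical display templates, one list of output lines per country code.
DISPLAY_FORMATS = {
    "US": ["{street_number} {street_name}", "{city}, {district} {postal_area}"],
    "DE": ["{street_name} {street_number}", "{postal_area} {city}"],
}


def format_address(fields, country):
    """Render the stored fields in the order the given country expects."""
    templates = DISPLAY_FORMATS[country]  # KeyError for countries not configured
    return [template.format(**fields) for template in templates]


german = {
    "street_number": "5",
    "street_name": "Hauptstraße",
    "city": "Berlin",
    "district": "",
    "postal_area": "10115",
}
print(format_address(german, "DE"))
```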

In answer to your question about separating the addresses by country: I wouldn't suggest it. It will just make life harder in other areas, for instance reporting. The format I've provided covers all the addresses in our logistics database, which covers the USA, Canada, Mexico and the UK, without any problems. It also covers all of our European, Chinese, Japanese and Malaysian addresses. I can't speak for other countries, but I haven't yet had to store an address from a country that these fields won't support.

I don't suggest going with the Address1, Address2, Address3 format suggested by others and seen in many databases because parsing address information out of an alphanumeric string isn't as simple as it might first seem - especially if data isn't entered correctly, due to misinformation, typo, misspelling etc. If you separate your fields you can use distance algorithms to check for likely meaning, use probability to check street name against postal code and street number or to check province and city against street name etc. Try doing any of that when you've got a string denoting your whole street address. It's not a trivial matter by any stretch of the imagination.

QA on an address database is a headache, period. The easiest way to simplify your life in this area is to make sure all the fields hold only a single piece of information that can be automatically verified as correct at entry time. Probability, distance algorithms and regular expressions can check for validity of entry and provide feedback to the user as to what their mistake was and suggest suitable corrections.
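To give a feel for the entry-time checks on a single field, here is a minimal sketch that suggests a correction for a mistyped street type. It uses the similarity ratio from Python's standard `difflib` module rather than a true edit-distance algorithm, and the short `KNOWN_STREET_TYPES` list and the function name are placeholders I made up; a real system would load the full list of street types.

```python
import difflib

# Placeholder list; a real StreetTypes table would hold hundreds of entries.
KNOWN_STREET_TYPES = ["Avenue", "Boulevard", "Court", "Drive", "Road", "Street"]


def suggest_street_type(entered, known=KNOWN_STREET_TYPES):
    """Return the entry itself if valid, else the closest known type, else None."""
    if entered in known:
        return entered
    # cutoff=0.6 is difflib's default similarity threshold.
    matches = difflib.get_close_matches(entered, known, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest_street_type("Avenu"))
```

The same idea only works because the field holds one piece of information; there is nothing comparable to match a whole free-text address line against.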

One caveat to be aware of is roads with names that are also street types - if you're covering Canada you need to be aware of "Avenue Road" in Toronto which will trip you up big time if you're using the Address1, 2, 3 format. This likely occurs in other places too, although I'm not aware of them - this single instance was enough for me to scream WTF?!
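The "Avenue Road" problem is easy to reproduce with a naive string parser. The two functions below are throwaway illustrations written for this example, not something from our system: the first takes the first word that looks like a street type, the second only looks at the last word.

```python
STREET_TYPES = {"Avenue", "Road", "Street"}


def naive_split(street):
    """Treat the first word that is a known street type as the type."""
    words = street.split()
    for i, word in enumerate(words):
        if word in STREET_TYPES:
            return " ".join(words[:i]), word
    return street, None


def last_word_split(street):
    """Only accept a street type in the final position."""
    words = street.split()
    if len(words) > 1 and words[-1] in STREET_TYPES:
        return " ".join(words[:-1]), words[-1]
    return street, None


print(naive_split("Avenue Road"))      # street name comes out empty
print(last_word_split("Avenue Road"))  # name "Avenue", type "Road"
```

The last-word rule handles this case but is still only a heuristic; it breaks again for languages that put the street type first, such as "Rue ..." in French. Storing name and type as separate fields avoids having to guess at all.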

爱一瞬间的悲伤

2020-11-30 17:18

I know this is an extremely old topic that is already answered, but I thought that I'd throw my two cents in as well. It all depends on what your project goals are and how you expect your target users to enter addresses. Ben's suggestion will allow you to parse addresses accurately, but on the other hand could make for a longer (and possibly more frustrating) user data entry process. Stephen Wrighton's suggestion is simpler, and could make it easier for users to enter addresses as a result.

I've also seen some models that simply had an "Address" column that would capture a typical street number, type, street name, unit / apartment number, etc. all in one column, while keeping City, Country, Region, etc. within other columns. This is similar to Stephen's model, except that Address1, Address2, and Address3 are all consolidated into one column.

My opinion is that the most flexible models tend to be those that are least restrictive, depending on your interpretation of flexible.


