A web application I am working on has encountered an unexpected 'bug': the database of the app has two tables (among many others) called 'States' and 'Cities'.
Introduce a surrogate key. What are you going to do when area codes change numbers or get split? Using a business key as a primary key is almost always a mistake.
Your above summary is another example of why.
I recommend adding a new primary key field to the Cities table that is simply auto-incremented. Follow the KISS principle (keep it simple).
Any other solution is cumbersome and confusing in my opinion.
Having a composite key could be problematic when you want to reference that table, since the referring table would have to have all columns the primary key has.
If that's the case, you might want a sequence-generated primary key, with idAreaCode and idStates declared NOT NULL and covered by a single UNIQUE constraint.
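A minimal sketch of that layout, using Python's built-in sqlite3 module so it runs anywhere. The column idCities, the name columns, and the sample rows are assumptions made for illustration; only Cities, States, idAreaCode and idStates come from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE States (
    idStates INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE Cities (
    idCities   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key (hypothetical name)
    idAreaCode INTEGER NOT NULL,
    idStates   INTEGER NOT NULL REFERENCES States(idStates),
    name       TEXT NOT NULL,
    UNIQUE (idAreaCode, idStates)                   -- the old business key, still enforced
);
""")

# Placeholder data: one area code shared by a city that lies in two states.
conn.executemany("INSERT INTO States (idStates, name) VALUES (?, ?)",
                 [(1, "State X"), (2, "State Y")])
insert_city = "INSERT INTO Cities (idAreaCode, idStates, name) VALUES (?, ?, ?)"
conn.execute(insert_city, (555, 1, "City A"))
conn.execute(insert_city, (555, 2, "City A"))

# The same (area code, state) pair a second time violates the UNIQUE constraint.
try:
    conn.execute(insert_city, (555, 1, "City B"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

city_ids = [row[0] for row in conn.execute("SELECT idCities FROM Cities ORDER BY idCities")]
```

Other tables can then reference Cities through the single idCities column, while the UNIQUE constraint keeps the pair from being duplicated.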
"We figured that the country code + area code combination would be unique for each city, and thus could safely be used as a primary key"
After reading this, I stopped reading any further in this topic.
How could anyone reason this way?
Area codes, by definition (the first one I found on the internet):
- "An Area code is the prefix numbers that are used to identify a geographical region based on the North American number Plan. This 3 digit number can be assigned to any number in North America, including Canada, The United States, Mexico, Latin America and the Caribbean" [1]
Setting aside that they can change and that this definition covers only North America, area codes are not 3 digits long in some other countries (3 digits is simply not enough when a country has hundreds of thousands of locations; my mother's area code has 5 digits), and they are not strictly tied to fixed geographical locations.
Some area codes belong to locations that move, such as arctic camps drifting with the ice, nomadic tribes, migrating military units, or even large ocean-going ships.
And what about several cities merging into one (or one city splitting into several)?
[1] http://www.successfuloffice.com/articles/answering-service-glossary-area-code.htm
If you add another column to the key just so that you can store a second record for a given city, then you're not properly normalizing your data. Given that you've now discovered that a city can be a member of multiple states, I would suggest removing any reference to a state from the Cities table, then adding a StateCity table that relates states to cities (creating a many-to-many relationship).
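As a rough sketch of the StateCity suggestion, again with Python's sqlite3 module: the key column names (idStates, idCities), the name columns, and the placeholder rows are assumptions, not the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE States (idStates INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Cities (idCities INTEGER PRIMARY KEY, name TEXT NOT NULL);  -- no state column

-- Junction table: one row per (state, city) membership.
CREATE TABLE StateCity (
    idStates INTEGER NOT NULL REFERENCES States(idStates),
    idCities INTEGER NOT NULL REFERENCES Cities(idCities),
    PRIMARY KEY (idStates, idCities)
);

INSERT INTO States (idStates, name) VALUES (1, 'State X'), (2, 'State Y');
INSERT INTO Cities (idCities, name) VALUES (1, 'City A');
INSERT INTO StateCity (idStates, idCities) VALUES (1, 1), (2, 1);
""")

# All states that the city with idCities = 1 belongs to.
states_of_city = [row[0] for row in conn.execute(
    """SELECT s.name
       FROM States AS s
       JOIN StateCity AS sc ON sc.idStates = s.idStates
       WHERE sc.idCities = ?
       ORDER BY s.name""",
    (1,),
)]
```

The city is stored once, and each state it belongs to is one extra row in StateCity rather than a duplicate city record.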
It sounds like you are gathering data for a telephone directory. Are you? Why are states important to you? The answer to this question will probably determine which database design will work best for you.
You may think that it's obvious what a city is. It's not. It depends on what you are going to do with the data. In the US, there is a unit called the MSA (Metropolitan Statistical Area). The Kansas City MSA spans both Kansas City, Kansas and Kansas City, Missouri. Whether the MSA unit makes sense or not depends on the intended use of the data. If you used area codes in the US to determine cities, you'd end up with a very different grouping than MSAs. Again, it depends on what you are going to do with the data.
In general, whenever hierarchical patterns of political subdivisions break down, the most general solution is to treat the relationship as many-to-many. You solve this problem the same way you solve other many-to-many problems: by creating a new table with two foreign keys. In this case the foreign keys are IdAreacode and IdStates.
Now you can have one area code in many states and one state spanning many area codes. It seems a shame to accept this extra overhead to cover just one exception. Do you know whether the exception you have uncovered is just the tip of the iceberg, and there are many such exceptions?
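One way to get a feel for how common the exception is, sketched with Python's sqlite3 module: once the pairs live in a junction table, a GROUP BY query lists every area code that appears in more than one state. The table names AreaCodes and AreaCodeStates and all sample rows are hypothetical; only the column names IdAreacode and IdStates come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE States    (IdStates   INTEGER PRIMARY KEY);
CREATE TABLE AreaCodes (IdAreacode INTEGER PRIMARY KEY);

-- Junction table with the two foreign keys.
CREATE TABLE AreaCodeStates (
    IdAreacode INTEGER NOT NULL REFERENCES AreaCodes(IdAreacode),
    IdStates   INTEGER NOT NULL REFERENCES States(IdStates),
    PRIMARY KEY (IdAreacode, IdStates)
);

-- Made-up data: area code 100 spans two states, the others do not.
INSERT INTO States (IdStates) VALUES (1), (2);
INSERT INTO AreaCodes (IdAreacode) VALUES (100), (200), (300);
INSERT INTO AreaCodeStates (IdAreacode, IdStates)
VALUES (100, 1), (100, 2), (200, 1), (300, 2);
""")

# Area codes that occur in more than one state, with the number of states.
exceptions = conn.execute(
    """SELECT IdAreacode, COUNT(*) AS state_count
       FROM AreaCodeStates
       GROUP BY IdAreacode
       HAVING COUNT(*) > 1
       ORDER BY IdAreacode"""
).fetchall()
```

If a query like this returns a single row on the real data, the exception is isolated; a long result suggests the many-to-many design is earning its overhead.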