A web application I am working on has encountered an unexpected 'bug': the database of the app has two tables (among many others) called 'States' and 'Cities'.
I think it is best to add another table, countries. Your problem is an example of why database normalization is important: you can't just mix and match different keys in one column.
So I suggest you create these tables:
+------------+--------------+
| country_id | country_name |
+------------+--------------+

+------------+----------+------------+
| country_id | state_id | state_name |
+------------+----------+------------+

+------------+----------+---------+-----------+
| country_id | state_id | city_id | city_name |
+------------+----------+---------+-----------+

+------------+----------+---------+---------+----------+
| country_id | state_id | city_id | data_id | your_CSV |
+------------+----------+---------+---------+----------+
The bold fields are primary keys. Use a standard country_id, such as 1 for the US, 91 for India, and so on. city_id values should likewise use a standard id.
You can then find what belongs to what quickly and with minimal overhead. All data can be entered directly into the data table, which serves as a single entry point and stores everything in one spot. I'm not sure about MySQL, but if your database supports partitioning, you can partition the data table by country_id, or by country_id + state_id, across a couple of server arrays, which should also speed up database performance considerably. The first, second, and third tables won't add much server load at all; they only serve as reference. You will mainly be working with the fourth table, data. You can add as much data as you wish without ever creating a duplicate again.
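The four tables above could be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than MySQL, it assumes each table's primary key is made up of all of its id columns (the bold markers did not survive in the tables above), and the ids and names inserted are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Assumption: every id column of a table is part of its compound primary key.
conn.executescript("""
CREATE TABLE countries (
    country_id   INTEGER PRIMARY KEY,
    country_name TEXT NOT NULL
);
CREATE TABLE states (
    country_id INTEGER NOT NULL REFERENCES countries (country_id),
    state_id   INTEGER NOT NULL,
    state_name TEXT NOT NULL,
    PRIMARY KEY (country_id, state_id)
);
CREATE TABLE cities (
    country_id INTEGER NOT NULL,
    state_id   INTEGER NOT NULL,
    city_id    INTEGER NOT NULL,
    city_name  TEXT NOT NULL,
    PRIMARY KEY (country_id, state_id, city_id),
    FOREIGN KEY (country_id, state_id)
        REFERENCES states (country_id, state_id)
);
CREATE TABLE data (
    country_id INTEGER NOT NULL,
    state_id   INTEGER NOT NULL,
    city_id    INTEGER NOT NULL,
    data_id    INTEGER NOT NULL,
    your_CSV   TEXT,
    PRIMARY KEY (country_id, state_id, city_id, data_id),
    FOREIGN KEY (country_id, state_id, city_id)
        REFERENCES cities (country_id, state_id, city_id)
);
""")

# Hypothetical sample rows: the same city_id appears under two states.
conn.execute("INSERT INTO countries VALUES (1, 'US')")
conn.execute("INSERT INTO states VALUES (1, 10, 'State A')")
conn.execute("INSERT INTO states VALUES (1, 20, 'State B')")
conn.execute("INSERT INTO cities VALUES (1, 10, 500, 'Springfield')")
conn.execute("INSERT INTO cities VALUES (1, 20, 500, 'Springfield')")
conn.execute("INSERT INTO data VALUES (1, 10, 500, 1, 'a,b,c')")

# Re-inserting the same (country, state, city) combination violates the key.
try:
    conn.execute("INSERT INTO cities VALUES (1, 10, 500, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Resolve a data row back to its city, state, and country names.
rows = conn.execute("""
    SELECT co.country_name, s.state_name, ci.city_name, d.your_CSV
    FROM data AS d
    JOIN cities AS ci
      ON ci.country_id = d.country_id
     AND ci.state_id = d.state_id
     AND ci.city_id = d.city_id
    JOIN states AS s
      ON s.country_id = d.country_id AND s.state_id = d.state_id
    JOIN countries AS co
      ON co.country_id = d.country_id
""").fetchall()
```

Because state_id is part of the cities key, the same city_id can exist under two states without a clash, while an exact repeat of the full key is refused.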
If you only have one data record per city, you can omit the data table and move CSV_data into the cities table, like this:
+------------+----------+---------+-----------+----------+
| country_id | state_id | city_id | city_name | CSV_data |
+------------+----------+---------+-----------+----------+
The database is not Normalised. It may be partly Normalised. You will find many more bugs and limitations in extensibility, as a result.
A hierarchy of Country, then State, then City is fine. You do not need an additional many-to-many table, as some suggest. The city in question (like many in America) lies in three States.
By placing CountryCode and AreaCode, concatenated, in a single column, you have broken basic database rules, not to mention added code on every access. Additionally, CountryCode is not Normalised.
The problem is that CountryCode+AreaCode is a poor choice of key for a City. In real terms it has very little to do with a city; it applies to huge swaths of land. If the meaning of City were changed to town (as in, your company starts collecting data for large towns), the db would break completely.
Magician has the only answer that is close to being correct and that would save you from your current limitations, which are due to lack of Normalisation. It is not accurate to say that Magician's answer is Normalised; rather, it is a correct choice of Identifiers, which form a hierarchy in this case. But I would remove the "id" columns, because they are unnecessary: 100% redundant columns, 100% redundant indices. The char() columns are fine as they are, and fine for the PK (compound keys). Remember that you need an Index on the char() column anyway, to ensure it is unique.
When you get to the lower end of the hierarchy (City), the compound PK has become onerous (3 x CHAR(20)), and I wouldn't want to carry it into the Data table (especially if there are daily CSV imports and many readings or rows per city). Therefore, for City only, I would add a surrogate key as the PK.
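A minimal sketch of that layout, assuming SQLite through Python's sqlite3 module rather than the poster's actual database. The table and column names (Country, State, City, Data, CityId, ReadingNo, CsvData) are hypothetical, chosen only for illustration, and the city used is Texarkana, which sits on the Texas and Arkansas border.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Hypothetical names throughout. Country and State keep natural char keys;
# only City gets a surrogate key, so Data carries one integer instead of
# three CHAR(20) columns. (SQLite does not enforce the CHAR length.)
conn.executescript("""
CREATE TABLE Country (
    CountryName CHAR(20) NOT NULL PRIMARY KEY
);
CREATE TABLE State (
    CountryName CHAR(20) NOT NULL REFERENCES Country (CountryName),
    StateName   CHAR(20) NOT NULL,
    PRIMARY KEY (CountryName, StateName)
);
CREATE TABLE City (
    CityId      INTEGER PRIMARY KEY,            -- surrogate key, City only
    CountryName CHAR(20) NOT NULL,
    StateName   CHAR(20) NOT NULL,
    CityName    CHAR(20) NOT NULL,
    UNIQUE (CountryName, StateName, CityName),  -- natural key still enforced
    FOREIGN KEY (CountryName, StateName)
        REFERENCES State (CountryName, StateName)
);
CREATE TABLE Data (
    CityId    INTEGER NOT NULL REFERENCES City (CityId),
    ReadingNo INTEGER NOT NULL,
    CsvData   TEXT,
    PRIMARY KEY (CityId, ReadingNo)
);
""")

conn.execute("INSERT INTO Country VALUES ('USA')")
conn.execute("INSERT INTO State VALUES ('USA', 'Texas')")
conn.execute("INSERT INTO State VALUES ('USA', 'Arkansas')")

insert_city = "INSERT INTO City (CountryName, StateName, CityName) VALUES (?, ?, ?)"
# The same city name is stored once per State, each row with its own CityId.
tx_id = conn.execute(insert_city, ("USA", "Texas", "Texarkana")).lastrowid
ar_id = conn.execute(insert_city, ("USA", "Arkansas", "Texarkana")).lastrowid

# Data rows reference the narrow surrogate key only.
conn.execute("INSERT INTO Data VALUES (?, 1, 'reading-1')", (tx_id,))
conn.execute("INSERT INTO Data VALUES (?, 2, 'reading-2')", (tx_id,))
conn.execute("INSERT INTO Data VALUES (?, 1, 'reading-1')", (ar_id,))

# The UNIQUE constraint still blocks a repeated natural key.
try:
    conn.execute(insert_city, ("USA", "Texas", "Texarkana"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

texas_readings = conn.execute(
    "SELECT COUNT(*) FROM Data WHERE CityId = ?", (tx_id,)
).fetchone()[0]
```

The UNIQUE constraint on City is what keeps the surrogate key from weakening the model: the natural identifier is still enforced, and the integer only exists to keep the Data table narrow.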
But for the posted DDL, even as it is, without Normalising the db and using Relational Identifiers, yes, the PK of City is incorrect. It should be (idStates, idAreaCode), not the other way around. That will fix your problem.
Very bad naming by the way.