I often see some database design like this:
Case 1:
UserTable
--id[auto increase]
--UserName
--Password
--Em
I can think of several reasons in your example for using a surrogate primary key (Id) rather than the username.
For your second question, it would be better to use the user id than the username in UserTableRole. Whether it is also better to add a surrogate key to this many-to-many table is a matter of opinion. I dislike using surrogate id keys for many-to-many tables and usually just make a compound primary key of the two foreign key ids. The only time I would consider a surrogate key there is if I needed to use it as a foreign key in yet another table.
One reason I can think of for not using something like UserName as the primary key is that it could be subject to change. Using anything that is exposed to the outside world as a primary key runs the risk of that value being changed, and it's best to have a stable primary key.
What if the user changes their email or username; do you really want to change your keys in all your relationships? IMO, it's best to have a stable key that never sees the outside world and that nobody outside the database knows anything about, so that it can remain stable regardless of what other changes occur in your database.
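To make the "stable key" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical schema: UserTable with a surrogate `id`, and UserTableRole referencing it through hypothetical `UserId`/`RoleId` columns. Renaming the user touches exactly one row, and the rows that reference the user are left alone.

```python
import sqlite3

# Hypothetical schema: surrogate id on UserTable, referenced by UserTableRole.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE UserTable (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    UserName TEXT NOT NULL UNIQUE
);
CREATE TABLE UserTableRole (
    UserId INTEGER NOT NULL REFERENCES UserTable(id),
    RoleId INTEGER NOT NULL,
    PRIMARY KEY (UserId, RoleId)
);
""")

alice_id = conn.execute("INSERT INTO UserTable (UserName) VALUES ('alice')").lastrowid
conn.execute("INSERT INTO UserTableRole (UserId, RoleId) VALUES (?, 1)", (alice_id,))
conn.execute("INSERT INTO UserTableRole (UserId, RoleId) VALUES (?, 2)", (alice_id,))

# The username changes in one place; UserTableRole still points at the same id.
rows_changed = conn.execute(
    "UPDATE UserTable SET UserName = 'alice_smith' WHERE id = ?", (alice_id,)
).rowcount
role_rows = conn.execute(
    "SELECT UserId, RoleId FROM UserTableRole ORDER BY RoleId"
).fetchall()
```

Had UserName itself been the key, every row in UserTableRole would have had to change too.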
Your question is essentially about the advantages and disadvantages of using natural vs. surrogate keys.
Flexibility is the primary concern: with a surrogate key, users can change their username much more easily. It is also possible that in the future you will need to allow duplicate usernames, e.g. after a merger.
Speed is another concern: on a frequently accessed table like the user table, it's generally faster to join on integers than on strings.
Another is table size: when a key is used as a foreign key, every referencing row has to store the whole key value. Surrogates are very compact and typically much smaller than natural keys.
Most ORMs also require a surrogate key, because it provides consistency between tables.
Also, on many systems it is not necessarily safe to assume that email is unique.
I agree, though, that in a relationship table like UserRole it's generally best to use a composite primary key made of the foreign keys.
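A composite primary key on the relationship table already does the job a surrogate would not: it rejects a second copy of the same user/role pair. A minimal sketch with sqlite3, assuming hypothetical `UserId`/`RoleId` columns (foreign key clauses to the user and role tables are omitted to keep it short):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE UserRole (
    UserId INTEGER NOT NULL,
    RoleId INTEGER NOT NULL,
    PRIMARY KEY (UserId, RoleId)   -- composite key built from the two FK columns
)
""")
conn.execute("INSERT INTO UserRole VALUES (1, 10)")
conn.execute("INSERT INTO UserRole VALUES (1, 20)")  # same user, different role: allowed

duplicate_rejected = False
try:
    conn.execute("INSERT INTO UserRole VALUES (1, 10)")  # same pair again
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the composite PK refuses the duplicate pair

row_count = conn.execute("SELECT COUNT(*) FROM UserRole").fetchone()[0]
```

With an extra surrogate id as the PK instead, you would still need a separate UNIQUE (UserId, RoleId) constraint to get the same guarantee.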
In Case 1: Why not use the UserName field as the primary key (PK)? Why use another field like id [which is auto-incremented] as the PK?
UserTable.UserName has intrinsic meaning in this data model and is called a "natural key". UserTable.id, on the other hand, is a "surrogate key".
If there is a natural key in your model, you cannot eliminate it with the surrogate key; you can only supplement it. So the question is: do you use just the natural key, or both the natural and the surrogate key? Both strategies are valid and have their pros and cons.
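Keeping the natural key alongside a surrogate usually means keeping a UNIQUE constraint on it; the surrogate PK alone does not stop two users from getting the same UserName. A minimal sketch with sqlite3 (the helper function is hypothetical, just to compare the two variants):

```python
import sqlite3

def count_after_duplicate_insert(unique: bool) -> int:
    """Create UserTable with a surrogate PK, optionally UNIQUE on UserName,
    try to insert 'alice' twice, and return how many rows ended up stored."""
    conn = sqlite3.connect(":memory:")
    constraint = "UNIQUE" if unique else ""
    conn.execute(
        f"CREATE TABLE UserTable (id INTEGER PRIMARY KEY, UserName TEXT NOT NULL {constraint})"
    )
    for name in ("alice", "alice"):
        try:
            conn.execute("INSERT INTO UserTable (UserName) VALUES (?)", (name,))
        except sqlite3.IntegrityError:
            pass  # rejected by the UNIQUE constraint on the natural key
    return conn.execute("SELECT COUNT(*) FROM UserTable").fetchone()[0]

with_natural_key = count_after_duplicate_insert(unique=True)
surrogate_only = count_after_duplicate_insert(unique=False)
```

Without the UNIQUE constraint both rows are accepted, each with its own id, which is exactly the "you can't eliminate the natural key" point.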
Typical reasons for a surrogate key:
On the other hand:
In the case of just UserName and Email, why not use Email as the PK?
The designer probably wanted to avoid the ON UPDATE CASCADE that would be necessary if a user changed their e-mail.
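Here is what that cascade looks like in practice, as a minimal sqlite3 sketch with a hypothetical schema where Email is the PK and UserTableRole references it. With ON UPDATE CASCADE the e-mail change is propagated into the child rows; without it, the same UPDATE fails because it would orphan them.

```python
import sqlite3

def update_email(cascade: bool) -> list:
    """Change a user's e-mail when Email is the PK; return the child table's Email values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
    action = "ON UPDATE CASCADE" if cascade else ""
    conn.execute("CREATE TABLE UserTable (Email TEXT PRIMARY KEY, UserName TEXT NOT NULL)")
    conn.execute(f"""
        CREATE TABLE UserTableRole (
            Email  TEXT NOT NULL REFERENCES UserTable(Email) {action},
            RoleId INTEGER NOT NULL,
            PRIMARY KEY (Email, RoleId)
        )""")
    conn.execute("INSERT INTO UserTable VALUES ('alice@old.example', 'alice')")
    conn.execute("INSERT INTO UserTableRole VALUES ('alice@old.example', 1)")
    # Raises sqlite3.IntegrityError when cascade is False (child row would be orphaned).
    conn.execute(
        "UPDATE UserTable SET Email = 'alice@new.example' WHERE Email = 'alice@old.example'"
    )
    return [row[0] for row in conn.execute("SELECT Email FROM UserTableRole")]
```

Every referencing table has to declare (and pay for) that cascade; with a surrogate id, the e-mail change stays in UserTable.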
In Case 2: In the UserRoleTable, why not use both UserName and RoleID as the PK?
If there cannot be multiple connections for the same user/role pair, you have to have a key on that in any case.
Unless there are child tables with FKs referencing UserTableRole or an unfriendly ORM is used, there is no reason for an additional surrogate PK.
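When such child tables do exist, they have to carry the whole composite key, which is where a surrogate on the junction table can start to pay off. A minimal sqlite3 sketch; `RoleGrantAudit` and its columns are hypothetical, used only to show a composite foreign key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE UserTableRole (
    UserId INTEGER NOT NULL,
    RoleId INTEGER NOT NULL,
    PRIMARY KEY (UserId, RoleId)
);
-- Hypothetical child table: it must repeat both columns to reference the composite PK.
CREATE TABLE RoleGrantAudit (
    AuditId   INTEGER PRIMARY KEY,
    UserId    INTEGER NOT NULL,
    RoleId    INTEGER NOT NULL,
    GrantedBy TEXT NOT NULL,
    FOREIGN KEY (UserId, RoleId) REFERENCES UserTableRole (UserId, RoleId)
);
""")
conn.execute("INSERT INTO UserTableRole VALUES (1, 10)")
conn.execute("INSERT INTO RoleGrantAudit (UserId, RoleId, GrantedBy) VALUES (1, 10, 'admin')")

orphan_rejected = False
try:
    # (1, 99) does not exist in UserTableRole, so the composite FK rejects it.
    conn.execute("INSERT INTO RoleGrantAudit (UserId, RoleId, GrantedBy) VALUES (1, 99, 'admin')")
except sqlite3.IntegrityError:
    orphan_rejected = True
```

With one or two such children this is perfectly workable; the cost grows with the number of referencing tables and the width of the key.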
1 And if clustering is used, the secondary index on the natural key may be extra "fat" (since it contains a copy of the clustering key, which is typically the PK) and may require a double lookup when querying (since rows in a clustered table don't have stable physical locations and must be located through the clustering key, barring some DBMS-specific optimizations such as Oracle's "rowid guesses").
2 E.g. you wouldn't be able to find the UserName just by reading the junction table; you'd have to JOIN it with UserTable.
3 Surrogates are typically ordered in a way that is not meaningful to client applications. An auto-increment surrogate key's order depends on the order of INSERTs, and queries are not typically done on a "range of users by their order of insertion". Some surrogates, such as GUIDs, may be more or less randomly ordered.
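Footnote 2's point can be seen in a small sqlite3 sketch, assuming a hypothetical schema in which UserTableRole stores only the surrogate `UserId`: reading the junction table alone yields ids, and getting names needs a JOIN back to UserTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTable (id INTEGER PRIMARY KEY, UserName TEXT NOT NULL UNIQUE);
CREATE TABLE UserTableRole (
    UserId INTEGER NOT NULL REFERENCES UserTable(id),
    RoleId INTEGER NOT NULL,
    PRIMARY KEY (UserId, RoleId)
);
INSERT INTO UserTable (id, UserName) VALUES (1, 'alice'), (2, 'bob');
INSERT INTO UserTableRole VALUES (1, 10), (2, 10), (2, 20);
""")

# The junction table on its own only knows surrogate ids ...
ids_only = conn.execute(
    "SELECT UserId FROM UserTableRole WHERE RoleId = 10 ORDER BY UserId"
).fetchall()

# ... so listing the users in role 10 by name requires a JOIN with UserTable.
names = conn.execute("""
    SELECT u.UserName
    FROM UserTableRole AS r
    JOIN UserTable AS u ON u.id = r.UserId
    WHERE r.RoleId = 10
    ORDER BY u.UserName
""").fetchall()
```

Had the junction table been keyed on UserName, the first query would already have returned the names.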