Our data resides in a SQL Server 2008 database, and there will be a lot of queries and joins between tables. We have this argument inside the team: some are arguing use of integ
I personally use INT IDENTITY for most of my primary and clustering keys.
You need to keep two things apart. The primary key is a logical construct: it uniquely identifies your rows, and it has to be unique, stable, and NOT NULL. A GUID works well for a primary key, too, since it is unique for all practical purposes. A GUID as your primary key is a good choice if you use SQL Server replication, since in that case you need a uniquely identifying GUID column anyway.
The clustering key in SQL Server is a physical construct that is used for the physical ordering of the data, and it is a lot more difficult to get right. According to Kimberly Tripp, the Queen of Indexing on SQL Server, a good clustering key should be unique, stable, as narrow as possible, and ideally ever-increasing (all of which an INT IDENTITY is).
See her articles on indexing, and also see Jimmy Nilsson's The Cost of GUIDs as Primary Key.
A GUID is a horribly bad choice for a clustering key, since it's wide and totally random, and thus leads to bad index fragmentation and poor performance. The clustering key column(s) are also stored in each and every entry of each and every non-clustered (additional) index, so you really want to keep the key small: a GUID is 16 bytes versus 4 bytes for an INT, and with several non-clustered indices and several million rows, this makes a HUGE difference.
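To put a rough number on that difference, here is a back-of-the-envelope sketch in T-SQL. The row count and index count are made-up values for illustration, and the estimate only counts the key bytes at the leaf level, ignoring row overhead, upper index levels, and page fill.

```sql
-- Storage size of each candidate key type, in bytes.
SELECT DATALENGTH(NEWID())           AS guid_bytes,    -- 16
       DATALENGTH(CAST(1 AS INT))    AS int_bytes,     -- 4
       DATALENGTH(CAST(1 AS BIGINT)) AS bigint_bytes;  -- 8

-- Hypothetical workload: 10 million rows, 5 non-clustered indexes.
DECLARE @rows BIGINT = 10000000;
DECLARE @nonclustered_indexes INT = 5;

-- The clustering key is stored once in the clustered index and once in
-- every non-clustered index entry, so the extra cost of GUID over INT is
-- (16 - 4) bytes * rows * (1 + number of non-clustered indexes).
SELECT (16 - 4) * @rows * (1 + @nonclustered_indexes) / 1048576.0
       AS extra_megabytes;  -- about 687 MB for these assumed numbers
```

The absolute figure matters less than the fact that it grows with both the row count and the number of non-clustered indexes.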
In SQL Server, your primary key is by default your clustering key - but it doesn't have to be. You can easily use a GUID as your NON-Clustered primary key, and an INT IDENTITY as your clustering key - it just takes a bit of awareness.
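A minimal sketch of that split, assuming a hypothetical dbo.Customer table (the table, column, constraint, and index names are all made up for illustration):

```sql
-- Hypothetical table: all names are illustrative only.
CREATE TABLE dbo.Customer
(
    CustomerId   INT IDENTITY(1,1) NOT NULL,  -- narrow, ever-increasing
    CustomerGuid UNIQUEIDENTIFIER  NOT NULL
        CONSTRAINT DF_Customer_CustomerGuid DEFAULT NEWID(),
    Name         NVARCHAR(100)     NOT NULL,

    -- Logical key: the GUID, enforced by a NON-clustered unique index.
    CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (CustomerGuid)
);

-- Physical ordering: cluster on the INT IDENTITY column instead.
CREATE UNIQUE CLUSTERED INDEX CIX_Customer_CustomerId
    ON dbo.Customer (CustomerId);
```

The NONCLUSTERED keyword is what overrides the default. Without it, SQL Server would make the primary key the clustered index, provided the table has no clustered index yet.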
If a table can grow to millions of records, I think a GUID is not a good idea as a primary key.
The major advantage of using GUIDs is that they are unique across all space and time.
The main disadvantage to using GUIDs as key values is that they are BIG. At 16 bytes a pop, they are one of the largest datatypes in SQL Server. Indexes built on GUIDs are going to be larger and slower than indexes built on IDENTITY columns, which are usually ints (4 bytes).
So they are a good solution for the cases where you need to merge data from several sources.
Source: http://www.sqlteam.com/article/uniqueidentifier-vs-identity
A 128-bit GUID (uniqueidentifier) key is of course 4x larger than a 32-bit int key. However, there are a few key advantages:
- [...] SELECT from the primary key based on a date/time range if you want with a few fancy CAST() calls.
- [...] SELECT scope_identity() to get the primary key after an insert.
- [...] bigint (64 bits) instead of int. Once you do that, uniqueidentifier is only twice as big as a bigint.
In the end, squeezing out some small performance advantage by using integers may not be worth losing the advantages of a GUID. Test it empirically and decide for yourself.
Personally, I still use both, depending on the situation, but the deciding factor has never really come down to performance in my case.
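The scope_identity() point in the list above is partly cut off. Reading it as "a GUID key does not have to be fetched back after the insert", the difference can be sketched like this. Both tables are hypothetical and exist only for illustration.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.OrderInt
(
    OrderId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Note    NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.OrderGuid
(
    OrderId UNIQUEIDENTIFIER NOT NULL PRIMARY KEY NONCLUSTERED,
    Note    NVARCHAR(50) NOT NULL
);

-- IDENTITY: the server assigns the key, so it has to be read back.
INSERT INTO dbo.OrderInt (Note) VALUES (N'first order');
SELECT SCOPE_IDENTITY() AS NewOrderId;

-- GUID: the key is generated before the insert, so it is already known.
DECLARE @id UNIQUEIDENTIFIER = NEWID();
INSERT INTO dbo.OrderGuid (OrderId, Note) VALUES (@id, N'first order');
SELECT @id AS NewOrderId;
```

The same idea applies when the GUID is generated in application code rather than with NEWID(): child rows can be prepared with the parent key before anything is sent to the database.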
The big problem with GUIDs as primary keys is that they cause massive table fragmentation, which can be a big performance issue (the larger the table, the larger the issue). Even as a key for a nonclustered index, they will cause index fragmentation.
You can partly mitigate the problem by setting an appropriate fill factor -- but it will still be an issue.
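As a rough sketch of that mitigation, assuming a table dbo.Customer with a GUID-keyed index named PK_Customer already exists (both names are hypothetical, and 80 is only an example value to tune by testing):

```sql
-- Hypothetical index and table names. Leave about 20% free space in each
-- leaf page so that randomly placed GUID inserts cause fewer page splits.
ALTER INDEX PK_Customer ON dbo.Customer
    REBUILD WITH (FILLFACTOR = 80);

-- Check how fragmented the indexes on the table are.
SELECT i.name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Customer'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;
```

A lower fill factor trades disk space and read efficiency for fewer page splits, and the free space is used up as rows arrive, so the index still needs periodic rebuilds.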
The size difference doesn't bother me that much, except on tables with otherwise narrow rows where table scans are also required. In those cases, being able to fit more rows per DB page is a performance advantage.
There can be good reasons to use GUIDs, but there is also a cost. I generally prefer INT IDENTITY for primary keys, but I don't avoid GUIDs when they are a better solution.