Why does INDEX creation statement have UNIQUE argument?
As I understand, the non-clustered index contains a bookmark, a pointer to a row, which should be unique to dist
While a non-unique index is sufficient to distinguish between rows (as you said), the UNIQUE
index serves as a constraint: it will prevent duplicates from being entered into the database - where "duplicates" are rows containing the same data in the indexed columns.
Example:
Firstname | Lastname | Login
================================
Joe | Smith | joes
Joe | Taylor | joet
Susan | Smith | susans
Let's assume that login names are by default generated from first name + first letter of last name.
What happens when we try to add Joe Sciavillo to the database? Normally, the system would happily generate loginname joes
and insert (Joe,Sciavillo,joes)
. Now we'd have two users with the same username - probably a Bad Thing.
Now let's say we have a UNIQUE
index on Login
column - the database will check that no other row with the same data already exists, before it allows inserting the new row. In other words, the attempt to insert another joes
will be rejected, because that data wouldn't be unique in that row any more.
Of course, you could have unique indexes on multiple columns, in which case the combination of data would have to be unique (e.g. a unique index on Firstname,Lastname
will happily accept a row with (Joe,Badzhanov)
, as the combination is not in the table yet, but would reject a second row with (Joe,Smith)
)
I think the underlying question is: what is the difference between unique and non-unique indexes?
The answer is that entries in unique indexes can each only point to a single row, while entries in non-unique indexes can point to many rows.
For example, consider an order item table:
ORDER_NO INTEGER
LINE_NO INTEGER
PRODUCT_NO INTEGER
QUANTITY DECIMAL
- with a unique index on ORDER_NO and LINE_NO, and a non-unique index on PRODUCT_NO.
For a single combination of ORDER_NO and LINE_NO there can only be one entry in the table, while for a single value of PRODUCT_NO there can be many entries in the table (because there will be many entries for that value in the index).
The UNIQUE clause specifies that the values in the column(s) must be unique across the table, essentially adding a unique constraint. A clustered index on a table specifies that the ordering of the rows in the table will be the same as the index. A non-clustered index does not change the physical ordering, which is why it is OK to have multiple non-clustered but only one clustered index. You can have unique or non-unique clustered and non-clustered indexes on a table.
The UNIQUE index clause is really just a quirk of syntax in SQL Server and some other DBMSs. In Standard SQL, uniqueness constraints are implemented through the use of the PRIMARY KEY and UNIQUE CONSTRAINT syntax, not through indexes (there are no indexes in standard SQL).
The mechanism SQL Server uses internally to implement uniqueness constraints is called a unique index. A unique index gets created automatically for you whenever you create a PRIMARY KEY or UNIQUE constraint. For reasons best known to the SQL Server development team they decided to expose the UNIQUE keyword as part of the CREATE INDEX syntax, even though the constraint syntax does the same job.
In the interests of clarity and standards support I would recommend you avoid creating UNIQUE indexes explicitly wherever possible. Use the PRIMARY KEY or UNQIUE constraint syntax instead.