Re-indexing large table - how screwed am I?

Question

I have a 1 TB, 600-million-row table with a misguided choice of indexed columns: specifically, a clustered index on the primary key, which is never used in a select query.

I want to remove the clustered index from the primary key and create it on a number of other columns.

Table is currently like this:

colA (PK, nvarchar(3)) [clustered index pt b]
colB (PK, bigint) [clustered index pt a]
colC (DateTime) [non-clustered index]
colD (Money) [non-clustered index]
colE (bit) [no index]
colF (bit) [no index]
colG (int) [no index]
more non-indexed columns

I would like to change it to look like this:

colA (PK, nvarchar(3)) [clustered index pt a]
colB (PK, bigint) [non-clustered index]
colC (DateTime) [non-clustered index]
colD (Money) [clustered index pt d]
colE (bit) [clustered index pt b]
colF (bit) [clustered index pt c]
colG (int) [clustered index pt e]
more non-indexed columns

Two questions: 1) How long would you guesstimate that this change will take (server spec at end of message). Unfortunately it is a live DB and I can't have downtime without some idea of how long it will be down for.

2) Is it a terrible idea to add so many columns to a clustered index? Updates are almost never performed. There are many inserts and many selects, which always use all of the proposed indexed columns as select parameters.

Server spec: 5 x 15k RPM drives in RAID 5, MS SQL Server 2005 and some bits to keep them running.

Answer 1:

For one thing, I would AVOID making the clustered index wider than it absolutely has to be. Making it into five parts seems counterproductive. Are ALL the columns in this compound clustered index stable, i.e. do they never change?

If not, I would avoid them at all costs. A clustered index should be:

unique
stable
as narrow as possible

You can change your non-clustered indices - no problem. But avoid making the clustered index messy! That'll definitely bring down your performance!
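To check how wide each index key currently is, a query along these lines could list the key columns per index. This is only a sketch against the SQL Server 2005 catalog views; dbo.BigTable is a placeholder for the real table name, which the question does not give.

```sql
-- Assumption: dbo.BigTable stands in for the real table name.
-- Lists each index on the table with its key columns, in key order,
-- plus the maximum byte length of each column.
SELECT
    i.name        AS index_name,
    i.type_desc   AS index_type,      -- CLUSTERED / NONCLUSTERED / HEAP
    ic.key_ordinal,
    c.name        AS column_name,
    c.max_length  AS max_bytes        -- e.g. 6 for nvarchar(3), 8 for bigint
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.BigTable')
  AND ic.is_included_column = 0       -- key columns only
ORDER BY i.index_id, ic.key_ordinal;
```

Summing max_bytes for the clustered index gives a rough upper bound on the key width that every nonclustered index row has to carry as its row locator.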

Check out Kimberly Tripp's excellent blog articles on indexing:

main link here
best practices for clustering index here

Marc

Answer 2:

I made the changes and it didn't take too long. Here are the times for each operation: the first time is from a backup server with a single 7200 RPM drive, and the second is from the main server with 15k RPM drives in RAID.

ALTER TABLE [Table] DROP CONSTRAINT [PK_Table]

2:39 hrs / 19 minutes

CREATE CLUSTERED INDEX [IX_Clustered] ON [Table] 
(
 [a] ASC,
 [b] ASC,
 [c] ASC,
 [d] ASC,
 [e] ASC,
 [f] ASC
)WITH (PAD_INDEX  = ON, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 90, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = OFF) ON [PRIMARY]

15:30 hrs / 2 hrs

ALTER TABLE [Table] ADD CONSTRAINT
PK_hands PRIMARY KEY NONCLUSTERED 
(
 e,
 h
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

4 hrs / 1 hr

The select query most often used now takes < 10 seconds where it often took 10 to 15 minutes before. Nice improvement! Insert times seem a bit faster too.

Answer 3:

You should have a development environment with similar specs that you can use to try this with a copy of the live database.

Answer 4:

While changing the clustered index sounds like it would certainly help here, why don't you try adding a (nonclustered) covering index first?

It shouldn't take the table down while the new index is built, and it should give you an indication of what performance improvement (if any) will result from this reorganization.
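A covering index for the query described in the question might look roughly like this. The table name, the index name and the INCLUDE list are assumptions for illustration; only the key columns come from the question. Note that a plain offline build holds a shared table lock, so reads continue but inserts wait; ONLINE = ON avoids that but requires Enterprise Edition.

```sql
-- Assumptions: dbo.BigTable and IX_BigTable_Covering are placeholder names.
-- Key columns follow the order proposed in the question; colC in the
-- INCLUDE list is a guess at what the SELECT actually returns.
CREATE NONCLUSTERED INDEX IX_BigTable_Covering
ON dbo.BigTable (colA, colE, colF, colD, colG)
INCLUDE (colC)
WITH (
    SORT_IN_TEMPDB = ON,  -- do the sort in tempdb instead of the user database
    ONLINE = ON           -- Enterprise Edition only; remove on other editions
);
```

If the common select gets fast with this index in place, that is a good hint the clustered index change would pay off too, and the index can be dropped again afterwards with DROP INDEX.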

Answer 5:

You may not need to worry about the downtime, as it may be possible to do the change live (without any downtime). This applies to SQL Server 2005 Enterprise Edition, which supports online index operations.
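As a rough sketch of what the online variant could look like (object names are placeholders, and the column list is taken from the question rather than from any real schema):

```sql
-- Assumptions: dbo.BigTable, PK_BigTable and IX_BigTable_Clustered are
-- placeholder names. ONLINE = ON requires Enterprise Edition.

-- Drop the clustered primary key while keeping the table available.
ALTER TABLE dbo.BigTable
DROP CONSTRAINT PK_BigTable
WITH (ONLINE = ON);

-- Build the new clustered index online.
CREATE CLUSTERED INDEX IX_BigTable_Clustered
ON dbo.BigTable (colA, colE, colF, colD, colG)
WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
```

One caveat worth checking first: on SQL Server 2005 a clustered index cannot be built online if the table contains large object columns (text, ntext, image, varchar(max) and similar). Online builds also need extra disk space and usually run longer than offline ones.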

Answer 6:

One thing you could do, if you have the disk space, is create a second table with the correct clustered index and copy the rows over to the new table over several days via an incremental process. Once all the rows are there, execute sp_rename on both tables (this would require just a few minutes of downtime). If your apps were referencing a view instead of the physical table, you could do this with zero downtime to your apps. I hope this helps.

[Edit] You'll also have to deal with updates to the rows: you need a timestamp or last-updated field on the source table so that you can sync the updates once all the rows are copied over.
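The copy-and-rename approach could be sketched like this. All object names and the batch size are assumptions; the loop walks colB ranges because colB leads the existing clustered index, and it assumes colB values are reasonably dense (a very sparse key would produce many empty batches).

```sql
-- Assumptions: dbo.BigTable is the existing table, dbo.BigTable_new already
-- exists with the same columns and the new clustered index. Names and batch
-- size are placeholders.
DECLARE @batch bigint, @lo bigint, @hi bigint;
SET @batch = 1000000;

SELECT @lo = MIN(colB), @hi = MAX(colB) FROM dbo.BigTable;

-- Copy one colB range per iteration so each transaction stays small.
WHILE @lo <= @hi
BEGIN
    INSERT INTO dbo.BigTable_new (colA, colB, colC, colD, colE, colF, colG)
    SELECT colA, colB, colC, colD, colE, colF, colG
    FROM dbo.BigTable
    WHERE colB >= @lo AND colB < @lo + @batch;

    SET @lo = @lo + @batch;
END;

-- Swap the tables during the short maintenance window.
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.BigTable', 'BigTable_old';
EXEC sp_rename 'dbo.BigTable_new', 'BigTable';
COMMIT TRANSACTION;
```

The real column list is longer than the seven columns named in the question, so the INSERT would need all of them. Rows inserted or changed after the loop read MAX(colB) still need a final catch-up pass before the rename.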

Answer 7:

1) How long would you guesstimate that this change will take (server spec at end of message). Unfortunately it is a live DB and I can't have downtime without some idea of how long it will be down for.

It really, really depends on the data. The table parameters alone don't provide enough information. It could be anything from a few minutes (unlikely) to a few days (unlikely), with the likeliest time being somewhere in between.

As for the second question: no, that should not pose any problems. Performance should only improve if you are making few updates. When those updates occur, it'll take a while to fix the index, though, and performance will suffer during that time, which will vary depending on the data.

-Adam

Answer 8:

I agree with Brian: you should have a test database with the same amount of data and run the index change there. But I presume that you are making this change because you think it will speed up the queries. You should run benchmark tests (before and after the index change) and ensure that your optimization doesn't become a pessimization.
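For a simple before-and-after comparison, the built-in statistics output may be enough. The query below is made up for illustration (the table name, selected columns and filter values are all assumptions); substitute the real select.

```sql
-- Assumption: the SELECT is a stand-in for the real, most frequently used query.
SET STATISTICS IO ON;    -- report logical/physical reads per table
SET STATISTICS TIME ON;  -- report parse/compile and execution times

SELECT colB, colC
FROM dbo.BigTable
WHERE colA = N'abc'
  AND colE = 1
  AND colF = 0
  AND colD = 10.00
  AND colG = 5;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Run it on the test copy before and after the index change and compare the logical reads and elapsed time shown in the Messages output. Logical reads are usually the steadier number, since elapsed time depends on what happens to be cached.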

Source: https://stackoverflow.com/questions/690458/re-indexing-large-table-how-screwed-am-i

Tags

sql

sql-server

sql-server-2005

database-design

indexed