I have a single large table which I would like to optimize. I'm using MS-SQL 2005 server. I'll try to describe how it is used and if anyone has any suggestions I would appreciate it.
Help me understand more about the table. If your PK is (k1, k2), you shouldn't have to select by any other column to get a completely unique record.
Do you mean to say that k1 through k7 is the PK? If so, declare it as such and it will become the clustered index. Query performance should improve dramatically.
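If those columns really do uniquely identify a row, a sketch of the declaration might look like this (the table name is a placeholder; adjust to your schema):

```sql
-- Hypothetical table name; assumes k1 through k7 together uniquely identify a row.
alter table MyBigTable
add constraint PK_MyBigTable
primary key clustered (k1, k2, k3, k4, k5, k6, k7);
```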
The order by is adding a lot of overhead. Consider finding a better option that can return a smaller set of data. Knowing why you need around a million records returned might help me provide a better solution.
Edit: I get the sense that I'm not alone in my suspicion that the best place to start optimizing is your physical table design. Do you have any control over this? Not knowing what each column stores, I can't offer very specific ideas, but a very general approach follows: put k1, k3, k4, k5 and k6 (k2 appears to be directly related to the values in your table) in their own table with a single unique int as the PK, then create a FK relationship from the main table back to it. The PK on the main table would then include this new field, k2 and k7. The query optimizer will then perform a rather inexpensive lookup in the new table, return a single record, and then perform an index seek into your main table by PK only.
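A rough sketch of that design, with hypothetical names and int types standing in for whatever your columns actually are (it also assumes the main table has already been given a lookupId column):

```sql
-- Lookup table holding each distinct (k1, k3, k4, k5, k6) combination exactly once.
create table KeyLookup (
    lookupId int identity(1,1) not null primary key,
    k1 int not null, k3 int not null, k4 int not null,
    k5 int not null, k6 int not null,
    constraint UQ_KeyLookup unique (k1, k3, k4, k5, k6)
);

-- Main table references the lookup table via the new surrogate key.
alter table MainTable
add constraint FK_MainTable_KeyLookup
foreign key (lookupId) references KeyLookup (lookupId);
```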
Add a single index with columns k1-k6 in it; that should be the best.
Also, if you can, run sp_updatestats before each query.
Show the query plan output - any tuning adventure that doesn't start there is a misadventure.
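On SQL Server 2005 one easy way to capture the plan in text form is STATISTICS PROFILE (the query below is just a placeholder; substitute your real one):

```sql
set statistics profile on;

-- Your actual query goes here; literal values are illustrative only.
select d1, k2, k7
from MyBigTable
where k1 = 1 and k3 = 2 and k4 = 30 and k5 = 0 and k6 = 1
order by k7;

set statistics profile off;
```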
That sounds like good fun.
A few questions:
A few facts seem important to me:
A few remarks come to mind:
It looks like you are not using your clustered index to its full potential, and have a LOT of duplicated data.
Your clustered index seems to be constructed something like:
create clustered index IX_Clustered on Table(k1 ASC, k2 ASC)
However, your other k* columns represent only 40,000 possible permutations.
10 (k1) * 10 (k3) * 100 (k4) * 2 (k5) * 2 (k6) = 40,000
You should pull the unique combinations of these 5 keys out into a separate table and give each combination a unique int as its primary key ("newPK").
Something like this (column types shown as int; adjust to match your main table):
create table SurrogateKey (
    newPK int identity(1,1) not null,
    k1 int, k3 int, k4 int, k5 int, k6 int,
    constraint PK_SurrogateKey primary key clustered (newPK),
    constraint UQ_SurrogateKey unique (k1, k3, k4, k5, k6)
)
This table will only have 40,000 rows and be very fast to lookup the primary key, newPK. Then, you can lookup a single integer in your large table.
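Populating it from the existing table is a one-off operation; assuming the identity column above, something like:

```sql
-- Insert each distinct key combination once; newPK values are generated automatically.
-- "BigTable" is a hypothetical name for your existing large table.
insert into SurrogateKey (k1, k3, k4, k5, k6)
select distinct k1, k3, k4, k5, k6
from BigTable;
```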
Your existing table should be altered to replace k1, k3, k4, k5 and k6 with the single newPK column, keeping k2, k7 and your data columns.
Given the above, you can change your clustered index to:
create clustered index IX_Clustered on Table(newPK ASC)
And you can seek along this. It is all but guaranteed to be faster than what your query is doing now, which performs the equivalent of an index scan plus a key lookup.
declare @pk int
select @pk = newPK
from SurrogateKey
where
k1 = @k1
and k3 = @k3
and k4 = @k4
and k5 = @k5
and k6 = @k6
select top(g1) d1, k2, k7
from Table with (readuncommitted)
where newPK = @pk
order by k7
Your insert statement will need to be modified to query/insert the SurrogateKey table as well.
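A sketch of what the modified insert might look like (names are placeholders; it assumes newPK is an identity column on SurrogateKey, as above):

```sql
declare @pk int;

-- Look up the surrogate key; create it if this combination has not been seen yet.
select @pk = newPK
from SurrogateKey
where k1 = @k1 and k3 = @k3 and k4 = @k4 and k5 = @k5 and k6 = @k6;

if @pk is null
begin
    insert into SurrogateKey (k1, k3, k4, k5, k6)
    values (@k1, @k3, @k4, @k5, @k6);
    set @pk = scope_identity();
end

-- Then insert into the main table using the surrogate key.
insert into BigTable (newPK, k2, k7, d1)
values (@pk, @k2, @k7, @d1);
```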
You can try:
alter table MyTable
add constraint PK_MyTable
primary key nonclustered (k1, k2)
create clustered index IX_MyTable
on MyTable(k4, k1, k3, k5, k6, k7)
--decreasing order of cardinality of the filter columns
This will ensure that your duplicate inserts continue to error out.
This may also instruct SQL Server to filter on (k1, k3, k4, k5, k6) and order on (k7 asc) in one pass, permitting SQL Server to stream the query results without the intermediate step of sorting a million results first. Once SQL Server finds the first row matching (k1, k3, k4, k5, k6), the next million or so rows will all match the same filter, and will already be in sorted order by (k7 asc). All filtering and ordering will be done, together, based on the clustered index.
Provided the pages are stored consecutively, and provided SQL Server knows how to optimize, that's a few disk seeks to walk down the index to find the first matching row followed by one big sequential disk read of ten thousand or so pages. That should be faster than asking SQL Server to seek all over the place to find the rows and then asking SQL Server to sort them in tempdb!
You will have to be vigilant and ensure that the clustered index is in good health at all times. You may also have to reduce the page fill factor if insert time slows down too much.
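For example, a periodic rebuild with a lower fill factor (the index/table names and the value 80 are illustrative; tune to your actual insert pattern):

```sql
-- Rebuild the clustered index, leaving 20% free space on each page for new rows.
alter index IX_MyTable on MyTable
rebuild with (fillfactor = 80);
```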