I have a single large table which I would like to optimize. I'm using MS-SQL 2005 server. I'll try to describe how it is used and if anyone has any suggestions I would appreciate it.
Help me understand more about the table. If your PK is (k1, k2), you shouldn't have to select by any other column to get a completely unique record.
Do you mean to say that k1 through k7 is the PK? If so, declare it as such and it will become the clustered index. Query performance should improve dramatically.
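If those columns really do uniquely identify a row, a sketch of the declaration might look like this (the table name is a placeholder; adjust to your schema):

```sql
-- Hypothetical table name; assumes k1 through k7 together uniquely identify a row.
alter table MyBigTable
add constraint PK_MyBigTable
primary key clustered (k1, k2, k3, k4, k5, k6, k7);
```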
The order by is adding a lot of overhead. Consider finding a better option that can return a smaller set of data. Knowing why you need around a million records returned might help me provide a better solution.
Edit: I get the sense that I'm not alone in my suspicion that the best place to start optimizing is your physical table design. Do you have any control over this? Not knowing what each column stores, I can't offer very specific ideas, but a very general approach follows: put k1, k3, k4, k5 and k6 (k2 appears to be directly related to the values in your table) in their own table with a single unique int as the PK, then create a FK relationship from the main table back to it. The PK on the main table would then include this new field, k2 and k7. The query optimizer will then perform a rather inexpensive lookup in the new table, return a single record, and then perform an index seek into your main table by PK only.
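A rough sketch of that design, with hypothetical names and int types standing in for whatever your columns actually are (it also assumes the main table has already been given a lookupId column):

```sql
-- Lookup table holding each distinct (k1, k3, k4, k5, k6) combination exactly once.
create table KeyLookup (
    lookupId int identity(1,1) not null primary key,
    k1 int not null, k3 int not null, k4 int not null,
    k5 int not null, k6 int not null,
    constraint UQ_KeyLookup unique (k1, k3, k4, k5, k6)
);

-- Main table references the lookup table via the new surrogate key.
alter table MainTable
add constraint FK_MainTable_KeyLookup
foreign key (lookupId) references KeyLookup (lookupId);
```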
Add a single index with columns k1-k6 in it; that should be the best.
Also, if you can, run sp_updatestats before each query.
Show the query plan output - any tuning adventure that doesn't start there is a misadventure.
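On SQL Server 2005 one easy way to capture the plan in text form is STATISTICS PROFILE (the query below is just a placeholder; substitute your real one):

```sql
set statistics profile on;

-- Your actual query goes here; literal values are illustrative only.
select d1, k2, k7
from MyBigTable
where k1 = 1 and k3 = 2 and k4 = 30 and k5 = 0 and k6 = 1
order by k7;

set statistics profile off;
```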
That sounds like good fun.
A few questions:
A few facts seem important to me:
A few remarks come to mind:
It looks like you are not using your clustered index to its full potential, and have a LOT of duplicated data.
Your clustered index seems to be constructed something like:
create clustered index IX_Clustered on Table(k1 ASC, k2 ASC)
However, your other k* columns represent only 40,000 possible permutations.
10 (k1) * 10 (k3) * 100 (k4) * 2 (k5) * 2 (k6) = 40,000
You should pull the unique combinations of these 5 keys out into a separate table and give each combination a unique int as its primary key ("newPK").
Something like this (column types shown as int; adjust to match your main table):
create table SurrogateKey (
    newPK int identity(1,1) not null,
    k1 int, k3 int, k4 int, k5 int, k6 int,
    constraint PK_SurrogateKey primary key clustered (newPK),
    constraint UQ_SurrogateKey unique (k1, k3, k4, k5, k6)
)
This table will only have 40,000 rows and be very fast to lookup the primary key, newPK. Then, you can lookup a single integer in your large table.
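Populating it from the existing table is a one-off operation; assuming the identity column above, something like:

```sql
-- Insert each distinct key combination once; newPK values are generated automatically.
-- "BigTable" is a hypothetical name for your existing large table.
insert into SurrogateKey (k1, k3, k4, k5, k6)
select distinct k1, k3, k4, k5, k6
from BigTable;
```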
Your existing table should be altered to replace k1, k3, k4, k5 and k6 with the single newPK column, keeping k2, k7 and your data columns.
Given the above, you can change your clustered index to:
create clustered index IX_Clustered on Table(newPK ASC)
And you can seek along this. It is all but guaranteed to be faster than what your query is doing now, which performs the equivalent of an index scan plus a key lookup.
declare @pk int
select @pk = newPK
from SurrogateKey
where
k1 = @k1
and k3 = @k3
and k4 = @k4
and k5 = @k5
and k6 = @k6
select top(g1) d1, k2, k7
from Table with (readuncommitted)
where newPK = @pk
order by k7
Your insert statement will need to be modified to query/insert the SurrogateKey table as well.
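A sketch of what the modified insert might look like (names are placeholders; it assumes newPK is an identity column on SurrogateKey, as above):

```sql
declare @pk int;

-- Look up the surrogate key; create it if this combination has not been seen yet.
select @pk = newPK
from SurrogateKey
where k1 = @k1 and k3 = @k3 and k4 = @k4 and k5 = @k5 and k6 = @k6;

if @pk is null
begin
    insert into SurrogateKey (k1, k3, k4, k5, k6)
    values (@k1, @k3, @k4, @k5, @k6);
    set @pk = scope_identity();
end

-- Then insert into the main table using the surrogate key.
insert into BigTable (newPK, k2, k7, d1)
values (@pk, @k2, @k7, @d1);
```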
You can try:
alter table MyTable
add constraint PK_MyTable
primary key nonclustered (k1, k2)
create clustered index IX_MyTable
on MyTable(k4, k1, k3, k5, k6, k7)
--decreasing order of cardinality of the filter columns
This will ensure that your duplicate inserts continue to error out.
This may also instruct SQL Server to filter on (k1, k3, k4, k5, k6) and order on (k7 asc) in one pass, permitting SQL Server to stream the query results without the intermediate step of sorting a million results first. Once SQL Server finds the first row matching (k1, k3, k4, k5, k6), the next million or so rows will all match the same filter, and will already be in sorted order by (k7 asc). All filtering and ordering will be done, together, based on the clustered index.
Provided the pages are stored consecutively, and provided SQL Server knows how to optimize, that's a few disk seeks to walk down the index to find the first matching row followed by one big sequential disk read of ten thousand or so pages. That should be faster than asking SQL Server to seek all over the place to find the rows and then asking SQL Server to sort them in tempdb!
You will have to be vigilant and ensure that the clustered index is in good health at all times. You may also have to reduce the page fill factor if insert time slows down too much.
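For example, a periodic rebuild with a lower fill factor (the index/table names and the value 80 are illustrative; tune to your actual insert pattern):

```sql
-- Rebuild the clustered index, leaving 20% free space on each page for new rows.
alter index IX_MyTable on MyTable
rebuild with (fillfactor = 80);
```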