One 400GB table, One query - Need Tuning Ideas (SQL2005)

后端 未结 24 2133
予麋鹿
予麋鹿 2021-01-30 15:03

I have a single large table which I would like to optimize. I\'m using MS-SQL 2005 server. I\'ll try to describe how it is used and if anyone has any suggestions I would appreci

24条回答
  •  春和景丽
    2021-01-30 16:00

    Thanks all for your help.

    I have made 3 edits to mistakes in the original post.

    1) The WHEREs should have been ANDs.

    2) k4 should have been MONEY not VARCHAR. Also, k1 is of length 3.

    3) The k2 should not be in the WHERE clause. As doofledorfer correctly points out, it makes no sense to have any other WHERE statements other than the full primary key.

    Here are the answers to your questions:

    Why have you clustered on the primary key?

    I was under the impression that the PK was set as a clustered index by default. I did not change it.

    Which columns can be NULL?

    None.

    What are the VARCHAR lengths?

    I made a mistake with the column types. The only remaining VARCHAR is of length 3.

    What does the query plan give you now?

    Posted in the next post.

    Help me understand more about the table. if your PK is k1,k2, you shouldn't have to select by any other column to get a completely unique record. This was a mistake. The k2 part of the PK is not in the WHERE clause.

    Knowing why you need around a million records returned might help me provide a better solution.

    The database contains daily records (the d1 TEXT column) or data. People need access to large amounts of this data to run their own reports. They need to filter it by a number of values and have it delivered sorted by time.

    It looks like you only want the earliest "g" records? Maybe only the most recent "g" records?

    Yes, the latest. But I a certain number of them. I don't know the start date beforehand.

    Do you have foreign keys on k3, k4? No. This is the only table int the DB.

    Comments:

    Even if the clustered index is proper, the more selective field should come first.

    The more selective index is not used in the WHERE clause (post edit!). So I take it it should not come first in that case?

    You may want to Move data over a certain age to a history table

    Currently all the data is used so pruning is not an option.

    You may want to defrag the index

    Currently I have none. Will look into it if this thread proves fruitful.

    Add a single index with columns k1-k6 in it; that should be the best.

    Can anyone else comment on this suggestion? Liggett78 cammented that this will double the size of the DB without helping much because of the date-column sort. Note that the DATE column is not in the WHERE clause, it is only used for ordering the data.

    Try turning k1, k2 into ints and making them foreign keys, it'll use a lot less storage for one, I'd have thought and I think it should be quicker (though I may be wrong there, I guess SQL Server caches these values).

    k2 is a bigint (mistake in the orig post). So changing k1 to an int (from a VARCHAR(3)) is an option. Do we really think this is going to make much difference. And do people really think that splitting the table into k1,k2,d1 and k1,k2,k3,k4,k5,k7 and using foreign keys would improve things?

    One good tip to improve query speeds is to put in a sub-query that cuts down your recordset size to a more manageable one. There is likely to be some set of data that immediately cuts the recordset down from, say 10 million rows, to 10,000.

    e.g. SELECT TOP(g) d1 FROM (SELECT * FROM table WHERE k1=a WITH(NOLOCK)) WHERE AND k3 = c AND k4 = d AND k5 = e AND k6 = f ORDER BY k7

    Very interesting. Would this really help? It seems like SQL Server would be very stupid if it did not cut down the data in a similar manner itself.

    Perhaps it is the time taken by your UI or whether to display the data, perhaps it is the time taken by the Network ?

    There is no UI. There certainly are network issues moving the data but I am only concerned with the time taken for the query to start returning results (I'm using an ADO.NET data reader) at the moment - one thing at a time :)

    .. [to] see the most gains ... partition the table

    Will a clustered index not have the same effect?

    Leave your primary key alone, but create a clustered index on your date column, since this is what you use in ORDER BY. That way the database engine would begin to scan the clustered key, compare columns with your supplied values and output rows that satisfy the conditions.

    Sounds like a sound plan! Any other backers?

    To summarize the suggestions:

    1) Create separate indexes on all keys: most people vote no on this?

    2) Create separate indexes on the keys with most distinct values.

    3) Create a multiple column index on some of the columns, with the columns with the most distinct values first.

    4) Throw RAM at it.

提交回复
热议问题