One 400GB table, One query - Need Tuning Ideas (SQL2005)

予麋鹿 2021-01-30 15:03

I have a single large table which I would like to optimize. I'm using MS SQL Server 2005. I'll try to describe how it is used, and if anyone has any suggestions I would appreciate it.

24 answers
  • 2021-01-30 16:00

    Thanks all for your help.

    I have made 3 edits to mistakes in the original post.

    1) The WHEREs should have been ANDs.

    2) k4 should have been MONEY not VARCHAR. Also, k1 is of length 3.

    3) k2 should not be in the WHERE clause. As doofledorfer correctly points out, with the full primary key in the WHERE clause no other conditions make sense.
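
    Putting those corrections together, the query under discussion looks roughly like this (a reconstruction from the thread, not the poster's actual SQL; the table name and literal values are placeholders):

        -- Reconstructed sketch; BigTable and all literals are made up.
        SELECT TOP (1000000) d1      -- "around a million records"
        FROM BigTable
        WHERE k1 = 'abc'             -- VARCHAR(3)
          AND k3 = 42
          AND k4 = 19.99             -- MONEY
          AND k5 = 1
          AND k6 = 0
        ORDER BY k7 DESC;            -- date column, latest first; ordering only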

    Here are the answers to your questions:

    Why have you clustered on the primary key?

    I was under the impression that the PK was set as a clustered index by default. I did not change it.

    Which columns can be NULL?

    None.

    What are the VARCHAR lengths?

    I made a mistake with the column types. The only remaining VARCHAR is of length 3.

    What does the query plan give you now?

    Posted in the next post.

    Help me understand more about the table. If your PK is k1,k2, you shouldn't have to select by any other column to get a completely unique record.

    This was a mistake. The k2 part of the PK is not in the WHERE clause.

    Knowing why you need around a million records returned might help me provide a better solution.

    The database contains daily records of data (the d1 TEXT column). People need access to large amounts of this data to run their own reports. They need to filter it by a number of values and have it delivered sorted by time.

    It looks like you only want the earliest "g" records? Maybe only the most recent "g" records?

    Yes, the latest. But I need a certain number of them. I don't know the start date beforehand.

    Do you have foreign keys on k3, k4?

    No. This is the only table in the DB.

    Comments:

    Even if the clustered index is proper, the more selective field should come first.

    The more selective field is no longer in the WHERE clause (post-edit!). So I take it it should not come first in that case?

    You may want to move data over a certain age to a history table

    Currently all the data is used so pruning is not an option.

    You may want to defrag the index

    Currently I have none. Will look into it if this thread proves fruitful.
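
    For reference, a minimal way to check and then fix fragmentation on SQL Server 2005 (the table name is a placeholder):

        -- Hypothetical check: per-index fragmentation for one table.
        SELECT index_id, avg_fragmentation_in_percent
        FROM sys.dm_db_index_physical_stats(
                 DB_ID(), OBJECT_ID('BigTable'), NULL, NULL, 'LIMITED');

        -- Light cleanup; use REBUILD instead when fragmentation is heavy.
        ALTER INDEX ALL ON BigTable REORGANIZE;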

    Add a single index with columns k1-k6 in it; that should be the best.

    Can anyone else comment on this suggestion? Liggett78 commented that this will double the size of the DB without helping much, because of the sort on the date column. Note that the DATE column is not in the WHERE clause; it is only used for ordering the data.

    Try turning k1, k2 into ints and making them foreign keys; it'll use a lot less storage, for one, and I'd have thought it should be quicker (though I may be wrong there; I guess SQL Server caches these values).

    k2 is a BIGINT (mistake in the original post). So changing k1 to an INT (from a VARCHAR(3)) is an option. Do we really think this is going to make much difference? And do people really think that splitting the table into k1,k2,d1 and k1,k2,k3,k4,k5,k7 and using foreign keys would improve things?
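
    For what it's worth, the lookup-table half of that idea would look something like this (a sketch only; the table and column names are made up):

        -- Hypothetical: replace VARCHAR(3) k1 with an INT surrogate key.
        CREATE TABLE K1Lookup (
            k1_id INT IDENTITY(1,1) PRIMARY KEY,
            k1    VARCHAR(3) NOT NULL UNIQUE
        );
        -- The big table would then carry k1_id INT (an FK to K1Lookup),
        -- so every index containing k1 stores a fixed 4-byte int instead
        -- of a variable-length string.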

    One good tip to improve query speeds is to put in a sub-query that cuts down your recordset size to a more manageable one. There is likely to be some set of data that immediately cuts the recordset down from, say, 10 million rows to 10,000.

    e.g. SELECT TOP(g) d1 FROM (SELECT * FROM table WITH (NOLOCK) WHERE k1 = a) AS t WHERE k3 = c AND k4 = d AND k5 = e AND k6 = f ORDER BY k7

    Very interesting. Would this really help? It seems like SQL Server would be very stupid if it did not cut down the data in a similar manner itself.

    Perhaps it is the time taken by your UI or whatever displays the data; perhaps it is the time taken by the network?

    There is no UI. There certainly are network issues moving the data but I am only concerned with the time taken for the query to start returning results (I'm using an ADO.NET data reader) at the moment - one thing at a time :)

    .. [to] see the most gains ... partition the table

    Will a clustered index not have the same effect?

    Leave your primary key alone, but create a clustered index on your date column, since this is what you use in ORDER BY. That way the database engine would begin to scan the clustered key, compare columns with your supplied values and output rows that satisfy the conditions.

    Sounds like a sound plan! Any other backers?
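
    For anyone who wants the concrete DDL for this, it would be along these lines (a sketch; the table and constraint names are placeholders):

        -- Hypothetical: keep the PK but make it nonclustered, then
        -- cluster on the date column used in the ORDER BY.
        ALTER TABLE BigTable DROP CONSTRAINT PK_BigTable;
        ALTER TABLE BigTable ADD CONSTRAINT PK_BigTable
            PRIMARY KEY NONCLUSTERED (k1, k2);
        CREATE CLUSTERED INDEX IX_BigTable_date ON BigTable (k7);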

    To summarize the suggestions:

    1) Create separate indexes on all keys: most people vote no on this?

    2) Create separate indexes on the keys with most distinct values.

    3) Create a multiple-column index on some of the columns, with the columns with the most distinct values first (see the sketch after this list).

    4) Throw RAM at it.
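
    A sketch of option 3, assuming the reconstructed query shape above (the index and table names are placeholders):

        -- Hypothetical: equality columns first (most selective leading),
        -- then the ORDER BY column so the sort comes free from the index.
        -- d1 is TEXT, which SQL Server 2005 cannot put in an INCLUDE
        -- list, so each matching row still costs a base-table lookup.
        CREATE NONCLUSTERED INDEX IX_BigTable_filter
            ON BigTable (k1, k3, k4, k5, k6, k7);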

  • 2021-01-30 16:02

    Here is what I would do:

    • Don't create single indexes on each column. You'll be wasting space and they won't help you much (if at all).
    • Leave your primary key alone, but create a clustered index on your date column, since this is what you use in ORDER BY. That way the database engine would begin to scan the clustered key, compare columns with your supplied values and output rows that satisfy the conditions.
    • You don't need any other indexes for that. I believe even 100 distinct values out of 100 million rows for k4 would be considered poor selectivity by the optimizer (though you can try that at least).
    • If you select based on date ranges, e.g. only data from the last month, week, or year, you might want to look at partitioning your big table into "smaller" ones based on the date column (a sketch follows below). Those 10-value columns would be good candidates for partition keys too.

    BTW, you specify your entire PK in the query - assuming AND'ing in the WHERE clause, that will select exactly one row.
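
    A minimal sketch of date partitioning on SQL Server 2005 Enterprise (the function, scheme, and boundary dates are all made up):

        -- Hypothetical: yearly range partitions on the k7 date column.
        CREATE PARTITION FUNCTION pfByYear (DATETIME)
            AS RANGE RIGHT FOR VALUES ('2005-01-01', '2006-01-01', '2007-01-01');

        CREATE PARTITION SCHEME psByYear
            AS PARTITION pfByYear ALL TO ([PRIMARY]);

        -- Rebuilding the clustered index on the scheme spreads the rows
        -- across the per-year partitions (assumes any existing clustered
        -- index is dropped first, or use WITH (DROP_EXISTING = ON)).
        CREATE CLUSTERED INDEX IX_BigTable_date
            ON BigTable (k7) ON psByYear (k7);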

  • 2021-01-30 16:02

    It is difficult to give you a very meaningful answer. Have you looked at the disk I/O costs? Where are you keeping the database files - perhaps it is the I/O that is stalling? There are so many variables here that can affect the performance. Perhaps it is the time taken by your UI or whatever displays the data; perhaps it is the time taken by the network?

    Perhaps the easiest way - where you will see the most gains - will be to partition the table, if you are on the Enterprise Edition of SQL Server 2005.

    Again, without access to actual query plans and perfmon stats it is mighty hard to tell you exactly what the problem is. Your question simply doesn't give us enough to go on - everything is just a guess.
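
    As a first pass at the disk I/O question, SQL Server 2005 exposes per-file read-stall times (a sketch; run it in the database in question):

        -- Hypothetical first look: read volume and stall time per file.
        SELECT f.name, s.num_of_reads, s.io_stall_read_ms
        FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS s
        JOIN sys.database_files AS f ON s.file_id = f.file_id;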

  • 2021-01-30 16:03

    Partition and parallelize - check the query plan; if it's not showing that the query is parallelized, then find out why it isn't. You may need to break the query into a couple of steps and then bring the results together.

    If it is, then partition the data across multiple physical disks and add more cores. It's got lots of work to do; once you've indexed the hell out of it, raw physical power is all that's left.

    Don't assume that SQL Server will just use all your cores. Generally you have to design your query just right so that multiple cores can be used. Check the properties of the first node in the query plan to see the DOP (degree of parallelism). If it's 1, you're wasting cores...
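
    One knob worth knowing here: a MAXDOP hint makes the per-query limit explicit (a sketch reusing the placeholder query from earlier in the thread):

        -- Hypothetical: let this query use up to 8 schedulers, then check
        -- the actual DOP on the plan's first node.
        SELECT TOP (1000000) d1
        FROM BigTable
        WHERE k1 = 'abc' AND k3 = 42 AND k4 = 19.99 AND k5 = 1 AND k6 = 0
        ORDER BY k7 DESC
        OPTION (MAXDOP 8);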

  • 2021-01-30 16:05

    Here is an idea: what if you create a second table with all of the lookup values, and then instead of a long WHERE clause you join against that lookup table?

    Also I think it could help if you posted a few rows of data and a sample query, if possible.
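
    Concretely, that idea might look like this (a sketch; the table names, column types, and values are all assumptions):

        -- Hypothetical: put the filter values in a small table and join.
        CREATE TABLE FilterValues (k3 INT, k4 MONEY, k5 BIT, k6 BIT);
        INSERT INTO FilterValues VALUES (42, 19.99, 1, 0);

        SELECT TOP (1000000) b.d1
        FROM BigTable AS b
        JOIN FilterValues AS f
          ON b.k3 = f.k3 AND b.k4 = f.k4 AND b.k5 = f.k5 AND b.k6 = f.k6
        WHERE b.k1 = 'abc'
        ORDER BY b.k7 DESC;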

  • 2021-01-30 16:05

    What is d1 - is it a DECIMAL or a long char? Please elaborate. My recommendation would be to create the clustered index as (k7, k2, k1, k4) and then create an additional index on (k3). (Creating an index on the two bool values is mostly meaningless unless the value distribution is around 30%/70% between the values, or your table is very wide.)

    This change would not greatly impact your insert speed, while providing you with a reasonable general-purpose clustered index.
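
    In DDL terms that suggestion is roughly (a sketch; it assumes the existing clustered PK has first been rebuilt as nonclustered, as described in an earlier answer):

        -- Hypothetical DDL for this suggestion; names are placeholders.
        CREATE CLUSTERED INDEX IX_BigTable_main ON BigTable (k7, k2, k1, k4);
        CREATE NONCLUSTERED INDEX IX_BigTable_k3 ON BigTable (k3);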
