One 400GB table, One query - Need Tuning Ideas (SQL2005)

予麋鹿 2021-01-30 15:03

I have a single large table which I would like to optimize. I'm using MS-SQL 2005 server. I'll try to describe how it is used and if anyone has any suggestions I would appreci

24 Answers
  • 2021-01-30 15:52

    I would say that 8 GB is not enough RAM for a 400 GB table. The server has no chance of keeping the relevant data in memory if one index alone takes 5-8 GB, so there are lots and lots of hard disk reads, which make the query slow.

    In my opinion, increasing the amount of RAM and putting the database on a fast RAID (perhaps split across multiple RAIDs?) would help the most.

    EDIT: To be sure what your real bottleneck is, run Windows' Performance Monitor.
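
    As a rough complement to Performance Monitor, the same buffer pool counters can be read from inside SQL Server 2005 through the sys.dm_os_performance_counters view. This is only a sketch; what counts as a "low" value depends on the workload, which the question does not describe.

    ```sql
    -- Page life expectancy: seconds a data page is expected to stay in the buffer pool.
    -- Page reads/sec: stored as a cumulative count in this view, so sample it
    -- twice and divide the difference by the elapsed seconds to get a rate.
    SELECT [object_name], counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE [object_name] LIKE '%Buffer Manager%'
      AND counter_name IN ('Page life expectancy', 'Page reads/sec');
    ```

    A page life expectancy that stays very low while the query runs would be consistent with index pages being evicted and reread from disk.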

  • 2021-01-30 15:56

    Why have you clustered on the primary key?
    Which columns can be NULL?
    What are the VARCHAR lengths?
    What does the query plan give you now?

    You handicap us by giving meaningless column names.

    Even if the clustered index is proper, the more selective field should come first.

    I could make recommendations based on insufficient information, but some help would be better.
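
    One way to answer the query plan question without waiting for the slow query to finish is to request the estimated plan as text. A minimal sketch, assuming a hypothetical table name dbo.BigTable and made-up literal values and types for the parameters a through g:

    ```sql
    SET SHOWPLAN_TEXT ON;   -- must be the only statement in its batch
    GO
    -- With SHOWPLAN_TEXT on, the query is compiled but not executed;
    -- SQL Server returns the estimated plan instead of rows.
    SELECT TOP (100) d1
    FROM dbo.BigTable WITH (NOLOCK)
    WHERE k1 = 'a' AND k2 = 'b' AND k3 = 1 AND k4 = 2 AND k5 = 0 AND k6 = 1
    ORDER BY k7;
    GO
    SET SHOWPLAN_TEXT OFF;
    GO
    ```

    A Clustered Index Scan or a Sort over a very large input in that output would suggest that no existing index fits the query.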

  • 2021-01-30 15:58

    Use SQL Profiler to work out what indexes to create; it is designed to work out that information for you and suggest improved execution profiles.

    Do you have foreign keys on k3, k4?

    Try turning k1 and k2 into ints and making them foreign keys. It will use a lot less storage, for one, and I think it should be quicker (though I may be wrong there; I guess SQL Server caches these values). More to the point, it's easier if you ever need to update a value: you just change the name in the row the foreign key points to, and you don't then have to update 100 million keys, or whatever.

    One good tip to improve query speeds is to put in a sub-query that cuts down your recordset size to a more manageable one.
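
    A minimal sketch of that lookup-table layout. All names here (dbo.BigTable, dbo.K1Lookup, k1_id, k1_name) and the varchar length are hypothetical, and it assumes the large table already carries an int column k1_id in place of the text value:

    ```sql
    -- One row per distinct k1 text value.
    CREATE TABLE dbo.K1Lookup (
        k1_id   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        k1_name varchar(50) NOT NULL UNIQUE
    );

    -- The large table stores only the 4-byte id.
    ALTER TABLE dbo.BigTable
        ADD CONSTRAINT FK_BigTable_K1Lookup
        FOREIGN KEY (k1_id) REFERENCES dbo.K1Lookup (k1_id);

    -- Renaming a value touches one lookup row, not every row in the large table.
    UPDATE dbo.K1Lookup
    SET k1_name = 'new name'
    WHERE k1_name = 'old name';
    ```

    The same pattern would apply to k2.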

    In:

    SELECT TOP(g) d1 
    FROM table WITH(NOLOCK)  
    WHERE k1 = a  WHERE k2 = b  WHERE k3 = c  WHERE k4 = d  WHERE k5 = e  WHERE k6 = f  
    ORDER BY k7
    

    Which, I presume, should be

    SELECT TOP(g) d1 
    FROM table WITH(NOLOCK)  
    WHERE k1 = a AND k2 = b  AND k3 = c AND k4 = d AND k5 = e AND k6 = f 
    ORDER BY k7
    

    There is likely to be some set of data that immediately cuts the recordset down from, say, 10 million rows to 10,000.

    e.g.

    SELECT TOP(g) d1 
    FROM (SELECT * 
          FROM table WITH(NOLOCK)
          WHERE k1 = a AND k2 = b) AS t
    WHERE k3 = c AND k4 = d AND k5 = e AND k6 = f 
    ORDER BY k7
    

    This assumes that you can cut down the initial set of data massively with one or two of the WHERE arguments, which is almost certain.

    DBAs probably have more, better solutions!

  • 2021-01-30 15:58

    You need to create an index which will reduce the number of possible rows returned as quickly as possible.

    Therefore the simplest index to create would be on column k4, as that can have the highest number of different values. It is only necessary to index the initial substring of k4 where the expected values of k4 differ within that substring. This will reduce the size of the index and speed up access.

    k7 should also be indexed, as this will greatly increase the speed of the ORDER BY clause.

    You may also need to experiment (I know, I know, you said don't experiment, but this may help...) with creating a multiple column index in this order: k4, k1, k2, k3. This, again, is to reduce the number of possible rows returned as quickly as possible.
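
    The multi-column suggestion could be tried with something like the following. The table and index names are hypothetical. SQL Server has no syntax for indexing only a leading substring of a column, so this sketch indexes the full columns.

    ```sql
    -- Key order follows the answer: k4 first, then k1, k2, k3.
    CREATE NONCLUSTERED INDEX IX_BigTable_k4_k1_k2_k3
        ON dbo.BigTable (k4, k1, k2, k3);
    ```

    On a table this size the build itself is likely to take a long time and a lot of disk space, so it is worth trying on a copy first.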

  • 2021-01-30 15:59

    OK,

    Let's try to solve this problem with statistics. Before you try to create any index, you should ask which combination of keys gives the best selectivity:

    1. k1 : 10 different values
    2. k3 : 100 different values
    3. k4 : 10 different values
    4. k5 : 2 different values
    5. k6 : 2 different values

    If we make a compound key of k1, k3, k4, k5, and k6, that key will have only 40,000 different combinations (10 * 100 * 10 * 2 * 2). That means that if we divide 100,000,000 records by 40,000, statistically we will have a subset of 2,500 records, on which a sequential search will be applied to complete the other restrictions of the WHERE clause.

    If we extrapolate this result and compare it with the current execution time (30 minutes), with a key (k1) that statistically generates a subset of 10 million records, we get:

    10,000,000 rec * X sec = 30 * 60 sec * 2,500 rec

    => X sec = 0.45 sec

    Not bad, huh? Better yet, how about if we eliminate k5 and k6 from the compound index? Statistically we will have a subset of 10,000 records on which the sequential search will be performed. In theory, how much time will that take? Let's see:

    10,000,000 rec * X sec = 30 * 60 sec * 10,000 rec

    => X sec = 1.8 sec

    Since we want the smallest index footprint traded off against the best possible performance, I would say an index on k1 + k3 + k4 is as good as it gets.
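
    Both estimates come from the same proportion, which assumes that query time scales linearly with the number of rows scanned and that values are spread evenly across the keys. The symbols are introduced here only for illustration: T_ref is the observed 30 minutes, N_ref the 10 million rows scanned today, and N the rows left after the index seek.

    ```latex
    \[
    t \approx T_{\text{ref}} \cdot \frac{N}{N_{\text{ref}}}
    \]
    \[
    t_{k_1,k_3,k_4,k_5,k_6} \approx 1800\ \text{s} \cdot \frac{2{,}500}{10{,}000{,}000} = 0.45\ \text{s},
    \qquad
    t_{k_1,k_3,k_4} \approx 1800\ \text{s} \cdot \frac{10{,}000}{10{,}000{,}000} = 1.8\ \text{s}
    \]
    ```

    Real data is rarely spread evenly, so these figures are best read as order-of-magnitude estimates.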

    Hope this helps,

  • 2021-01-30 15:59

    I would use the Index Tuning Wizard (called the Database Engine Tuning Advisor in SQL Server 2005) to get a better answer.

    However, if it were me, I would try an index on k3, k4 (in the order you most commonly query; you already have k1 and k2 indexed) and a separate index on k7. I don't believe the addition of the boolean fields would improve index performance.

    Remember: the more indexes, the slower inserts will be. With the number of inserts you have, this is a real concern. So truly the only real answer is that you will have to experiment with your own data and hardware and find what works best for your particular situation. The fact that it wasn't what you wanted to hear doesn't make it any less true. Indexing is very dependent on how your application actually works and the structure of your data.
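
    As a sketch, those two indexes might be declared like this. The table and index names are hypothetical, and the key order of the first index should follow whichever column you filter on most often:

    ```sql
    -- Composite index on the two filter columns not yet covered.
    CREATE NONCLUSTERED INDEX IX_BigTable_k3_k4
        ON dbo.BigTable (k3, k4);

    -- Separate index intended to support ORDER BY k7.
    CREATE NONCLUSTERED INDEX IX_BigTable_k7
        ON dbo.BigTable (k7);
    ```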
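
    One way to run that experiment is to compare how often each index is read with how often it has to be maintained. A minimal sketch using the SQL Server 2005 index usage view, with dbo.BigTable as a hypothetical table name:

    ```sql
    SELECT i.name AS index_name,
           s.user_seeks, s.user_scans, s.user_lookups,  -- reads that used the index
           s.user_updates                               -- writes that had to maintain it
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS s
           ON s.object_id = i.object_id
          AND s.index_id = i.index_id
          AND s.database_id = DB_ID()
    WHERE i.object_id = OBJECT_ID('dbo.BigTable');
    ```

    These counters are reset when SQL Server restarts, so they only mean something after a representative period of inserts and queries. An index with many updates and few seeks is a candidate for removal.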
