How does SQL server work out the estimated number of rows?

后端 未结 4 1810
有刺的猬
有刺的猬 2020-12-20 23:19

I\'m trying to debug a fairly complex stored procedure that joins across many tabls (10-11). I\'m seeing that for a part of the tree the estimated number of rows drasticly d

相关标签:
4条回答
  • 2020-12-20 23:50

    Since you already updated the statistics, I'd try to eliminate any parameter sniffing:

    CREATE PROCEDURE xyz
    (
        @param1 int
        ,@param2 varchar(10)
    
    )AS
    
    DECLARE @param_1 int
           ,@param_2 varchar(10)
    
    SELECT @param_1=@param1
          ,@param_2=@param2
    
    ...complex query here....
    ...WHERE column1=@param_1 AND column2=@param_2....
    
    go
    
    0 讨论(0)
  • 2020-12-20 23:52

    It uses statistics, which it keeps for each index.

    (You can also create statistics on non-indexed columns)

    To update all your statistics on every table in a Database (WARNING: will take some time on very large databases. Don't do this on Production servers without checking with your DBA...):

    exec sp_msforeachtable 'UPDATE STATISTICS ?'
    

    If you don't have a regular scheduled job to rebuild your most active indexes (i.e. lots of INSERTS or DELETES), you should consider rebuilding your indexes (same caveat as above applies):

    exec sp_msforeachtable "DBCC DBREINDEX('?')"
    
    • Statistics Used by the Query Optimizer in Microsoft SQL Server 2008
    0 讨论(0)
  • 2020-12-20 23:55

    rebuilding your indexes might resolve the incorrect estimated rows value issue

    0 讨论(0)
  • 2020-12-20 23:58

    SQL Server splits each index into up to 200 ranges with the following data (from here):

    • RANGE_HI_KEY

      A key value showing the upper boundary of a histogram step.

    • RANGE_ROWS

      Specifies how many rows are inside the range (they are smaller than this RANGE_HI_KEY, but bigger than the previous smaller RANGE_HI_KEY).

    • EQ_ROWS

      Specifies how many rows are exactly equal to RANGE_HI_KEY.

    • AVG_RANGE_ROWS

      Average number of rows per distinct value inside the range.

    • DISTINCT_RANGE_ROWS

      Specifies how many distinct key values are inside this range (not including the previous key before RANGE_HI_KEY and RANGE_HI_KEY itself);

    Usually, most populated values go into RANGE_HI_KEY.

    However, they can get into the range and this can lead to the skew in distribution.

    Imagine these data (among the others):

    Key value Count of rows

    1          1
    2          1
    3          10000
    4          1
    

    SQL Server usually builds two ranges: 1 to 3 and 4 to the next populated value, which makes these statistics:

    RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
    3             2           10000    1               2
    

    , which means the when searching for, say, 2, there is but 1 row and it's better to use the index access.

    But if 3 goes inside the range, the statistics are these:

    RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
    4             10002       1        3334            3
    

    The optimizer thinks there are 3334 rows for the key 2 and index access is too expensive.

    0 讨论(0)
提交回复
热议问题