How does SQL Server work out the estimated number of rows?

I'm trying to debug a fairly complex stored procedure that joins across many tables (10-11). I'm seeing that for a part of the tree the estimated number of rows drastically d

4 Answers

滥情空心

2020-12-20 23:50

Since you already updated the statistics, I'd try to eliminate any parameter sniffing:

```
CREATE PROCEDURE xyz
(
    @param1 int
    ,@param2 varchar(10)
)
AS

DECLARE @param_1 int
       ,@param_2 varchar(10)

SELECT @param_1=@param1
      ,@param_2=@param2

...complex query here....
...WHERE column1=@param_1 AND column2=@param_2....

go
```
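Copying the parameters into local variables works because the optimizer cannot see the values of local variables at compile time, so for equality predicates it estimates from the average density instead of the sniffed values. As a hedged alternative, the same effect can usually be had with query hints. The sketch below assumes SQL Server 2008 or later; the procedure names, `dbo.SomeTable` and its columns are hypothetical stand-ins for the real query.

```sql
-- Hypothetical objects (dbo.SomeTable, column1, column2), for illustration only.
CREATE PROCEDURE dbo.xyz_recompile
(
    @param1 int,
    @param2 varchar(10)
)
AS
BEGIN
    SELECT t.column1, t.column2
    FROM dbo.SomeTable AS t
    WHERE t.column1 = @param1
      AND t.column2 = @param2
    OPTION (RECOMPILE);            -- build a fresh plan for the actual values on each call
END
GO

CREATE PROCEDURE dbo.xyz_unknown
(
    @param1 int,
    @param2 varchar(10)
)
AS
BEGIN
    SELECT t.column1, t.column2
    FROM dbo.SomeTable AS t
    WHERE t.column1 = @param1
      AND t.column2 = @param2
    OPTION (OPTIMIZE FOR UNKNOWN); -- estimate from average density, not the sniffed values
END
GO
```

`RECOMPILE` trades extra compile time on every call for estimates based on the real values; `OPTIMIZE FOR UNKNOWN` keeps one cached plan built for an "average" value.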


既然无缘

2020-12-20 23:52
It uses statistics, which it keeps for each index.

(You can also create statistics on non-indexed columns)
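A minimal sketch of creating statistics on a non-indexed column and checking when each statistics object on a table was last updated. The table `dbo.Orders`, the column `Status` and the statistics name are assumptions made up for the example.

```sql
-- Hypothetical table, column and statistics names.
CREATE STATISTICS st_Orders_Status
ON dbo.Orders (Status)
WITH FULLSCAN;   -- read every row instead of a sample

-- List the statistics on the table and when each was last updated.
SELECT s.name,
       s.auto_created,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.Orders');
```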

To update the statistics on every table in a database (WARNING: this will take some time on very large databases. Don't do this on production servers without checking with your DBA):
```
exec sp_msforeachtable 'UPDATE STATISTICS ?'
```
If you don't have a regular scheduled job to rebuild your most active indexes (i.e. lots of INSERTS or DELETES), you should consider rebuilding your indexes (same caveat as above applies):
```
exec sp_msforeachtable "DBCC DBREINDEX('?')"
```
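Note that `sp_msforeachtable` is an undocumented procedure and `DBCC DBREINDEX` is deprecated in favour of `ALTER INDEX ... REBUILD`. A hedged sketch of the documented equivalents for a single table, assuming SQL Server 2005 or later and a hypothetical table name:

```sql
-- Hypothetical table name.
ALTER INDEX ALL ON dbo.Orders REBUILD;  -- rebuilding also refreshes each index's statistics

-- Update out-of-date statistics for all tables in the current database.
EXEC sp_updatestats;
```

The same caveat about production servers applies.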
- Statistics Used by the Query Optimizer in Microsoft SQL Server 2008
花落未央

2020-12-20 23:55

Rebuilding your indexes might resolve the incorrect estimated row count.

独厮守ぢ

2020-12-20 23:58
SQL Server splits each index into up to 200 ranges with the following data (from here):
- RANGE_HI_KEY
  
  A key value showing the upper boundary of a histogram step.
- RANGE_ROWS
  
  Specifies how many rows are inside the range (they are smaller than this RANGE_HI_KEY, but bigger than the previous smaller RANGE_HI_KEY).
- EQ_ROWS
  
  Specifies how many rows are exactly equal to RANGE_HI_KEY.
- AVG_RANGE_ROWS
  
  Average number of rows per distinct value inside the range.
- DISTINCT_RANGE_ROWS
  
  Specifies how many distinct key values are inside this range (not including the previous key before RANGE_HI_KEY and RANGE_HI_KEY itself).

Usually, the most populated values go into RANGE_HI_KEY.

However, they can end up inside a range, and this can skew the estimated distribution.
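The histogram columns described above can be inspected directly with `DBCC SHOW_STATISTICS`. A minimal sketch, where the table and index names are hypothetical:

```sql
-- Hypothetical table and index (or statistics) names.
DBCC SHOW_STATISTICS ('dbo.Orders', IX_Orders_CustomerId) WITH HISTOGRAM;
```

Each row of the result is one histogram step, with the RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS and AVG_RANGE_ROWS columns.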

Imagine these data (among the others):

```
Key value  Count of rows
1          1
2          1
3          10000
4          1
```
SQL Server usually builds two ranges: 1 to 3 and 4 to the next populated value, which makes these statistics:
```
RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
3             2           10000    1               2
```
This means that when searching for, say, 2, the optimizer expects just 1 row, so index access is the better choice.

But if 3 goes inside the range, the statistics are these:
```
RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
4             10002       1        3334            3
```
The optimizer thinks there are 3334 rows for the key 2 and index access is too expensive.
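The two estimates follow from AVG_RANGE_ROWS = RANGE_ROWS / DISTINCT_RANGE_ROWS. As a small sketch, the arithmetic can be reproduced with a table variable holding the two histogram steps from the example; the table variable and the `scenario` labels are made up for illustration, and the multi-row `VALUES` syntax assumes SQL Server 2008 or later.

```sql
-- Illustrative copy of the two histogram steps from the example above.
DECLARE @histogram TABLE
(
    scenario            varchar(40),
    range_hi_key        int,
    range_rows          int,
    eq_rows             int,
    distinct_range_rows int
);

INSERT INTO @histogram VALUES
    ('3 is a step boundary',     3, 2,     10000, 2),
    ('3 falls inside the range', 4, 10002, 1,     3);

-- Estimated rows for a key inside the range (for example key 2).
SELECT scenario,
       range_rows * 1.0 / distinct_range_rows AS avg_range_rows
FROM @histogram;
```

The first step works out to 1 row per key and the second to 3334, which matches the two histograms above.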