I\'m trying to debug a fairly complex stored procedure that joins across many tabls (10-11). I\'m seeing that for a part of the tree the estimated number of rows drasticly d
Since you already updated the statistics, I'd try to eliminate any parameter sniffing:
CREATE PROCEDURE xyz
(
@param1 int
,@param2 varchar(10)
)AS
DECLARE @param_1 int
,@param_2 varchar(10)
SELECT @param_1=@param1
,@param_2=@param2
...complex query here....
...WHERE column1=@param_1 AND column2=@param_2....
go
It uses statistics, which it keeps for each index.
(You can also create statistics on non-indexed columns)
To update all your statistics on every table in a Database (WARNING: will take some time on very large databases. Don't do this on Production servers without checking with your DBA...):
exec sp_msforeachtable 'UPDATE STATISTICS ?'
If you don't have a regular scheduled job to rebuild your most active indexes (i.e. lots of INSERTS or DELETES), you should consider rebuilding your indexes (same caveat as above applies):
exec sp_msforeachtable "DBCC DBREINDEX('?')"
rebuilding your indexes might resolve the incorrect estimated rows value issue
SQL Server
splits each index into up to 200
ranges with the following data (from here):
RANGE_HI_KEY
A key value showing the upper boundary of a histogram step.
RANGE_ROWS
Specifies how many rows are inside the range (they are smaller than this
RANGE_HI_KEY
, but bigger than the previous smallerRANGE_HI_KEY
).
EQ_ROWS
Specifies how many rows are exactly equal to
RANGE_HI_KEY
.
AVG_RANGE_ROWS
Average number of rows per distinct value inside the range.
DISTINCT_RANGE_ROWS
Specifies how many distinct key values are inside this range (not including the previous key before
RANGE_HI_KEY
andRANGE_HI_KEY
itself);
Usually, most populated values go into RANGE_HI_KEY
.
However, they can get into the range and this can lead to the skew in distribution.
Imagine these data (among the others):
Key value Count of rows
1 1
2 1
3 10000
4 1
SQL Server
usually builds two ranges: 1
to 3
and 4
to the next populated value, which makes these statistics:
RANGE_HI_KEY RANGE_ROWS EQ_ROWS AVG_RANGE_ROWS DISTINCT_RANGE_ROWS
3 2 10000 1 2
, which means the when searching for, say, 2
, there is but 1
row and it's better to use the index access.
But if 3
goes inside the range, the statistics are these:
RANGE_HI_KEY RANGE_ROWS EQ_ROWS AVG_RANGE_ROWS DISTINCT_RANGE_ROWS
4 10002 1 3334 3
The optimizer thinks there are 3334
rows for the key 2
and index access is too expensive.