Question
I have a SQL Server 2005 database with several tables. One of the tables is used to store timestamps and message counters for several devices, and has the following columns:
CREATE TABLE [dbo].[Timestamps] (
[Id] [uniqueidentifier] NOT NULL,
[MessageCounter] [bigint] NULL,
[TimeReceived] [bigint] NULL,
[DeviceTime] [bigint] NULL,
[DeviceId] [int] NULL
)
Id is the unique primary key (Guid.Comb), and I have indexes on both the DeviceId and MessageCounter columns.
What I want to do is find the last inserted row (the row with the largest MessageCounter) for a certain device.
The strange thing is that a query for device no. 4 (and every other device except no. 1) returns almost instantaneously:
select top 1 *
from "Timestamps"
where DeviceId = 4
order by MessageCounter desc
but the same query for device no. 1 takes forever to complete:
select top 1 *
from "Timestamps"
where DeviceId = 1 /* this is the only line changed */
order by MessageCounter desc
The strangest thing is that device 1 has far fewer rows than device 4:
select count(*) from "Timestamps" where DeviceId = 4
(returns 1,839,210)
select count(*) from "Timestamps" where DeviceId = 1
(returns 323,276).
Does anyone have a clue what I could be doing wrong?
[Edit]
From the execution plans for both queries, it is clearly visible that device 1 (lower diagram) produces a much larger number of rows in the Index Scan:
Execution plans for device 4 (upper) and device 1 (lower) http://img295.imageshack.us/img295/5784/execplans.png
The difference shows up when I hover over the Index Scan nodes in the execution plan diagrams:
Device 4 Actual Number of Rows: 1
Device 1 Actual Number of Rows: approx. 6,500,000
6,500,000 rows is a very strange number, since my select count(*) query returns around 300,000 rows for device 1!
Answer 1:
Try creating an index on (DeviceId, MessageCounter DESC).
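A minimal sketch of that index (the name is just a placeholder):
-- Filter column first, then the sort column in descending order,
-- so TOP 1 ... ORDER BY MessageCounter DESC can be satisfied by a simple seek
CREATE NONCLUSTERED INDEX IX_Timestamps_DeviceId_MessageCounter
ON dbo.Timestamps (DeviceId, MessageCounter DESC)
With this in place, the query for any single device should only touch a handful of rows at the top of that device's range.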
Also, try this query:
select *
from "Timestamps"
where DeviceId = 1
and MessageCounter = (SELECT MAX(MessageCounter) FROM "Timestamps" WHERE DeviceID = 1)
Just guessing: the performance difference might be because the rows with DeviceId = 1 are spread across more pages than those with DeviceId = 4. Because of the sort, I suspect you are dredging up all matching pages, even though you only end up selecting the top row.
Answer 2:
Are you sure the statistics are up to date? Use UPDATE STATISTICS:
UPDATE STATISTICS dbo.Timestamps
How are you running the query? If via a stored procedure, maybe you're having an issue with parameter sniffing?
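Two things one could try along those lines (sketches only; the RECOMPILE hint is just a test for the sniffing theory, not necessarily a final fix):
-- Rebuild the statistics with a full scan instead of the default sample
UPDATE STATISTICS dbo.Timestamps WITH FULLSCAN

-- If the statement runs inside a stored procedure, forcing a fresh plan per
-- execution shows whether a stale cached plan is the problem
select top 1 *
from "Timestamps"
where DeviceId = 1
order by MessageCounter desc
option (recompile)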
Answer 3:
The execution plan diagrams are not very helpful because they do not show which indexes are used.
The most helpful information comes from the following query:
select DeviceId, max(MessageCounter) from "Timestamps" group by DeviceId
I assume the maximum MessageCounter values for devices 2 to 4 are relatively high numbers, while the maximum MessageCounter for device 1 is a relatively low number.
How does SQL Server execute the query in that case?
The server reads the MessageCounter index from high to low values. For every row, it makes a nested seek into the clustered index to check the device id.
For devices 2-4 this ends very soon, because the server quickly finds a row in the MessageCounter index that belongs to that device. For device 1, the server needs more than 6 million seek operations before it finds the first row for device 1.
It would be faster to read the DeviceId index and then seek into the clustered index; that would stop after about 323k seeks. Still bad, though.
You should have an index that contains both the device ids and MessageCounter (as Marcelo Cantos pointed out).
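As a quick way to test that theory, one could force the existing DeviceId index with a table hint (the index name IX_Timestamps_DeviceId below is an assumption; substitute the real one):
-- Diagnostic sketch only: force the (assumed) nonclustered index on DeviceId,
-- so the work is bounded by device 1's ~323k rows instead of a scan down the
-- whole MessageCounter index
select top 1 *
from "Timestamps" with (index (IX_Timestamps_DeviceId))
where DeviceId = 1
order by MessageCounter desc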
Answer 4:
I presume this must be happening because when you order the records by MessageCounter descending, there are about 6,500,000 rows it has to plough through before it finds the first one with DeviceId = 1, whereas for the other DeviceIds there is a much better spread.
I presume that the DeviceId predicate doesn't come into play until the Filter operator in the execution plan.
A composite index on (DeviceId, MessageCounter) would resolve this. But is the device with DeviceId = 1 a legacy device for which new data is no longer being recorded? If so, you may be able to get away with extracting the DeviceId = 1 records into a table of their own and using a partitioned view, so that queries on that device don't scan a load of unrelated records.
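A rough sketch of the partitioned view idea, assuming device 1 is the legacy device (all object names are made up, DeviceId is made NOT NULL in the member tables, and the CHECK constraints are what let the optimizer skip the table that cannot contain the requested device):
-- Member table for the legacy device's rows
CREATE TABLE dbo.Timestamps_Device1 (
    Id uniqueidentifier NOT NULL PRIMARY KEY,
    MessageCounter bigint NULL,
    TimeReceived bigint NULL,
    DeviceTime bigint NULL,
    DeviceId int NOT NULL CHECK (DeviceId = 1)
)

-- Member table for everything else (assumes device ids start at 1)
CREATE TABLE dbo.Timestamps_Current (
    Id uniqueidentifier NOT NULL PRIMARY KEY,
    MessageCounter bigint NULL,
    TimeReceived bigint NULL,
    DeviceTime bigint NULL,
    DeviceId int NOT NULL CHECK (DeviceId >= 2)
)
GO

-- The view presents both member tables as one; a query with DeviceId = 1
-- should only touch dbo.Timestamps_Device1
CREATE VIEW dbo.TimestampsAll
AS
SELECT Id, MessageCounter, TimeReceived, DeviceTime, DeviceId FROM dbo.Timestamps_Device1
UNION ALL
SELECT Id, MessageCounter, TimeReceived, DeviceTime, DeviceId FROM dbo.Timestamps_Current
GO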
Also, what is the reason for choosing Guid.Comb as the clustered index? I presume a clustered index on (DeviceId, MessageCounter) would have similar characteristics in terms of fragmentation and avoiding hot spots, but would be more useful.
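If you did go that way, the change would look roughly like this (constraint and index names are assumed; the existing primary key has to be dropped and re-created as nonclustered):
-- Drop the clustered primary key on Id (name assumed)
ALTER TABLE dbo.Timestamps DROP CONSTRAINT PK_Timestamps

-- Cluster the table on the columns the hot query filters and sorts by
CREATE CLUSTERED INDEX CIX_Timestamps_DeviceId_MessageCounter
ON dbo.Timestamps (DeviceId, MessageCounter)

-- Keep Id as a nonclustered primary key
ALTER TABLE dbo.Timestamps
ADD CONSTRAINT PK_Timestamps PRIMARY KEY NONCLUSTERED (Id)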
Answer 5:
My first thought was that this might be due to parameter sniffing - essentially, SQL Server comes up with a plan the first time a query is run, but that query was unrepresentative of the typical workload. See http://www.sqlshare.com/solve-parameter-sniffing-by-using-local-variables_531.aspx
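The local-variable workaround from that link looks roughly like this, assuming the query lives in a stored procedure (the procedure name is hypothetical):
CREATE PROCEDURE dbo.GetLastTimestamp
    @DeviceId int
AS
BEGIN
    -- Copying the parameter into a local variable hides its value from the
    -- optimizer, so the plan is built for an "average" device instead of
    -- whichever device happened to be queried first
    DECLARE @LocalDeviceId int
    SET @LocalDeviceId = @DeviceId

    SELECT TOP 1 *
    FROM dbo.Timestamps
    WHERE DeviceId = @LocalDeviceId
    ORDER BY MessageCounter DESC
END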
The advice about statistics is good, but I suspect you'll need to have a look at the query plans for both these queries. You can do this in Query Analyser - it's about three buttons to the right of the Execute button. Try to see what is different between the plans for both queries...
Answer 6:
Are the queries sent to SQL Server exactly as you posted them:
select top 1 *
from "Timestamps"
where DeviceId = 4
order by MessageCounter desc
or did NHibernate use parameterized queries (where DeviceId = @deviceid, or something like that)?
That might explain it - SQL Server gets the parameterized query for DeviceId = 4, comes up with an execution plan that works for that parameter value, but then on the next execution, for DeviceId = 1, it stumbles because the execution plan cached from the first query isn't optimal for that second case anymore.
Can you try to execute the two queries in reverse order? First with DeviceId = 1 and then with DeviceId = 4 - does that give you the same results?
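One way to run that experiment against a clean plan cache (just a test sketch - DBCC FREEPROCCACHE flushes every cached plan on the instance, so don't do this on a busy production server):
-- Throw away all cached plans, then compile the parameterized query for the
-- "slow" device first
DBCC FREEPROCCACHE

EXEC sp_executesql
    N'select top 1 * from Timestamps where DeviceId = @deviceid order by MessageCounter desc',
    N'@deviceid int',
    @deviceid = 1

-- Reuse the same cached plan for device 4 and compare the timings
EXEC sp_executesql
    N'select top 1 * from Timestamps where DeviceId = @deviceid order by MessageCounter desc',
    N'@deviceid int',
    @deviceid = 4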
Source: https://stackoverflow.com/questions/3180661/sql-query-executing-slowly-for-some-parameter-values