I am wondering if there is a good-performing query to select distinct dates (ignoring times) from a table with a datetime field in SQL Server.
My problem isn't getting
I'm not sure why your existing query would take over 5s for 40,000 rows.
I just tried the following query against a table with 100,000 rows and it returned in less than 0.1s.
SELECT DISTINCT DATEADD(day, 0, DATEDIFF(day, 0, your_date_column))
FROM your_table
(Note that this query probably won't be able to take advantage of any indexes on the date column, but it should be reasonably quick, assuming that you're not executing it dozens of times per second.)
I've used the following:
CAST(FLOOR(CAST(@date as FLOAT)) as DateTime);
This removes the time from the date by converting it to a float and truncating off the "time" part, which is the decimal of the float.
Looks a little clunky but works well on a large dataset (~100,000 rows) I use repeatedly throughout the day.
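As a rough sketch of how that expression might be applied to the original question, the same cast can be wrapped around the column inside a SELECT DISTINCT. The names your_table and your_date_column are placeholders borrowed from the earlier answer, and this assumes the column is of type datetime (the float conversion is not available for the newer date and datetime2 types).

```sql
-- Placeholder names: your_table and your_date_column stand in for the real table and datetime column.
SELECT DISTINCT
    -- Convert to float, drop the fractional (time) part, convert back to datetime
    CAST(FLOOR(CAST(your_date_column AS FLOAT)) AS DATETIME) AS date_only
FROM your_table;
```

Like the DATEADD/DATEDIFF version, this wraps the column in an expression, so it probably cannot seek on an index over the raw datetime column.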
Update:
Solution below tested for efficiency on a 2M-row table and takes only 40 ms.
Plain DISTINCT on an indexed computed column took 9 seconds.
See this entry in my blog for performance details:
Unfortunately, SQL Server's optimizer can do neither Oracle's SKIP SCAN nor MySQL's INDEX FOR GROUP-BY.
It's always Stream Aggregate that takes long.
You can build a list of possible dates using a recursive CTE and join it with your table:
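WITH rows AS (
    -- Anchor row: earliest date with its time part truncated, plus the latest datetime
    SELECT CAST(CAST(CAST(MIN(date) AS FLOAT) AS INTEGER) AS DATETIME) AS mindate, MAX(date) AS maxdate
    FROM mytable
    UNION ALL
    -- Recursive step: advance one day at a time until maxdate is passed
    SELECT mindate + 1, maxdate
    FROM rows
    WHERE mindate < maxdate
)
SELECT mindate
FROM rows
WHERE EXISTS
(
    -- Keep only the days that have at least one row in the table
    SELECT NULL
    FROM mytable
    WHERE date >= mindate
    AND date < mindate + 1
)
-- Lift the default limit of 100 recursion levels
OPTION (MAXRECURSION 0)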
This will be more efficient than Stream Aggregate.
If you want to avoid the step of extracting or reformatting the date, which is presumably the main cause of the delay (by forcing a full table scan), you have no alternative but to store the date-only part of the datetime, which unfortunately will require an alteration to the database structure.
If you're using SQL Server 2005 or later, then a persisted computed column is the way to go:
Unless otherwise specified, computed columns are virtual columns that are not physically stored in the table. Their values are recalculated every time they are referenced in a query. The Database Engine uses the PERSISTED keyword in the CREATE TABLE and ALTER TABLE statements to physically store computed columns in the table. Their values are updated when any columns that are part of their calculation change. By marking a computed column as PERSISTED, you can create an index on a computed column that is deterministic but not precise.
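A minimal sketch of what the persisted computed column could look like, assuming a table mytable with a datetime column named [date]. The column name date_only and the index name are made up for illustration, and the computed expression is assumed to be deterministic so that it can be persisted and indexed.

```sql
-- Placeholder names: mytable, [date], date_only and IX_mytable_date_only are illustrative only.
-- Store the date-only value physically alongside the row.
ALTER TABLE mytable
    ADD date_only AS DATEADD(day, DATEDIFF(day, 0, [date]), 0) PERSISTED;
GO  -- batch separator used by SSMS/sqlcmd, so later statements can see the new column

-- Index the stored value so the distinct scan reads the narrow index rather than the table.
CREATE INDEX IX_mytable_date_only ON mytable (date_only);
GO

SELECT DISTINCT date_only
FROM mytable;
```

Whether this is fast enough still needs to be measured on your own data, since the query has to read every entry in the index.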
What is your predicate on that other filtered column? Have you tried whether you get an improvement from an index on that other filtered column, followed by the datetime field?
I'm largely guessing here, but 5 seconds to filter a set of perhaps 100,000 rows down to 40,000 and then do a sort (which is presumably what goes on) doesn't seem like an unreasonable time to me. Why do you say it's too slow? Because it doesn't match expectations?
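For illustration, the index suggested above (the filtered column first, then the datetime field) might look like the following. The table, column and index names are hypothetical, and the equality predicate on an integer column is only an assumption, since the actual predicate isn't shown.

```sql
-- Hypothetical names: your_table, other_filter_column, your_date_column and the index name.
CREATE INDEX IX_your_table_filter_date
    ON your_table (other_filter_column, your_date_column);
GO  -- batch separator used by SSMS/sqlcmd

-- Assumed predicate: an equality filter on an integer column.
DECLARE @filter_value INT;
SET @filter_value = 1;

SELECT DISTINCT DATEADD(day, 0, DATEDIFF(day, 0, your_date_column)) AS date_only
FROM your_table
WHERE other_filter_column = @filter_value;
```

With the filter column leading, the engine can seek to the matching rows and read the datetime values from the index itself. Comparing the execution plan before and after would show whether it actually does.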
Just convert the date: dateadd(dd,0, datediff(dd,0,[Some_Column]))