Question
I have a large database (50 million rows) containing time series data. There is a clustered index on the [datetime] column, which ensures that the table is always sorted in chronological order.
What is the most performant way to read the rows of the table out into a C# app, on a row-by-row basis?
Answer 1:
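You should try this and find out. I just did, and saw no performance issues. The script below builds a 50-million-row table, and the test program times opening the connection, executing the query and the first two Read calls.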
USE [master]
GO
/****** Object: Database [HugeDatabase] Script Date: 06/27/2011 13:27:50 ******/
CREATE DATABASE [HugeDatabase] ON PRIMARY
( NAME = N'HugeDatabase', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2K8R2\MSSQL\DATA\HugeDatabase.mdf' , SIZE = 1940736KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
LOG ON
( NAME = N'HugeDatabase_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2K8R2\MSSQL\DATA\HugeDatabase_log.LDF' , SIZE = 395392KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
USE [HugeDatabase]
GO
/****** Object: Table [dbo].[HugeTable] Script Date: 06/27/2011 13:27:53 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[HugeTable](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [PointInTime] [datetime] NULL,
    PRIMARY KEY NONCLUSTERED
    (
        [ID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [IX_HugeTable_PointInTime] ON [dbo].[HugeTable]
(
[PointInTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Populate:
DECLARE @t datetime
SET @t = '2011-01-01'
DECLARE @i int
SET @i = 0
SET NOCOUNT ON
WHILE (@i < 50000000)
BEGIN
    INSERT INTO HugeTable(PointInTime) VALUES(@t)
    SET @t = DATEADD(ss, 1, @t)
    SET @i = @i + 1
END
Test:
using System;
using System.Data.SqlClient;
using System.Diagnostics;

namespace ConsoleApplication1
{
    internal class Program
    {
        private static void Main()
        {
            TimeSpan firstRead = new TimeSpan();
            TimeSpan readerOpen = new TimeSpan();
            TimeSpan commandOpen = new TimeSpan();
            TimeSpan connectionOpen = new TimeSpan();
            TimeSpan secondRead = new TimeSpan();
            try
            {
                Stopwatch sw1 = new Stopwatch();
                sw1.Start();
                using (var conn = new SqlConnection(
                    @"Data Source=.\sql2k8r2;Initial Catalog=HugeDatabase;Integrated Security=True"))
                {
                    conn.Open();
                    connectionOpen = sw1.Elapsed;
                    using (var cmd = new SqlCommand(
                        "SELECT * FROM HugeTable ORDER BY PointInTime", conn))
                    {
                        commandOpen = sw1.Elapsed;
                        // Dispose the reader too, not just the command and connection.
                        using (var reader = cmd.ExecuteReader())
                        {
                            readerOpen = sw1.Elapsed;
                            reader.Read();
                            firstRead = sw1.Elapsed;
                            reader.Read();
                            secondRead = sw1.Elapsed;

                            // Only two rows are read. Cancelling the command lets the
                            // reader close without processing the remaining rows.
                            cmd.Cancel();
                        }
                    }
                }
                sw1.Stop();
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
            finally
            {
                Console.WriteLine(
                    "Connection: {0}, command: {1}, reader: {2}, read: {3}, second read: {4}",
                    connectionOpen,
                    commandOpen - connectionOpen,
                    readerOpen - commandOpen,
                    firstRead - readerOpen,
                    secondRead - firstRead);
                Console.Write("Enter to exit: ");
                Console.ReadLine();
            }
        }
    }
}
Answer 2:
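I would use a SqlDataReader, since it streams its results instead of loading them all into memory. You'll still have to specify the ordering, but because the ORDER BY column is the clustered index key it should be a (relatively) cheap operation: SQL Server can read the index in order rather than sort the rows.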
using (var db = new SqlConnection(connStr))
using (var cmd = new SqlCommand(someQuery, db))
{
    // ExecuteReader requires an open connection.
    db.Open();
    using (var rs = cmd.ExecuteReader())
    {
        while (rs.Read())
        {
            // do interesting things!
        }
    }
}
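The reading loop above can be packaged so the rest of the app receives one typed row at a time. This is a minimal sketch, not part of the original answer: the class name HugeTableRows is hypothetical, and it assumes the query selects the ID and PointInTime columns of the HugeTable defined in Answer 1. It is written against the provider-neutral IDataReader interface, which SqlDataReader implements, so it does not depend on the connection code.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper (not from the answers): exposes an open IDataReader over
// dbo.HugeTable as a lazily evaluated sequence of typed rows.
public static class HugeTableRows
{
    // Each MoveNext() on the returned sequence calls reader.Read() exactly once,
    // so rows reach the caller one at a time and are never buffered here.
    // The caller owns the reader and must keep it open while enumerating.
    public static IEnumerable<(int Id, DateTime? PointInTime)> Read(IDataReader reader)
    {
        // Resolve column ordinals once rather than by name on every row.
        int idOrdinal = reader.GetOrdinal("ID");
        int timeOrdinal = reader.GetOrdinal("PointInTime");

        while (reader.Read())
        {
            // PointInTime is declared NULL in the table, so test for DBNull first.
            DateTime? pointInTime = reader.IsDBNull(timeOrdinal)
                ? (DateTime?)null
                : reader.GetDateTime(timeOrdinal);

            yield return (reader.GetInt32(idOrdinal), pointInTime);
        }
    }
}
```

With System.Data.SqlClient you would open the SqlConnection, call ExecuteReader on a command such as SELECT ID, PointInTime FROM dbo.HugeTable ORDER BY PointInTime, and foreach over HugeTableRows.Read(reader) inside the reader's using block, since the reader has to stay open for the whole enumeration.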
Answer 3:
See Row Offset in SQL Server, which mentions:
- Efficiently Paging Through Large Result Sets in SQL Server 2000
- A More Efficient Method for Paging Through Large Result Sets
- Paging Records Using SQL Server 2005 Database - ROW_NUMBER Function
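The articles above page with ROW_NUMBER, which still has to number every row before the requested page, so later pages tend to get slower on a 50-million-row table. A different technique, keyset (or "seek") paging, asks only for the rows that come after the last key already read. Below is a minimal sketch of it, not taken from those articles: HugeTablePager and its methods are hypothetical names, and it assumes PointInTime is NOT NULL and unique, as it is in Answer 1's generated data. With duplicate timestamps, rows could be skipped at page boundaries; you would then need to page on (PointInTime, ID), ideally with a clustered index on both columns.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical sketch of keyset ("seek") paging over dbo.HugeTable.
// ASSUMPTION: PointInTime is NOT NULL and unique, as in answer 1's test data.
public static class HugeTablePager
{
    // Streams every row in PointInTime order. fetchPage(after, pageSize) must
    // return at most pageSize rows, ordered by PointInTime, that come strictly
    // after `after` (null means "from the first row").
    public static IEnumerable<(int Id, DateTime PointInTime)> ReadAll(
        Func<DateTime?, int, List<(int Id, DateTime PointInTime)>> fetchPage,
        int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        DateTime? after = null;
        while (true)
        {
            List<(int Id, DateTime PointInTime)> page = fetchPage(after, pageSize);
            foreach (var row in page)
                yield return row;

            // A short (or empty) page means there are no more rows.
            if (page.Count < pageSize)
                yield break;

            // The next page starts after the last key of this one.
            after = page[page.Count - 1].PointInTime;
        }
    }

    // Fetches one page through any open ADO.NET connection, e.g. a SqlConnection.
    // Filtering on the clustered key should let SQL Server seek to the start of
    // the page instead of re-reading the rows before it.
    public static List<(int Id, DateTime PointInTime)> FetchPage(
        IDbConnection conn, DateTime? after, int pageSize)
    {
        using (IDbCommand cmd = conn.CreateCommand())
        {
            cmd.CommandText =
                "SELECT TOP (@pageSize) ID, PointInTime FROM dbo.HugeTable " +
                (after.HasValue ? "WHERE PointInTime > @after " : "") +
                "ORDER BY PointInTime;";
            AddParameter(cmd, "@pageSize", DbType.Int32, pageSize);
            if (after.HasValue)
                AddParameter(cmd, "@after", DbType.DateTime, after.Value);

            var page = new List<(int Id, DateTime PointInTime)>();
            using (IDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    page.Add((reader.GetInt32(0), reader.GetDateTime(1)));
            }
            return page;
        }
    }

    private static void AddParameter(IDbCommand cmd, string name, DbType type, object value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.DbType = type;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

With an open SqlConnection conn, a call would look like HugeTablePager.ReadAll((after, n) => HugeTablePager.FetchPage(conn, after, n), 10000), where 10000 is an arbitrary page size. Compared with a single streaming SqlDataReader, this costs one extra query per page in exchange for short-lived statements and the ability to resume from the last PointInTime seen.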
Source: https://stackoverflow.com/questions/6468925/using-cursors-to-read-time-series-data-from-sql-server-using-c