I have a database table with 5 million rows. The clustered index is auto-increment identity column. There PK is a code generated 256 byte VARCHAR
which is a SHA256 hash of a URL, this is a non-clustered index on the table.
The table is as follows:
CREATE TABLE [dbo].[store_image](
[imageSHAID] [nvarchar](256) NOT NULL,
[imageGUID] [uniqueidentifier] NOT NULL,
[imageURL] [nvarchar](2000) NOT NULL,
[showCount] [bigint] NOT NULL,
[imageURLIndex] AS (CONVERT([nvarchar](450),[imageURL],(0))),
[autoIncID] [bigint] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_imageSHAID] PRIMARY KEY NONCLUSTERED
(
[imageSHAID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [autoIncPK] ON [dbo].[store_image]
(
[autoIncID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
imageSHAID
is a SHA256 hash of an image URL e.g. "http://blah.com/image1.jpg", it is hashed into a varchar of 256 length.imageGUID
is a code generated guid in which I identify the image (it will be used as an index later, but for now I have omitted this column as an index)imageURL
is the full URL of the image (up to 2000 characters)showCount
is the number of times the image is shown, this is incremented each time this particular image is shown.imageURLIndex
is a computed column limited by 450 characters, this allows me to do text searches on the imageURL should I choose to, it is indexable (again index is omitted for brevity)autoIncID
is the clustered index, should allow faster inserting of data.
Periodically I merge from a temp table into the store_image
table. The temp table structure is as follows (very similar to the store_image table):
CREATE TABLE [dbo].[store_image_temp](
[imageSHAID] [nvarchar](256) NULL,
[imageURL] [nvarchar](2000) NULL,
[showCount] [bigint] NULL,
) ON [PRIMARY]
GO
When the merge process is run, I write a DataTable
to the temp table using the following code:
using (SqlBulkCopy bulk = new SqlBulkCopy(storeConn, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.KeepNulls, null))
{
bulk.DestinationTableName = "[dbo].[store_image_temp]";
bulk.WriteToServer(imageTableUpsetDataTable);
}
I then run the merge command to update the showCount
in the store_image
table by merging from the temp table based on the imageSHAID
. If the image doesn't currently exist in the store_image
table, I create it:
merge into store_image as Target using [dbo].[store_image_temp] as Source
on Target.imageSHAID=Source.imageSHAID
when matched then update set
Target.showCount=Target.showCount+Source.showCount
when not matched then insert values (Source.imageSHAID,NEWID(), Source.imageURL, Source.showCount);
I'm typically trying to merge 2k-5k rows from the temp table to the store_image
table at any one merge process.
I used to run this DB on a SSD (only SATA 1 connected) and it was very fast (under 200 ms). I ran out of room on the SSD so I swapped the DB to a 1TB 7200 cache spinning disk, since then completion times are over 6-100 seconds (6000 - 100000MS). When the bulk insert is running I can see disk activity of around 1MB-2MB/sec, low CPU usage.
Is this a typical write time for this amount of data? It seems a little slow to me, what is causing the slow performance? Surely with the imageSHAID
being indexed we should expect quicker seek times than this?
Any help would be appreciated.
Thanks for your time.
Your UPDATE
clause in the MERGE
updates showCount
. This requires a key lookup on the clustered index.
However, the clustered index is also declared non-unique. This gives information to the optimiser even though the underlying column is unique.
So, I'd make these changes
- the clustered primary key to be
autoIncID
- the current PK on
imageSHAID
to be a standalone unique index (not constraint) and add an INCLUDE forshowCount
. Unique constraints can't have INCLUDEs
More observations:
- you don't need
nvarchar
for the hash or URL columns. These are not unicode. - A hash is also fixed length so can be
char(64)
(for SHA2-512). - The length of a column defines how much memory to assign to the query. See this for more: is there an advantage to varchar(500) over varchar(8000)?
来源:https://stackoverflow.com/questions/18143533/sql-server-merge-performance