Question
I am working to improve the performance of bulk loads: hundreds of millions of records or more, daily.
I moved this over to use the IDataReader interface in lieu of DataTables and did get a noticeable performance boost (500,000 more records a minute). The current setup is:
- A custom cached reader to parse the delimited files.
- Wrapping the stream reader in a buffered stream.
- A custom object reader class that enumerates over the objects and implements the IDataReader interface.
- Then SqlBulkCopy writes to the server.
The bulk of the performance bottleneck is directly in SqlBulkCopy.WriteToServer. If I unit test the process up to, but excluding, WriteToServer, the process returns in roughly 1 minute. WriteToServer takes an additional 15+ minutes. The unit test runs on my local machine, on the same drive the database lives on, so it does not have to copy the data across the network.
I am using a heap table (no indexes, clustered or nonclustered). I have played around with various batch sizes without major differences in performance.
There is a need to decrease the load times, so I am hoping someone might know a way to squeeze a little more blood out of this turnip.
Answer 1:
Why not use SSIS directly?
Anyway, if you are streaming from the parser into an IDataReader, you're already on the right path. To optimize SqlBulkCopy itself you need to turn your focus to SQL Server. The key is minimally logged operations. You must read these MSDN articles:
- Prerequisites for Minimal Logging in Bulk Import.
- Optimizing Bulk Import Performance.
If your target is a B-Tree (i.e. a table with a clustered index), unfortunately one of the most important tenets of performant bulk insert, namely the sorted-input rowset, cannot be declared. It's as simple as this: ADO.NET SqlClient does not have the equivalent of SSPROP_FASTLOADOPTIONS -> ORDER(Column) (OleDb). Since the engine does not know that the data is already sorted, it will add a Sort operator to the plan, which is not that bad except when it spills. To avoid spills, use a small batch size (~10k). See my original point: all these are just options and clicks to set in SSIS rather than digging through the OleDB MSDN spec...
If your data stream is unsorted to start with, or the destination is a heap, then my point above is moot.
However, achieving minimal logging is still a must for decent performance.
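For the heap case in the question, a minimal sketch of a SqlBulkCopy call set up with minimal logging in mind might look like the following. The class name `HeapBulkLoader`, the method `Load`, and its parameters are hypothetical and used only for illustration. It also assumes the database uses the simple or bulk-logged recovery model and the table is not replicated, since the table lock alone does not guarantee minimal logging.

```csharp
using System.Data;
using System.Data.SqlClient;

// Hypothetical helper for illustration; not part of the original post.
public static class HeapBulkLoader
{
    public static void Load(string connectionString, string destinationTable, IDataReader reader)
    {
        // TableLock takes a bulk update lock for the duration of the copy,
        // which is one of the prerequisites for minimal logging on a heap.
        using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
        {
            bulkCopy.DestinationTableName = destinationTable;

            // 0 sends all rows as a single batch; the answer suggests ~10k
            // only when the target has a clustered index and the sort spills.
            bulkCopy.BatchSize = 0;

            // 0 disables the timeout (the default is 30 seconds), which a
            // load of this size would otherwise exceed.
            bulkCopy.BulkCopyTimeout = 0;

            // Stream rows from the IDataReader instead of buffering them
            // (available from .NET Framework 4.5).
            bulkCopy.EnableStreaming = true;

            bulkCopy.WriteToServer(reader);
        }
    }
}
```

Whether this actually yields a minimally logged load depends on the server-side prerequisites listed in the articles above, so it is worth comparing transaction log growth before and after the change.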
Source: https://stackoverflow.com/questions/15526797/sqlbulkcopy-performance