SqlBulkCopy performance


Question


I am working to increase the performance of bulk loads: hundreds of millions of records per day.

I moved this over to use the IDataReader interface in lieu of DataTables and got a noticeable performance boost (500,000 more records per minute). The current setup is (a minimal sketch follows the list):

  • A custom cached reader to parse the delimited files.
  • Wrapping the stream reader in a buffered stream.
  • A custom object reader class that enumerates over the objects and implements the IDataReader interface.
  • SqlBulkCopy, which writes to the server.
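
To make that setup concrete, here is a minimal sketch of the kind of reader described above, assuming a pipe-delimited file whose columns all surface as strings. The class name, column names, file path, table name and connection string are placeholders, not the poster's actual code, and only the IDataReader members SqlBulkCopy exercises in this scenario get real bodies; the rest are stubbed.

    using System;
    using System.Data;
    using System.IO;
    using Microsoft.Data.SqlClient;   // or System.Data.SqlClient on .NET Framework

    // Streams one delimited line at a time; every column surfaces as a string and the
    // destination table's metadata drives type conversion on the server side.
    public sealed class DelimitedFileDataReader : IDataReader
    {
        private readonly TextReader _input;
        private readonly string[] _columns;
        private readonly char _delimiter;
        private string[] _current = Array.Empty<string>();

        public DelimitedFileDataReader(TextReader input, string[] columns, char delimiter = '|')
        {
            _input = input;
            _columns = columns;
            _delimiter = delimiter;
        }

        // The members SqlBulkCopy actually calls for a plain IDataReader source.
        public int FieldCount => _columns.Length;
        public bool Read()
        {
            var line = _input.ReadLine();
            if (line == null) return false;
            _current = line.Split(_delimiter);
            return true;
        }
        public object GetValue(int i) => _current[i];
        public int GetValues(object[] values) { _current.CopyTo(values, 0); return _current.Length; }
        public string GetName(int i) => _columns[i];
        public int GetOrdinal(string name) => Array.IndexOf(_columns, name);
        public bool IsDBNull(int i) => string.IsNullOrEmpty(_current[i]);
        public void Dispose() => _input.Dispose();

        // Not exercised in this sketch; stubbed for brevity.
        public void Close() { }
        public bool IsClosed => false;
        public int Depth => 0;
        public int RecordsAffected => -1;
        public bool NextResult() => false;
        public DataTable GetSchemaTable() => throw new NotSupportedException();
        public object this[int i] => GetValue(i);
        public object this[string name] => GetValue(GetOrdinal(name));
        public string GetString(int i) => _current[i];
        public string GetDataTypeName(int i) => "nvarchar";
        public Type GetFieldType(int i) => typeof(string);
        public bool GetBoolean(int i) => throw new NotSupportedException();
        public byte GetByte(int i) => throw new NotSupportedException();
        public long GetBytes(int i, long fo, byte[] buf, int off, int len) => throw new NotSupportedException();
        public char GetChar(int i) => throw new NotSupportedException();
        public long GetChars(int i, long fo, char[] buf, int off, int len) => throw new NotSupportedException();
        public IDataReader GetData(int i) => throw new NotSupportedException();
        public DateTime GetDateTime(int i) => throw new NotSupportedException();
        public decimal GetDecimal(int i) => throw new NotSupportedException();
        public double GetDouble(int i) => throw new NotSupportedException();
        public float GetFloat(int i) => throw new NotSupportedException();
        public Guid GetGuid(int i) => throw new NotSupportedException();
        public short GetInt16(int i) => throw new NotSupportedException();
        public int GetInt32(int i) => throw new NotSupportedException();
        public long GetInt64(int i) => throw new NotSupportedException();
    }

    public static class Loader
    {
        // Wiring it together (all names below are placeholders).
        public static void Load()
        {
            const string connectionString = "Server=.;Database=Staging;Integrated Security=true";
            using var stream = new BufferedStream(File.OpenRead(@"C:\loads\data.txt"), 1 << 20);
            using var reader = new DelimitedFileDataReader(new StreamReader(stream), new[] { "Col1", "Col2" });
            using var bulk = new SqlBulkCopy(connectionString)
            {
                DestinationTableName = "dbo.StagingHeap",
                EnableStreaming = true,   // stream from the reader instead of caching rows in memory
                BulkCopyTimeout = 0       // 0 = no time limit; big loads exceed the 30-second default
            };
            bulk.WriteToServer(reader);
        }
    }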

The bulk of the performance bottleneck is directly in SqlBulkCopy.WriteToServer. If I unit test the process up to, but excluding, the WriteToServer call, the process returns in roughly 1 minute. WriteToServer then takes an additional 15+ minutes. The unit test runs on my local machine, on the same drive the database lives on, so it's not having to copy the data across the network.

I am using a heap table (no indexes, clustered or nonclustered), and I have played around with various batch sizes without major differences in performance.

There is a need to decrease the load times, so I am hoping someone might know a way to squeeze a little more blood out of this turnip.


Answer 1:


Why not use SSIS directly?

Anyway, if you are streaming from parsing straight into an IDataReader, you're already on the right path. To optimize SqlBulkCopy itself you need to turn your focus to SQL Server. The key is minimally logged operations. You must read these MSDN articles (the client-side piece is sketched after the list):

  • Prerequisites for Minimal Logging in Bulk Import.
  • Optimizing Bulk Import Performance.
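
For reference, the client-side half of those prerequisites boils down to a single option on SqlBulkCopy. A hedged sketch, reusing the placeholder reader and connection string from the question's setup; the recovery-model half has to be set on the database itself.

    // SqlBulkCopyOptions.TableLock takes a bulk update (BU) lock on the heap,
    // i.e. the TABLOCK hint the minimal-logging prerequisites call for.
    // The database must also be using the SIMPLE or BULK_LOGGED recovery model,
    // which is a server-side setting, not something ADO.NET can switch on.
    using var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock)
    {
        DestinationTableName = "dbo.StagingHeap",
        EnableStreaming = true
    };
    bulk.WriteToServer(reader);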

If your target is a B-tree (i.e., a clustered indexed table), then unfortunately one of the most important tenets of performant bulk insert, namely declaring the input rowset as sorted, is unavailable: plain and simple, ADO.NET SqlClient has no equivalent of the OleDb SSPROP_FASTLOADOPTIONS -> ORDER(Column) property. Since the engine does not know that the data is already sorted, it will add a Sort operator to the plan, which is not that bad except when it spills. To avoid spills, use a small batch size (~10k). See my original point: all of these are just options and clicks to set in SSIS rather than digging through the OleDb MSDN spec...
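
In ADO.NET terms, that batch-size suggestion maps to the BatchSize property; a short continuation of the sketch above, where the ~10k figure is the answer's rule of thumb rather than a measured value.

    // Rows go to the server in batches of this size; per the answer, a small
    // batch keeps the Sort operator's working set modest so it is less likely
    // to spill to tempdb when the destination has a clustered index.
    bulk.BatchSize = 10_000;
    bulk.WriteToServer(reader);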

If your data stream is unsorted to start with, or the destination is a heap, then my point above is moot.

However, achieving minimal logging is still a must for decent performance.



Source: https://stackoverflow.com/questions/15526797/sqlbulkcopy-performance
