Performance of bcp/BULK INSERT vs. Table-Valued Parameters

纵然是瞬间 提交于 2019-11-27 06:11:08

I don't really have experience with TVP yet, however there is an nice performance comparison chart vs. BULK INSERT in MSDN here.

They say that BULK INSERT has higher startup cost, but is faster thereafter. In a remote client scenario they draw the line at around 1000 rows (for "simple" server logic). Judging from their description I would say you should be fine with using TVP's. The performance hit - if any - is probably negligible and the architectural benefits seem very good.

Edit: On a side note you can avoid the server-local file and still use bulk copy by using the SqlBulkCopy object. Just populate a DataTable, and feed it into the "WriteToServer"-Method of an SqlBulkCopy instance. Easy to use, and very fast.

Solomon Rutzky

The chart mentioned with regards to the link provided in @TToni's answer needs to be taken in context. I am not sure how much actual research went into those recommendations (also note that the chart seems to only be available in the 2008 and 2008 R2 versions of that documentation).

On the other hand there is this whitepaper from the SQL Server Customer Advisory Team: Maximizing Throughput with TVP

I have been using TVPs since 2009 and have found, at least in my experience, that for anything other than simple insert into a destination table with no additional logic needs (which is rarely ever the case), then TVPs are typically the better option.

I tend to avoid staging tables as data validation should be done at the app layer. By using TVPs, that is easily accommodated and the TVP Table Variable in the stored procedure is, by its very nature, a localized staging table (hence no conflict with other processes running at the same time like you get when using a real table for staging).

Regarding the testing done in the Question, I think it could be shown to be even faster than what was originally found:

  1. You should not be using a DataTable, unless your application has use for it outside of sending the values to the TVP. Using the IEnumerable<SqlDataRecord> interface is faster and uses less memory as you are not duplicating the collection in memory only to send it to the DB. I have this documented in the following places:
  2. TVPs are Table Variables and as such do not maintain statistics. Meaning, they report only having 1 row to the Query Optimizer. So, in your proc, either:
    • Use statement-level recompile on any queries using the TVP for anything other than a simple SELECT: OPTION (RECOMPILE)
    • Create a local temporary table (i.e. single #) and copy the contents of the TVP into the temp table

I think I'd still stick with a bulk insert approach. You may find that tempdb still gets hit using a TVP with a reasonable number of rows. This is my gut feeling, I can't say I've tested the performance of using TVP (I am interested in hearing others input too though)

You don't mention if you use .NET, but the approach that I've taken to optimise previous solutions was to do a bulk load of data using the SqlBulkCopy class - you don't need to write the data to a file first before loading, just give the SqlBulkCopy class (e.g.) a DataTable - that's the fastest way to insert data into the DB. 5-10K rows isn't much, I've used this for up to 750K rows. I suspect that in general, with a few hundred rows it wouldn't make a vast difference using a TVP. But scaling up would be limited IMHO.

Perhaps the new MERGE functionality in SQL 2008 would benefit you?

Also, if your existing staging table is a single table that is used for each instance of this process and you're worried about contention etc, have you considered creating a new "temporary" but physical staging table each time, then dropping it when it's finished with?

Note you can optimize the loading into this staging table, by populating it without any indexes. Then once populated, add any required indexes on at that point (FILLFACTOR=100 for optimal read performance, as at this point it will not be updated).

Staging tables are good! Really I wouldn't want to do it any other way. Why? Because data imports can change unexpectedly (And often in ways you can't foresee, like the time the columns were still called first name and last name but had the first name data in the last name column, for instance, to pick an example not at random.) Easy to research the problem with a staging table so you can see exactly what data was in the columns the import handled. Harder to find I think when you use an in memory table. I know a lot of people who do imports for a living as I do and all of them recommend using staging tables. I suspect there is a reason for this.

Further fixing a small schema change to a working process is easier and less time consuming than redesigning the process. If it is working and no one is willing to pay for hours to change it, then only fix what needs to be fixed due to the schema change. By changing the whole process, you introduce far more potential new bugs than by making a small change to an existing, tested working process.

And just how are you going to do away with all the data cleanup tasks? You may be doing them differently, but they still need to be done. Again, changing the process the way you describe is very risky.

Personally it sounds to me like you are just offended by using older techniques rather than getting the chance to play with new toys. You seem to have no real basis for wanting to change other than bulk insert is so 2000.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!