SSIS lookup transform alternatives?

Submitted by ﹥>﹥吖頭↗ on 2019-12-13 05:26:19

Question


I need to transfer about 11 million rows daily from one database to another. The source table holds about half a billion rows in total at this point.

I was using the "get everything since ?" method, with the max value in the destination as the ?, but the maintenance of the source is kind of funky. They keep going back to fill holes, so rows older than that max get missed and my method isn't working.

The standard Lookup transform takes hours to run. Pragmatic's TaskFactory has an Upsert component, but it's not in this project's budget.

Is there a better way than the Lookup transform to do the lookup?


Answer 1:


Here are some options:

A. Reduce the input data by implementing some kind of change data capture (CDC). At the volumes and data variability you're talking about, you should really consider this. What options do you have for CDC at the source? For example, can you create triggers and logging tables? Do you have a version of SQL Server that supports native CDC?

B. Load the input data into a staging table and use INSERT/UPDATE or MERGE to apply it to your target table.

C. Load the input data into a staging table and DELETE/INSERT (based on date ranges) to apply it to your target table. This is what I generally do. Your load process should be able to run off a given date range: extract only the data for that range, delete that range from the target, and reload it.
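To make option C concrete, here is a minimal sketch of the delete-and-reload step. It uses Python's built-in `sqlite3` module purely as a runnable stand-in for SQL Server, and the table names (`staging`, `target`) and columns (`id`, `event_date`, `amount`) are made up for illustration. In SSIS the two statements would more likely sit in an Execute SQL Task after the data flow that fills the staging table.

```python
import sqlite3

def reload_date_range(conn, start, end):
    """Replace target rows in [start, end) with the staged rows for that range."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "DELETE FROM target WHERE event_date >= ? AND event_date < ?",
            (start, end),
        )
        conn.execute(
            "INSERT INTO target (id, event_date, amount) "
            "SELECT id, event_date, amount FROM staging "
            "WHERE event_date >= ? AND event_date < ?",
            (start, end),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schema, not taken from the question
    CREATE TABLE target  (id INTEGER PRIMARY KEY, event_date TEXT, amount REAL);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, event_date TEXT, amount REAL);
    INSERT INTO target VALUES (1, '2014-04-01', 10.0), (2, '2014-04-02', 20.0);
    -- staging holds a fresh extract of 2014-04-02: one corrected row, one back-filled row
    INSERT INTO staging VALUES (2, '2014-04-02', 25.0), (3, '2014-04-02', 30.0);
""")

reload_date_range(conn, "2014-04-02", "2014-04-03")
rows = conn.execute(
    "SELECT id, event_date, amount FROM target ORDER BY id"
).fetchall()
print(rows)
```

Because the whole range is deleted and reloaded, back-filled rows inside the range are picked up without any row-by-row lookup. The price is that the range has to be wide enough to cover however far back the source gets patched.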

IMHO, the SSIS lookup component is of no use at the data volumes you're talking about.
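For the trigger-and-logging-table flavour of option A, a rough sketch might look like the following. Again this is SQLite via Python's `sqlite3` module standing in for SQL Server, not native SQL Server CDC, and every name here (`source`, `source_changes`, `changed_ids_since`) is hypothetical. The idea is that the extract keeps a watermark on the change log rather than on the data itself, so rows that are back-filled later still show up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical source table plus a change-log table fed by triggers
    CREATE TABLE source (id INTEGER PRIMARY KEY, event_date TEXT, amount REAL);
    CREATE TABLE source_changes (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        id        INTEGER NOT NULL,
        op        TEXT    NOT NULL
    );
    CREATE TRIGGER source_ins AFTER INSERT ON source
    BEGIN
        INSERT INTO source_changes (id, op) VALUES (NEW.id, 'I');
    END;
    CREATE TRIGGER source_upd AFTER UPDATE ON source
    BEGIN
        INSERT INTO source_changes (id, op) VALUES (NEW.id, 'U');
    END;
""")

def changed_ids_since(conn, last_change_id):
    """Return the distinct source ids logged after the given watermark."""
    cur = conn.execute(
        "SELECT DISTINCT id FROM source_changes WHERE change_id > ? ORDER BY id",
        (last_change_id,),
    )
    return [row[0] for row in cur.fetchall()]

# Initial load, then remember how far the change log has been consumed.
conn.execute("INSERT INTO source VALUES (1, '2014-04-01', 10.0)")
conn.execute("INSERT INTO source VALUES (3, '2014-04-03', 30.0)")
watermark = conn.execute("SELECT MAX(change_id) FROM source_changes").fetchone()[0]

# Later the source team fills a hole (id 2) and corrects an old row (id 1).
conn.execute("INSERT INTO source VALUES (2, '2014-04-02', 20.0)")
conn.execute("UPDATE source SET amount = 11.0 WHERE id = 1")

changed = changed_ids_since(conn, watermark)
print(changed)
```

This sketch does not log deletes, and triggers add write overhead on the source, so whether it is acceptable depends on who owns that database.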




Answer 2:


I prefer to stretch a full refresh as far as it will go, i.e. truncate the target table and deliver all rows without any lookup etc. I have one like this that chews through nearly 1 billion rows in 3 hours. Most people are horrified by this approach initially, but it does work and is very reliable and easy to code & test.

Alternatively I would use an Execute SQL Task with a SQL MERGE statement. This gives you very detailed control over the source and target rows considered, how they are matched and what happens afterwards (insert or update).

At that scale I would be very careful to create indexes to help the MERGE, e.g. on the joined columns. The MERGE design will often be much, much slower than the full refresh design, and will take far longer to code & test, with a higher risk of bugs.
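As a rough illustration of the matched/unmatched logic and the index on the join columns, here is a sketch using Python's built-in `sqlite3` module. SQLite has no MERGE statement, so this swaps in a separate UPDATE and INSERT that do the same two things a T-SQL MERGE would do in one statement. The schema (`staging`, `target`, `src_id`, `event_date`, `amount`) and the function name are assumptions for the example, not anything from the question.

```python
import sqlite3

def apply_staging(conn):
    """Update matched rows whose amount differs, then insert unmatched rows."""
    with conn:  # both statements succeed or fail together
        conn.execute("""
            UPDATE target
            SET amount = (SELECT s.amount FROM staging AS s
                          WHERE s.src_id = target.src_id
                            AND s.event_date = target.event_date)
            WHERE EXISTS (SELECT 1 FROM staging AS s
                          WHERE s.src_id = target.src_id
                            AND s.event_date = target.event_date
                            AND s.amount <> target.amount)
        """)
        conn.execute("""
            INSERT INTO target (src_id, event_date, amount)
            SELECT s.src_id, s.event_date, s.amount
            FROM staging AS s
            WHERE NOT EXISTS (SELECT 1 FROM target AS t
                              WHERE t.src_id = s.src_id
                                AND t.event_date = s.event_date)
        """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schema; the match key is (src_id, event_date)
    CREATE TABLE target  (src_id INTEGER, event_date TEXT, amount REAL);
    CREATE TABLE staging (src_id INTEGER, event_date TEXT, amount REAL);
    -- indexes on the joined columns, as suggested above
    CREATE UNIQUE INDEX ix_target_key  ON target  (src_id, event_date);
    CREATE UNIQUE INDEX ix_staging_key ON staging (src_id, event_date);
    INSERT INTO target  VALUES (1, '2014-04-01', 10.0), (2, '2014-04-02', 20.0);
    INSERT INTO staging VALUES (1, '2014-04-01', 10.0),  -- unchanged
                               (2, '2014-04-02', 25.0),  -- changed
                               (3, '2014-04-02', 30.0);  -- new
""")

apply_staging(conn)
rows = conn.execute(
    "SELECT src_id, event_date, amount FROM target ORDER BY src_id"
).fetchall()
print(rows)
```

On SQL Server the same matching rules would go into the ON, WHEN MATCHED and WHEN NOT MATCHED clauses of a single MERGE, with the indexes created on the real join columns.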



Source: https://stackoverflow.com/questions/23373387/ssis-lookup-transform-alternatives
