How to do a one-time load for 4 billion records from MySQL to SQL Server

问题

We have a need to do the initial data copy on a table that has 4+ billion records to target SQL Server (2014) from source MySQL (5.5). The table in question is pretty wide with 55 columns, however none of them are LOB. I'm looking for options for copying this data in the most efficient way possible.

We've tried loading via Attunity Replicate (which has worked wonderfully for tables not this large) but if the initial data copy with Attunity Replicate fails then it starts over from scratch ... losing whatever time was spent copying the data. With patching and the possibility of this table taking 3+ months to load Attunity wasn't the solution.

We've also tried smaller batch loads with a linked server. This is working but doesn't seem efficient at all.

Once the data is copied we will be using Attunity Replicate to handle CDC.

回答1:

For something like this I think SSIS would be the most simple. It's designed for large inserts as big as 1TB. In fact, I'd recommend this MSDN article We loaded 1TB in 30 Minutes and so can you.

Doing simple things like dropping indexes and performing other optimizations like partitioning would make your load faster. While 30 minutes isn't a feasible time to shoot for, it would be a very straightforward task to have an SSIS package run outside of business hours.

My business doesn't have a load on the scale you do, but we do refresh our databases of more than 100M nightly which doesn't take more than 45 minutes, even with it being poorly optimized.

回答2:

One of the most efficient way to load huge data is to read them by chunks.

I have answered many similar question for SQLite, Oracle, Db2 and MySQL. You can refer to one of them for to get more information on how to do that using SSIS:

Reading Huge volume of data from Sqlite to SQL Server fails at pre-execute (SQLite)
SSIS failing to save packages and reboots Visual Studio (Oracle)
Optimizing SSIS package for millions of rows with Order by / sort in SQL command and Merge Join (MySQL)
Getting top n to n rows from db2 (DB2)

On the other hand there are many other suggestions such as drop indexes in destination table and recreate them after insert, Create needed indexes on source table, use fast-load option to insert data ...

来源：https://stackoverflow.com/questions/56139308/how-to-do-a-one-time-load-for-4-billion-records-from-mysql-to-sql-server

标签

mysql

sql-server

ssis

etl

attunity