Question
I need to copy, on a daily basis, a very large (millions of rows) table from one DB2 DB to another, and I need to use perl and DBI.
Is there a faster way to do this than to simply fetchrow_array each row from the first DB and insert them one-by-one into the second DB? Here's what I got:
$sth1 = $udb1->prepare($read_query);
$sth1->execute();
$sth1->bind_columns(\(@row{@{$sth1->{NAME_lc}}}));
$sth2 = $udb2->prepare($write_query);
while ($sth1->fetchrow_arrayref) {
    $sth2->execute($row{field_name_1}, $row{field_name_2});
}
I implemented some solutions from a similar thread, but it's still slow. Surely there has to be a better way?
Answer 1:
If you wrap this into one transaction, it should work much faster. Use something like this:
$sth1 = $udb1->prepare($read_query);
$sth1->execute();
$sth1->bind_columns(\(@row{@{$sth1->{NAME_lc}}}));
$udb2->begin_work();
$sth2 = $udb2->prepare($write_query);
while ($sth1->fetchrow_arrayref()) {
    $sth2->execute($row{field_name_1}, $row{field_name_2});
}
$udb2->commit();
If you have millions of rows, you may want to commit every few thousand rows rather than holding one giant transaction open.
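For example, a minimal sketch of such chunked commits, reusing the handles from the code above (the 5,000-row chunk size is an arbitrary placeholder):
my $chunk_size = 5_000;
my $count      = 0;
$udb2->begin_work();
while ($sth1->fetchrow_arrayref()) {
    $sth2->execute($row{field_name_1}, $row{field_name_2});
    if (++$count % $chunk_size == 0) {
        $udb2->commit();        # flush this chunk to disk
        $udb2->begin_work();    # start the next chunk
    }
}
$udb2->commit();                # commit whatever is left over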
Now, the reason why it is faster:
In your case, every single insert is its own auto-committed transaction. In other words, the server has to wait until your changes are really flushed to disk for every single one of your millions of rows - very SLOW!
When you wrap it all into one transaction, the server can flush thousands of rows to disk at once - much more efficient and faster.
(If you are copying the exact same table over and over again, it would be wiser to synchronize only the changes via some sort of unique key instead - that should be a million times faster.)
Answer 2:
In addition to what mvp said, here is a snippet from the DBI docs:
my $sel = $dbh1->prepare("select foo, bar from table1");
$sel->execute;
my $ins = $dbh2->prepare("insert into table2 (foo, bar) values (?,?)");
my $fetch_tuple_sub = sub { $sel->fetchrow_arrayref };
my @tuple_status;
$rc = $ins->execute_for_fetch($fetch_tuple_sub, \@tuple_status);
my @errors = grep { ref $_ } @tuple_status;
which, when combined with mvp's answer, should be even faster, especially if DBD::DB2 has its own execute_for_fetch method (which I don't know). DBDs with their own execute_for_fetch method usually batch up operations. Even without one, it should still be a little faster.
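As a rough sketch of combining the two ideas - the DBI-docs snippet wrapped in a single transaction - where $dbh1 and $dbh2 are assumed to be the already-connected source and target handles:
my $sel = $dbh1->prepare("select foo, bar from table1");
$sel->execute;
my $ins = $dbh2->prepare("insert into table2 (foo, bar) values (?,?)");

$dbh2->begin_work();    # one transaction around the whole copy
my @tuple_status;
my $rc = $ins->execute_for_fetch(sub { $sel->fetchrow_arrayref }, \@tuple_status);
$dbh2->commit();

my @errors = grep { ref $_ } @tuple_status;
warn scalar(@errors) . " row(s) failed to insert\n" if @errors;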
Answer 3:
If you are doing this daily, I would have thought DB2's export and import utilities would be the way to go. That is likely to be much faster than millions of individual SQL INSERT statements.
It may be possible to drive this through Perl's DBI module, but you may have to use system or backticks if you need to get it done from within a Perl script.
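As a rough, untested sketch of that approach (the database, schema, table, and file names below are all made up), you could write a small CLP script and hand it to the db2 command-line processor via system:
use strict;
use warnings;

# All names here (SRCDB, TGTDB, MYSCHEMA.BIG_TABLE, file paths) are placeholders.
my $clp_file = '/tmp/copy_big_table.clp';
open my $fh, '>', $clp_file or die "cannot write $clp_file: $!";
print $fh <<'END_CLP';
CONNECT TO SRCDB;
EXPORT TO /tmp/big_table.ixf OF IXF SELECT * FROM MYSCHEMA.BIG_TABLE;
CONNECT RESET;
CONNECT TO TGTDB;
IMPORT FROM /tmp/big_table.ixf OF IXF INSERT INTO MYSCHEMA.BIG_TABLE;
CONNECT RESET;
END_CLP
close $fh;

# -t: statements end with ';', -v: echo each command, -f: read commands from a file
system('db2', '-tvf', $clp_file) == 0 or die "db2 CLP run failed: $?";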
Answer 4:
If you can send the content of the source table to a file, you can use the LOAD command or the INGEST utility. LOAD is very fast because it does not write to the transaction logs. INGEST does normal inserts, but it can be restarted.
These commands can be called from Perl, and DB2 does the rest.
However, if the source and target databases are both DB2, you could federate the source into the target. That means you see the remote tables (from the source) in the target database. In that case, you only have to call a LOAD, and that's it. It will be the fastest option, because the communication is DB2 to DB2, not DB2 -> Perl -> DB2.
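A sketch of what that copy could look like once federation is in place (the nickname MYSCHEMA.SRC_BIG_TABLE and the other names are made up); the script can be run with db2 -tvf exactly as in the sketch under the previous answer:
# Assumed: MYSCHEMA.SRC_BIG_TABLE is a nickname in TGTDB for the remote source table.
my $clp = <<'END_CLP';
CONNECT TO TGTDB;
DECLARE c1 CURSOR FOR SELECT * FROM MYSCHEMA.SRC_BIG_TABLE;
LOAD FROM c1 OF CURSOR INSERT INTO MYSCHEMA.BIG_TABLE;
CONNECT RESET;
END_CLP

open my $fh, '>', '/tmp/load_from_nickname.clp' or die $!;
print $fh $clp;
close $fh;
system('db2', '-tvf', '/tmp/load_from_nickname.clp') == 0 or die "db2 LOAD failed: $?";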
I think it is better to let DB2 deal with big tables than to have Perl in the middle: memory could blow up, the commits could become an issue, etc.
Also, depending on your DB2 licensing, you could use Optim High Performance Unload to extract tables directly from the tablespaces instead of going through SQL (which is slower).
Source: https://stackoverflow.com/questions/14209971/copying-a-very-large-table-from-one-db2-to-another-using-perl-and-dbi