Question
I need to read several million rows from one database and write them to another. I would like to use PreparedStatement.addBatch
to do the writes in large batches (maybe 1000 rows). I do not need them to be in a transaction. I am writing my code in Scala 2.9.2.
One way to do this is as follows:
val sourceResultSet = ...
val targetStatement = targetConnection.prepareStatement(...)
var rowCount = 0
while (sourceResultSet.next()) {
  // Read values from sourceResultSet and write them to targetStatement
  targetStatement.addBatch()
  rowCount += 1
  if (rowCount % 1000 == 0) {
    targetStatement.executeBatch()
    rowCount = 0
  }
}
// Flush any remaining rows in the final partial batch
targetStatement.executeBatch()
How can I do this in a more functional way, without using a var rowCount? I also need to consider RAM usage; I am reading several million rows, so any solution that involves having all of the source rows in memory at one time will fail.
Answer 1:
What's the type of sourceResultSet? I'm assuming an Iterator/Stream based on your usage, but either way, you can use the Scala collections' take to grab 1000 elements at a time (this works on Lists, Sets, Iterators, Streams, etc.). To do it more functionally (though side-effecting only, so not purely functional), define a helper function:
def processSource[A](sourceResultSet: Iterator[A]): Unit = {
  if (sourceResultSet.hasNext) {
    sourceResultSet.take(1000).foreach { row =>
      // Read values from row, set them on targetStatement, then call targetStatement.addBatch()
      ()
    }
    targetStatement.executeBatch()
    processSource(sourceResultSet) // How you handle the recursion depends on what sourceResultSet is
  }
}
val sourceResultSet = ...
val targetStatement = targetConnection.prepareStatement(...)
processSource(sourceResultSet)
As long as sourceResultSet is lazy (a Stream or Iterator), this will avoid loading it into memory at once.
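A JDBC ResultSet is not itself an Iterator, so the assumption above needs a small adapter. Below is a minimal sketch (not from the original answer) of one way to wrap the cursor and batch with grouped(1000); the column names (id, name), table names, and the sourceStatement variable are hypothetical placeholders, not from the question.

// A sketch only: wrap the JDBC cursor in an Iterator and copy each row into an
// immutable tuple *before* grouping, because the ResultSet is a single mutable cursor.
// Column names ("id", "name") and the SQL strings are hypothetical placeholders.
def rows(rs: java.sql.ResultSet): Iterator[(Long, String)] =
  Iterator.continually(rs).takeWhile(_.next()).map { r =>
    (r.getLong("id"), r.getString("name"))
  }

val sourceResultSet = sourceStatement.executeQuery("SELECT id, name FROM source_table")
val targetStatement = targetConnection.prepareStatement(
  "INSERT INTO target_table (id, name) VALUES (?, ?)")

// grouped(1000) pulls at most 1000 tuples at a time, so memory stays bounded,
// and the final (possibly smaller) batch is executed automatically.
rows(sourceResultSet).grouped(1000).foreach { batch =>
  batch.foreach { case (id, name) =>
    targetStatement.setLong(1, id)
    targetStatement.setString(2, name)
    targetStatement.addBatch()
  }
  targetStatement.executeBatch()
}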
Source: https://stackoverflow.com/questions/12604563/using-jdbc-preparedstatement-addbatch-in-functional-scala-code