Use case: Read 10 million rows [10 columns] from database and write to a file (csv format).
Which ItemReader implementation among J
To process that kind of data, you're probably going to want to parallelize it if that is possible (the only thing preventing it would be if the output file needed to retain an order from the input). Assuming you are going to parallelize your processing, you are then left with two main options for this type of use case (from what you have provided):
I did a talk on processing data in parallel with Spring Batch. Specifically, the example I present is a remote partitioned job. You can view it here: https://www.youtube.com/watch?v=CYTj5YT7CZU
To your specific questions:
I'd start with the basic step definition. Then try a multithreaded step. If that doesn't meet your needs, then move to local partitioning, and finally remote partitioning if needed. Keep in mind that Spring Batch was designed to make that progression as painless as possible. You can go from a regular step to a multithreaded step with only configuration updates. To go to partitioning, you need to add a single new class (a Partitioner implementation) and some configuration updates.
One final note. Most of this has talked about parallelizing the processing of this data. Spring Batch's FlatFileItemWriter is not thread safe. Your best bet would be to write to multiple files in parallel, then aggregate them afterwards if speed is your number one concern.
You should profile this in order to make a choice. In plain JDBC I would start with something that:
OutOfMemoryError
because your JDBC driver is buffering the entire data set in memory. By using those options you increase the chance that you get server side cursors and get the results "streamed" to you bit by bit, which is what you want for large result sets. Note that some JDBC drivers always "simulate" cursors in client side, so this tip might be useless for your particular DBMS.JdbcCursorItemReader seems to cover both things, but as it is said before they are not guaranteed to give you best performance in all DBMS, so I would start with that and then, if performance is inadequate, try JdbcPagingItemReader.
I don't think doing simple processing with JdbcCursorItemReader
will be slow for your data set size unless you have very strict performance requirements. If you really need to parallelize using JdbcPagingItemReader
might be easier, but the interface of those two is very similar, so I would not count on it.
Anyway, profile.