Question
There are a lot of examples which use FlatFileItemReader along with TaskExecutor. I provide samples below (both with XML and Java Config):
- Using Oracle Coherence with Spring Batch
- Spring Batch Multithreading Example
I have used it myself with XML configuration for large CSVs (GB size), writing to a database with the out-of-the-box JpaItemWriter. There seem to be no issues, even without setting save-state = false or applying any kind of special handling.
Now, FlatFileItemReader is documented as not thread-safe.
My guess was that JpaItemWriter was "covering" the issue by persisting Sets, i.e. collections with no duplicates, as long as hashCode() and equals() covered the business key of the Entity. However, even this is not enough to prevent duplicates caused by non-thread-safe reading and processing.
Could you please clarify: is it proper/correct/safe to use the out-of-the-box FlatFileItemReader within a Tasklet which has assigned a TaskExecutor? Regardless of the Writer. If not, how could we explain in theory the lack of errors when a JPAItemWriter is used?
P.S: The example links that I give above, use FlatFileItemReader with TaskExecutor without mentioning at all possible thread-safety issues...
Answer 1:
TL;DR: It is safe to use a FlatFileItemReader with a TaskExecutor provided the Writer is thread-safe (assuming that you are not concerned with restarting jobs, retrying steps, skipping, etc. at the moment).
Update: There is now a JIRA that officially confirms that saveState needs to be set to false (i.e. restartability must be disabled) if one wants to use FlatFileItemReader with a TaskExecutor in a thread-safe manner.
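For reference, here is a minimal Java Config sketch of that setup. It assumes the Spring Batch 4.x builder API (StepBuilderFactory; newer versions build steps differently), and the file path, bean names, chunk size and the injected threadSafeWriter are all hypothetical placeholders rather than anything taken from the linked examples.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;

@Configuration
@EnableBatchProcessing
public class MultiThreadedStepConfig {

    @Bean
    public FlatFileItemReader<String> csvReader() {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource("input.csv")); // hypothetical path
        reader.setLineMapper(new PassThroughLineMapper());       // hand each line over as a raw String
        reader.setSaveState(false); // do not track the read position, so the step is not restartable
        return reader;
    }

    @Bean
    public TaskExecutor stepTaskExecutor() {
        return new SimpleAsyncTaskExecutor("csv-step-");
    }

    @Bean
    public Step loadCsvStep(StepBuilderFactory steps,
                            ItemWriter<String> threadSafeWriter) { // assumed to be defined elsewhere
        return steps.get("loadCsvStep")
                .<String, String>chunk(1000)
                .reader(csvReader())
                .writer(threadSafeWriter)
                .taskExecutor(stepTaskExecutor()) // chunks are processed on several threads
                .build();
    }
}
```

The XML equivalent is the save-state attribute mentioned in the question, set on the reader bean.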
Let's first hear it from the horse's mouth by seeing what the Spring documentation says about using multi-threaded steps with a TaskExecutor.
Spring Batch provides some implementations of ItemWriter and ItemReader. Usually they say in the Javadocs if they are thread safe or not, or what you have to do to avoid problems in a concurrent environment. If there is no information in Javadocs, you can check the implementation to see if there is any state.
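To make "check the implementation to see if there is any state" concrete, here is a small plain-Java sketch. LineReader, ListLineReader and SynchronizedLineReader are hypothetical classes invented for illustration; they are not Spring Batch types and this is not how FlatFileItemReader is implemented. The point is only what a stateful reader looks like and one common way to guard it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reader contract: null means "no more input".
interface LineReader {
    String read();
}

// Stateful and unsynchronized: the `next` field is the kind of state to look for.
class ListLineReader implements LineReader {
    private final List<String> lines;
    private int next = 0; // mutable cursor shared by every caller

    ListLineReader(List<String> lines) { this.lines = lines; }

    public String read() {
        return next < lines.size() ? lines.get(next++) : null;
    }
}

// Serializes access so the cursor only advances while the lock is held.
class SynchronizedLineReader implements LineReader {
    private final LineReader delegate;

    SynchronizedLineReader(LineReader delegate) { this.delegate = delegate; }

    public synchronized String read() { return delegate.read(); }
}

class StatefulReaderDemo {
    // Drains the guarded reader from 4 threads; returns {total reads, distinct lines}.
    static int[] drain(int lineCount) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < lineCount; i++) lines.add("line-" + i);
        LineReader reader = new SynchronizedLineReader(new ListLineReader(lines));

        AtomicInteger reads = new AtomicInteger();
        Set<String> distinct = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            pool.submit(() -> {
                String line;
                while ((line = reader.read()) != null) {
                    reads.incrementAndGet();
                    distinct.add(line);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { reads.get(), distinct.size() };
    }

    public static void main(String[] args) {
        int[] result = drain(10_000);
        System.out.println(result[0] + " reads, " + result[1] + " distinct lines");
    }
}
```

With the synchronized wrapper both numbers equal the line count. Passing the bare ListLineReader to the threads instead removes that guarantee, which is exactly the risk the documentation is warning about.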
Let's address your questions now:
Could you please clarify: is it proper/correct/safe to use the out-of-the-box FlatFileItemReader within a Tasklet which has assigned a TaskExecutor? Regardless of the Writer. If not, how could we explain in theory the lack of errors when a JPAItemWriter is used?
The statement "Regardless of the Writer" is incorrect. The Writer you use must be thread-safe. The JpaItemWriter is thread-safe according to the Javadocs and can safely be used with a FlatFileItemReader that is not thread-safe. Explaining how JpaItemWriter is thread-safe would make this answer long. I recommend that you post another question if you are interested in how specific writers handle thread safety. (This is mentioned in the Spring Batch docs as well.)
P.S: The example links that I give above, use FlatFileItemReader with TaskExecutor without mentioning at all possible thread-safety issues..
If you take a look at the Coherence example, you will see that they clearly modify CoherenceBatchWriter.java in Figure 6. They first make mapBatch a local variable so that each thread has its own copy of this Map. Moreover, if you dig further into the Coherence API, you should find that the NamedCache being returned is thread-safe.
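The local-variable idea can be sketched in plain Java as follows. This is not the article's CoherenceBatchWriter: LocalBatchWriter is a hypothetical class, and a ConcurrentHashMap stands in for the thread-safe NamedCache so that the example runs without Coherence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical writer; the ConcurrentHashMap is a stand-in for a thread-safe cache.
class LocalBatchWriter {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Called concurrently, once per chunk, by the worker threads.
    void write(List<String> items) {
        // Local variable: every call, and therefore every thread, gets its own batch map.
        Map<String, String> mapBatch = new HashMap<>();
        for (String item : items) {
            mapBatch.put(item, item.toUpperCase());
        }
        cache.putAll(mapBatch); // the only shared state is the thread-safe cache
    }

    int size() {
        return cache.size();
    }
}

class LocalBatchDemo {
    // Writes `chunks` chunks of `chunkSize` distinct items from 4 threads
    // and returns how many entries reached the cache.
    static int run(int chunks, int chunkSize) {
        LocalBatchWriter writer = new LocalBatchWriter();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int c = 0; c < chunks; c++) {
            final int chunk = c;
            pool.submit(() -> {
                List<String> items = new ArrayList<>();
                for (int i = 0; i < chunkSize; i++) {
                    items.add("item-" + chunk + "-" + i);
                }
                writer.write(items);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writer.size();
    }

    public static void main(String[] args) {
        System.out.println(run(50, 100)); // 50 * 100 distinct keys
    }
}
```

If mapBatch were an instance field instead, all threads would be filling and flushing the same plain HashMap at once, which is the race the modified writer avoids.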
The second link that you provide looks really dicey, since the Writer does not do anything to avoid race conditions. That example is indeed an incorrect use of a multi-threaded step.
Source: https://stackoverflow.com/questions/42270806/using-flatfileitemreader-with-a-taskexecutor-thread-safety