Question
In one of my use cases I need to fetch data from multiple nodes. Each node holds a range (partition) of the data, and the goal is to read it as fast as possible. The constraint is that the cardinality of a partition is not known beforehand. With a work-sharing approach, I could split the partitions into sub-partitions and fetch them in parallel. The drawback is that one thread may end up fetching a lot of data and take much longer, while another thread finishes quickly. The other approach is work stealing: break the partitions into much smaller ranges and use ForkJoinPool. The drawback there is that if a partition is sparse, we could make many round trips to the server only to find that there is no data for a sub-partition.
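To make the work-stealing variant concrete, here is a minimal sketch of what I have in mind. The names `RangeFetchSketch`, `FetchTask`, `fetchAll`, `grain` and the `fetcher` callback are all hypothetical; `fetcher` stands in for whatever client call returns the keys in `[lo, hi)` from a node. If `fetcher` blocks on the network, it ties up a pool worker for the duration of the call, which is exactly the concern raised in this question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.BiFunction;

// Hypothetical sketch: none of these names come from a real client library.
class RangeFetchSketch {

    // Splits [lo, hi) in half until a sub-range is at most `grain` keys wide,
    // then fetches that leaf range with a single call to `fetcher`.
    static final class FetchTask extends RecursiveTask<List<Long>> {
        private final long lo;
        private final long hi;
        private final long grain;
        private final BiFunction<Long, Long, List<Long>> fetcher;

        FetchTask(long lo, long hi, long grain,
                  BiFunction<Long, Long, List<Long>> fetcher) {
            this.lo = lo;
            this.hi = hi;
            this.grain = grain;
            this.fetcher = fetcher;
        }

        @Override
        protected List<Long> compute() {
            if (hi - lo <= grain) {
                // Leaf: one round trip, even if the range turns out to be empty.
                return fetcher.apply(lo, hi);
            }
            long mid = lo + (hi - lo) / 2;
            FetchTask left = new FetchTask(lo, mid, grain, fetcher);
            FetchTask right = new FetchTask(mid, hi, grain, fetcher);
            left.fork();                              // queued; an idle worker may steal it
            List<Long> rightPart = right.compute();   // this worker handles the right half
            List<Long> result = new ArrayList<>(left.join());
            result.addAll(rightPart);                 // keep left-then-right order
            return result;
        }
    }

    // Fetches every key in [lo, hi) using the common pool.
    static List<Long> fetchAll(long lo, long hi, long grain,
                               BiFunction<Long, Long, List<Long>> fetcher) {
        if (grain < 1) {
            throw new IllegalArgumentException("grain must be at least 1");
        }
        return ForkJoinPool.commonPool().invoke(new FetchTask(lo, hi, grain, fetcher));
    }
}
```

A call such as `RangeFetchSketch.fetchAll(0, 100, 8, fetcher)` issues one `fetcher` call per leaf range. A smaller `grain` balances the load better but means more round trips on sparse partitions.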
My question is: if I want to use ForkJoinPool and the tasks need to do I/O, how do I do that? From the ForkJoinPool documentation and the best practices I have read so far, it appears that the FJ pool is not a good fit for blocking I/O operations. If I want to use non-blocking I/O instead, how can I do that?
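For the blocking case, the one mechanism I am aware of in the JDK is `ForkJoinPool.ManagedBlocker`, which tells the pool that a worker is about to block so it may add a compensating thread. Below is a rough sketch of how I imagine wrapping a blocking call with it. `BlockingCallSketch`, `BlockingCall` and `callBlocking` are hypothetical names, and the `Supplier` stands in for the real blocking fetch. I have not verified that this is the recommended pattern for network I/O.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.function.Supplier;

// Hypothetical sketch: only ForkJoinPool.ManagedBlocker and
// ForkJoinPool.managedBlock come from the JDK.
class BlockingCallSketch {

    // Adapts a blocking call to the ManagedBlocker protocol.
    static final class BlockingCall<T> implements ForkJoinPool.ManagedBlocker {
        private final Supplier<T> call;
        private T result;
        private boolean done;

        BlockingCall(Supplier<T> call) {
            this.call = call;
        }

        @Override
        public boolean block() {
            if (!done) {
                result = call.get();   // the potentially blocking operation
                done = true;
            }
            return true;               // true: no further blocking is needed
        }

        @Override
        public boolean isReleasable() {
            return done;               // true once the result is available
        }

        T result() {
            return result;
        }
    }

    // Runs `call`, letting the pool compensate for the blocked worker
    // when invoked from inside a ForkJoinPool thread.
    static <T> T callBlocking(Supplier<T> call) throws InterruptedException {
        BlockingCall<T> blocker = new BlockingCall<>(call);
        ForkJoinPool.managedBlock(blocker);
        return blocker.result();
    }
}
```

Inside a task's `compute()` this would be used as `callBlocking(() -> client.fetch(lo, hi))`, where `client.fetch` is a placeholder for the real call. It does not cover the non-blocking I/O case, which is the part I am most unsure about.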
Source: https://stackoverflow.com/questions/52450387/forkjoinpool-asynchronous-io