Been looking around for a little while now and I'm a bit confused on this issue. I want to be able to take an input stream and read it concurrently in segments. The segme
I don't think you can read an InputStream concurrently. That is why the contract defines read, reset, and mark: the idea is that the stream keeps track internally of what has been read and what has not.
If you're reading a file, just open multiple streams. You could use the skip() method to move the marker ahead for other threads to avoid duplicate line processing. BufferedReader may help some too, as it offers reading line by line.
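A minimal sketch of the "one stream per thread, skip ahead" idea, assuming the input is a regular file. The class and method names (SkipExample, readSegment) are made up for illustration. Note that skip() may skip fewer bytes than requested, so it has to be called in a loop, and that a byte offset can land in the middle of a line, so line-based processing would need extra care at segment boundaries.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: each caller (thread) opens its own stream on the file.
final class SkipExample {

    // Skips to 'offset', then reads up to 'length' bytes (fewer at end of file).
    static byte[] readSegment(Path file, long offset, int length) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long remaining = offset;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    // skip() made no progress: consume one byte to detect end of file.
                    if (in.read() < 0) {
                        throw new EOFException("offset is past the end of the file");
                    }
                    skipped = 1;
                }
                remaining -= skipped;
            }
            byte[] buf = new byte[length];
            int total = 0;
            while (total < length) {
                int n = in.read(buf, total, length - total);
                if (n < 0) {
                    break; // end of file reached before 'length' bytes were read
                }
                total += n;
            }
            return Arrays.copyOf(buf, total);
        }
    }
}
```

Each thread would call readSegment with its own offset. Depending on the stream, skipping may still mean reading through the skipped bytes, so this is mostly a convenience, not a guaranteed speedup.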
First of all, to read the file concurrently starting from different offsets you need random access to the file, meaning the ability to read from any position. Java allows this with RandomAccessFile in java.io or with SeekableByteChannel in java.nio:
Best Way to Write Bytes in the Middle of a File in Java
http://docs.oracle.com/javase/tutorial/essential/io/rafs.html
I think for speed reasons you will prefer java.nio. See: Java NIO FileChannel versus FileOutputstream performance / usefulness
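As a rough illustration of random access with java.nio, the sketch below positions a SeekableByteChannel at an arbitrary offset and reads from there. The class and method names (SeekExample, readAt) are invented for this example; only the channel calls come from the JDK.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper showing a positioned read through a SeekableByteChannel.
final class SeekExample {

    // Reads up to 'length' bytes starting at 'offset' (fewer at end of file).
    static byte[] readAt(Path file, long offset, int length) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(file, StandardOpenOption.READ)) {
            ch.position(offset); // jump straight to the offset, no skipping needed
            ByteBuffer buf = ByteBuffer.allocate(length);
            // read() may return fewer bytes than requested, so loop until
            // the buffer is full or the channel reports end of file (-1).
            while (buf.hasRemaining() && ch.read(buf) >= 0) {
                // keep reading
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }
}
```

RandomAccessFile offers the same capability through seek() and read(); the channel version is shown only because the answer leans towards java.nio.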
Now you know how to read from any position, but you need to do this concurrently. That is not possible with a single file access object, because each object holds its own current position in the file. Thus you need as many file access objects as threads. Since you are reading, not writing, that should be OK.
Now you know how to read the same file concurrently from many different offsets.
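Putting the two points together, here is one possible sketch: the file is split into equal byte ranges and each task opens its own FileChannel, so no position is shared between threads. ParallelSegments and its methods are hypothetical names, and the code assumes each segment is small enough to fit in a byte array.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: one file access object (channel) per task.
final class ParallelSegments {

    // Opens a private channel, seeks to 'start' and reads up to 'length' bytes.
    static byte[] readSegment(Path file, long start, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.position(start);
            ByteBuffer buf = ByteBuffer.allocate(length);
            while (buf.hasRemaining() && ch.read(buf) >= 0) {
                // keep reading until the buffer is full or end of file
            }
            byte[] out = new byte[buf.position()];
            buf.flip();
            buf.get(out);
            return out;
        }
    }

    // Splits the file into 'segments' byte ranges and reads them concurrently.
    // The returned list is in file order. Assumes each range fits in an int.
    static List<byte[]> readSegments(Path file, int segments)
            throws IOException, InterruptedException, ExecutionException {
        long size = Files.size(file);
        long chunk = (size + segments - 1) / segments; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(segments);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int i = 0; i < segments; i++) {
                long start = Math.min(size, i * chunk);
                int length = (int) Math.min(chunk, size - start);
                Callable<byte[]> task = () -> readSegment(file, start, length);
                futures.add(pool.submit(task));
            }
            List<byte[]> result = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                result.add(f.get()); // waits for each segment in turn
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

The ranges are cut on byte boundaries, so a line or record may straddle two segments; handling that is left out of the sketch.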
But think about the performance. Regardless of the number of threads, you have only ONE disk drive, and random reads (many threads accessing the same file) are much slower than sequential reads (one thread reading one file). Even if it's RAID 0 or 1, it does not matter: sequential reading is generally much faster. So in your case I would advise you to read the file in one thread and supply the other threads with data from that reading thread.
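One way to sketch "one reading thread feeds the others" is a bounded BlockingQueue: the reader puts lines on the queue and blocks when the workers fall behind, so memory use stays limited. Everything here (SingleReaderPipeline, run, the queue size of 1024, the sentinel) is an illustrative assumption, and the sketch assumes the handler itself does not throw.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical example: a single sequential reader supplying worker threads.
final class SingleReaderPipeline {

    // Sentinel that tells a worker to stop; compared by identity, not equals().
    private static final String POISON = new String("EOF");

    static void run(Reader source, int workers, Consumer<String> handler)
            throws IOException, InterruptedException {
        // Bounded queue: put() blocks when it is full (assumed size 1024).
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    for (String line = queue.take(); line != POISON; line = queue.take()) {
                        handler.accept(line); // assumed not to throw
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        // The calling thread is the only one that touches the input.
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                queue.put(line);
            }
        } finally {
            // One sentinel per worker so every worker terminates.
            for (int i = 0; i < workers; i++) {
                queue.put(POISON);
            }
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}
```

The order in which lines are handled is not defined once more than one worker is running.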
A good approach might instead be to have a single reader that reads chunks and then hands each chunk off to a worker thread from a thread pool. Given that these will be inserted into a database, the inserts will be by far the slowest part compared to reading the input, so a single thread should suffice for reading.
Below is an example that hands off processing of each line from System.in to a worker thread. Performance of database inserts is much better if you perform a large number of inserts within a single transaction, so passing in a group of, say, 1000 lines would be better than passing in a single line as in the example.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Main {

    public static class Worker implements Runnable {
        private final String line;

        public Worker(String line) {
            this.line = line;
        }

        @Override
        public void run() {
            // Process line here.
            System.out.println("Processing line: " + line);
        }
    }

    public static void main(String[] args) throws IOException {
        // Create worker thread pool.
        ExecutorService service = Executors.newFixedThreadPool(4);
        BufferedReader buffer = new BufferedReader(new InputStreamReader(System.in));
        String line;
        // Read each line and hand it off to a worker thread for processing.
        while ((line = buffer.readLine()) != null) {
            service.execute(new Worker(line));
        }
        // Stop accepting new tasks. Lines already queued are still processed,
        // and the JVM can exit once the pool's threads have finished.
        service.shutdown();
    }
}
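The batching suggestion could look roughly like the sketch below: lines are collected into groups and each group is handed to the pool as one task, so a worker could insert the whole group in a single transaction. BatchingReader, process and the batchHandler callback are hypothetical names; the database work itself is left to the caller, and the one-hour wait is an arbitrary placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical variant of the example above that submits lines in batches.
final class BatchingReader {

    static void process(Reader source, int batchSize, int threads,
                        Consumer<List<String>> batchHandler)
            throws IOException, InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(threads);
        try (BufferedReader in = new BufferedReader(source)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = in.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    // Hand the full batch to a worker and start a fresh list,
                    // so the worker never sees later modifications.
                    final List<String> full = batch;
                    service.execute(() -> batchHandler.accept(full));
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                // Submit the final, possibly smaller, batch.
                final List<String> rest = batch;
                service.execute(() -> batchHandler.accept(rest));
            }
        } finally {
            service.shutdown(); // no new tasks; queued batches still run
        }
        // Wait for the queued batches to finish (timeout chosen arbitrarily).
        service.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

A call such as BatchingReader.process(new InputStreamReader(System.in), 1000, 4, handler) would mirror the original example, with handler doing the inserts for one batch inside one transaction.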