Using threadpools/threading for reading large txt files?

遇见更好的自我 2021-01-21 04:29

On a previous question of mine I posted:

I have to read several very large txt files and have to either use multiple threads or a single thread to do so depending on

3 answers
  • 2021-01-21 04:53

    Ok, bear with me on this, because I need to explain a few things.

    First off, unless you have multiple disks or a single disk that is an SSD, using more than one thread to read from the disk is not recommended. Many questions on this topic have been posted, and the conclusion was always the same: using multiple threads to read from a single mechanical disk hurts performance instead of improving it.

    This happens because the disk's mechanical head has to seek to each new position before it can read. With multiple threads, each thread directs the head to a different section of the disk whenever it gets a chance to run, so the head bounces between disk areas inefficiently.

    The accepted solution for processing multiple files is a system with a single producer (a reader thread) and multiple consumers (processing threads). A thread pool is the ideal mechanism in this case: one thread acts as the producer and puts tasks in the pool's queue for the workers to process.

    Something like this:

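    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    int numFiles = 20;
    int threads = 4;

    ExecutorService exec = Executors.newFixedThreadPool(threads);

    // This single thread is the producer: it does all the disk reads, one file at a time.
    for (int i = 0; i < numFiles; i++) {
        // Placeholder path, not from the question: substitute your own file names.
        List<String> lines = Files.readAllLines(Paths.get("file" + i + ".txt"));
        String[] fileContents = lines.toArray(new String[0]);
        exec.submit(new ThreadTask(fileContents)); // queued for a pool worker
    }

    // Stop accepting new tasks, then wait for the queued ones to finish.
    exec.shutdown();
    exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    ...

    class ThreadTask implements Runnable {

        private final String[] fileContents;

        public ThreadTask(String[] fileContents) {
            this.fileContents = fileContents;
        }

        @Override
        public void run() {
            // processes the contents of one txt file
        }
    }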
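Since the question mentions very large files, loading a whole file before handing it to the pool may not fit in memory. One possible variation, offered only as a sketch and not as part of the original answer, keeps the single reader thread but submits fixed-size batches of lines instead. The class name `BatchedReaderSketch`, the `batchSize` parameter, and the character-counting "work" are all hypothetical stand-ins.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class BatchedReaderSketch {

    // Reads `file` on the calling thread only and hands batches of lines to a pool.
    // Returns the total character count (line terminators excluded), which is a
    // stand-in for whatever real processing the workers would do.
    static long process(Path file, int threads, int batchSize)
            throws IOException, InterruptedException {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        AtomicLong totalChars = new AtomicLong();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    handOff(exec, batch, totalChars);
                    batch = new ArrayList<>(batchSize); // workers own the old list now
                }
            }
            if (!batch.isEmpty()) {
                handOff(exec, batch, totalChars); // final, partially filled batch
            }
        } finally {
            exec.shutdown(); // no new tasks; queued ones still run
        }
        exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        return totalChars.get();
    }

    // Queues one batch for a worker thread.
    private static void handOff(ExecutorService exec, List<String> batch, AtomicLong totalChars) {
        exec.execute(() -> {
            long chars = 0;
            for (String s : batch) {
                chars += s.length();
            }
            totalChars.addAndGet(chars);
        });
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("batched-reader", ".txt");
        try {
            Files.write(tmp, Arrays.asList("alpha", "beta", "gamma"), StandardCharsets.UTF_8);
            System.out.println(process(tmp, 4, 2));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that the queue behind `newFixedThreadPool` is unbounded, so if the reader outpaces the workers the pending batches still pile up in memory; a bounded queue would be needed to apply back-pressure.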
    
  • 2021-01-21 04:53

    I would start by reading this tutorial on high-level concurrency. I recommend reading the whole concurrency tutorial, because it sounds like you are new to multithreading.

  • 2021-01-21 04:59

    The newFixedThreadPool() call returns an instance of ExecutorService. You can refer to its JavaDoc, which is pretty comprehensive and contains a workable example. You will want to either submit or invokeAll a number of Callables implementing your file-processing tasks, which gives you a number of Futures in return. Each Future's get() method gives you the result of its task once the task completes (you have to write that part yourself :))
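As a rough illustration of the Callable/Future route described above, here is a minimal sketch. The `FutureSketch` class, the `WordCountTask` task, and the word-counting logic are assumptions made up for the example; only `ExecutorService`, `Callable`, `Future`, `invokeAll`, and `get()` come from the standard library. The lines are passed in already loaded so that the example focuses on collecting results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureSketch {

    // Hypothetical task: counts the words in one file's already-loaded lines.
    static class WordCountTask implements Callable<Integer> {
        private final List<String> lines;

        WordCountTask(List<String> lines) {
            this.lines = lines;
        }

        @Override
        public Integer call() {
            int words = 0;
            for (String line : lines) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    words += trimmed.split("\\s+").length;
                }
            }
            return words;
        }
    }

    // Runs one task per file on a fixed pool and returns the results in input order.
    static List<Integer> countAll(List<List<String>> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (List<String> lines : files) {
                tasks.add(new WordCountTask(lines));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task has completed, so get() returns at once.
            for (Future<Integer> future : exec.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("one two three", "four"),
                Arrays.asList("", "five six"));
        System.out.println(countAll(files, 2)); // prints [4, 2]
    }
}
```

If a task throws, `get()` rethrows the failure wrapped in an `ExecutionException`, which is one reason to prefer Callables over plain Runnables when the results matter.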
