Using threadpools/threading for reading large txt files?

遇见更好的自我 2021-01-21 04:29

On a previous question of mine I posted:

I have to read several very large txt files and have to either use multiple threads or a single thread to do so depending on

3 Answers
  •  盖世英雄少女心
    2021-01-21 04:53

    Ok, bear with me on this, because I need to explain a few things.

    First off, unless you have multiple disks, or a single disk that is an SSD, reading from the disk with more than one thread is not recommended. Many questions on this topic have been posted, and the conclusion was always the same: using multiple threads to read from a single mechanical disk hurts performance instead of improving it.

    This happens because the disk's mechanical head has to seek to each position it reads. With multiple threads, every thread that gets a chance to run directs the head to a different section of the disk, so the head bounces between disk areas inefficiently.
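    To make the seek argument concrete, here is a toy model, not a measurement. It assumes a hypothetical layout in which file A sits in blocks 0-4 and file B in blocks 1000-1004, and it simply adds up how far the head travels for a given read order. Real drives and operating systems add caching, read-ahead and request reordering, so treat the numbers as an illustration only. The class and method names are made up for this sketch.

```java
class HeadTravelDemo {

    // Total distance the head moves when it visits the blocks in this order.
    static long travel(int[] order) {
        long total = 0;
        for (int i = 1; i < order.length; i++) {
            total += Math.abs(order[i] - order[i - 1]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical layout: file A = blocks 0-4, file B = blocks 1000-1004.
        // One reader finishes file A before it starts file B.
        int[] oneReader = {0, 1, 2, 3, 4, 1000, 1001, 1002, 1003, 1004};
        // Two readers take turns, so the head jumps between the files.
        int[] twoReaders = {0, 1000, 1, 1001, 2, 1002, 3, 1003, 4, 1004};

        System.out.println("one reader:  " + travel(oneReader));   // 1004
        System.out.println("two readers: " + travel(twoReaders));  // 8996
    }
}
```

    In this model the interleaved order moves the head roughly nine times as far to read the same ten blocks.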

    The accepted solution for processing multiple files is a single-producer (one reader thread), multiple-consumer (several processing threads) system. A thread pool is the ideal mechanism here: one thread acts as the producer, putting tasks on the pool's queue for the workers to process.

    Something like this:

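    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    class FileProcessor {

        // "files" is a placeholder for however you obtain the paths to read.
        static void processAll(List<Path> files) throws IOException, InterruptedException {
            int threads = 4;
            ExecutorService exec = Executors.newFixedThreadPool(threads);

            // The calling thread is the single producer: it reads one file at a time
            // and hands the contents to the pool for processing.
            for (Path file : files) {
                String[] fileContents = Files.readAllLines(file).toArray(new String[0]);
                exec.submit(new ThreadTask(fileContents));
            }

            exec.shutdown(); // no new tasks accepted; already queued tasks still run
            exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
            // ...
        }
    }

    class ThreadTask implements Runnable {

        private final String[] fileContents;

        ThreadTask(String[] fileContents) {
            this.fileContents = fileContents;
        }

        @Override
        public void run() {
            // processes txt file
        }
    }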
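    If the workers need to hand a result back, one possible variation is to submit a Callable instead of a Runnable and collect the Future objects. The sketch below is self-contained so it can be run as-is: it writes two small temp files, reads them on the calling thread, and processes them on the pool. The names SingleReaderDemo, processAll and countNonBlank are invented for the example, and counting non-blank lines is only a stand-in for whatever the real processing is. It needs Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SingleReaderDemo {

    // Hypothetical stand-in for the real per-file work: counts non-blank lines.
    static int countNonBlank(List<String> lines) {
        int n = 0;
        for (String line : lines) {
            if (!line.isBlank()) {
                n++;
            }
        }
        return n;
    }

    // Reads every file on the calling thread, processes each one on the pool,
    // and returns the per-file results in the same order as the input list.
    static List<Integer> processAll(List<Path> files, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (Path file : files) {
                List<String> lines = Files.readAllLines(file); // the single reader
                Callable<Integer> task = () -> countNonBlank(lines);
                pending.add(exec.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pending) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("reader-demo");
        Path a = Files.write(dir.resolve("a.txt"), List.of("one", "", "two"));
        Path b = Files.write(dir.resolve("b.txt"), List.of("three"));
        System.out.println(processAll(List.of(a, b), 2)); // prints [2, 1]
    }
}
```

    Reading stays on one thread, so the disk sees one sequential stream, while the pool size only controls how much processing runs in parallel.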
    
