Access File through multiple threads

天涯浪人 2021-01-31 10:52

I want to read a large file (its size may vary from 30 MB to 1 GB) using 10 threads, process each line in the file, and then write the results to another file, also using 10 threads.

10 answers
  •  孤独总比滥情好
    2021-01-31 11:39

    • You should abstract away the file reading. Create a class that reads the file and dispatches its content to a variable number of threads.

    The class shouldn't dispatch raw strings; it should wrap each one in a Line class that carries meta information, e.g. the line number, since you want to keep the original sequence.

    • You need a processing class that does the actual work on the collected data. In your case there is no work to do, so the class just stores the information; you can extend it someday to do additional work (e.g. reverse the string, append some other strings, ...).

    • Then you need a merger class that performs some kind of multiway merge over the output of the processing threads and collects the references to the Line instances in their original sequence.

    The merger class could also write the data back to a file, but to keep the code clean...

    • I'd recommend creating an output class that again abstracts away all the file handling.

    Of course, this approach needs a lot of memory, because every line is held in memory at once. If you are short on main memory, you'd need a stream-based approach that works more or less in place to keep the memory overhead small.
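    The in-memory design above could be sketched roughly as follows. This is only an illustration under assumptions: the names InMemoryPipeline, Line and processInOrder are made up here, the input is a list of strings rather than a real file, and the merger simply sorts by line number instead of doing a true multiway merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Hypothetical names throughout; none of them come from the answer itself.
class InMemoryPipeline {

    // A line plus its position in the input, so the original order can be restored.
    static final class Line {
        final long number;
        final String text;
        Line(long number, String text) { this.number = number; this.text = text; }
    }

    // Runs `work` on every line using a pool of `threads` workers and
    // returns the processed lines in their original order.
    static List<String> processInOrder(List<String> input, int threads,
                                       UnaryOperator<String> work) {
        // Reader step: wrap each string in a Line that remembers its position.
        List<Line> lines = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            lines.add(new Line(i, input.get(i)));
        }

        // Processing step: workers add results in whatever order they finish.
        Queue<Line> processed = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Line line : lines) {
            pool.execute(() -> processed.add(new Line(line.number, work.apply(line.text))));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Merger step: restore the original sequence by sorting on the line number.
        List<Line> merged = new ArrayList<>(processed);
        merged.sort(Comparator.comparingLong(item -> item.number));
        List<String> result = new ArrayList<>();
        for (Line item : merged) {
            result.add(item.text);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> result = processInOrder(Arrays.asList("a", "b", "c"), 10, String::toUpperCase);
        System.out.println(result);
    }
}
```

    Everything is kept in lists here, which is exactly why the memory use grows with the file size.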


    UPDATE: Stream-based approach

    Everything stays the same except:

    The Reader thread pumps the lines it reads into a Balloon. This balloon can hold a fixed number of Line instances (the bigger that number, the more main memory you consume).

    The processing threads take Lines from the balloon, and the reader pumps more lines into the balloon as it empties.

    The merger class takes the lines from the processing threads as above and the writer writes the data back to a file.
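    One way to picture the stream-based variant is to model the balloon as a bounded BlockingQueue. The sketch below is an assumption-laden illustration, not the answer's own code: StreamingPipeline, Line, END and run are hypothetical names, error handling is minimal, and the merger's reorder buffer is not bounded here, so a very slow worker could still let it grow.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

// Hypothetical names; the "balloon" is modelled as a bounded BlockingQueue.
class StreamingPipeline {

    static final class Line {
        final long number;
        final String text;
        Line(long number, String text) { this.number = number; this.text = text; }
    }

    // Sentinel meaning "no more lines will arrive".
    private static final Line END = new Line(-1, null);

    static void run(BufferedReader in, Writer out, int workers, int capacity,
                    UnaryOperator<String> work) {
        BlockingQueue<Line> balloon = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<Line> processed = new LinkedBlockingQueue<>();

        // Reader: put() blocks while the balloon is full, which caps memory use.
        new Thread(() -> {
            try {
                long number = 0;
                for (String s; (s = in.readLine()) != null; ) {
                    balloon.put(new Line(number++, s));
                }
                for (int i = 0; i < workers; i++) balloon.put(END); // one per worker
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        // Workers: take a line, process it, hand it to the merger.
        for (int w = 0; w < workers; w++) {
            new Thread(() -> {
                try {
                    for (Line line; (line = balloon.take()) != END; ) {
                        processed.put(new Line(line.number, work.apply(line.text)));
                    }
                    processed.put(END);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }

        // Merger and writer (on the calling thread): buffer out-of-order lines
        // and write each one as soon as it is the next expected number.
        PriorityQueue<Line> pending =
                new PriorityQueue<>(Comparator.comparingLong((Line item) -> item.number));
        long next = 0;
        int finished = 0;
        try {
            while (finished < workers) {
                Line done = processed.take();
                if (done == END) { finished++; continue; }
                pending.add(done);
                while (!pending.isEmpty() && pending.peek().number == next) {
                    out.write(pending.poll().text);
                    out.write('\n');
                    next++;
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        run(new BufferedReader(new StringReader("a\nb\nc\n")), out, 10, 4, String::toUpperCase);
        System.out.print(out);
    }
}
```

    The capacity argument plays the role of the balloon size: a larger value lets the reader run further ahead of the workers at the cost of more memory.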

    Maybe you should use FileChannel in the I/O threads, since it's better suited to reading big files and probably consumes less memory while handling the file (but that's just an educated guess).
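    For reference, reading lines through a FileChannel might look like the sketch below. It makes no claim about memory use or speed compared with a plain BufferedReader; ChannelLineReader and forEachLine are hypothetical names, and UTF-8 is assumed as the file encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical helper: streams a file line by line through a FileChannel.
class ChannelLineReader {

    // Passes every line of `file` to `sink` and returns the number of lines read.
    // Assumes the file is UTF-8 encoded.
    static long forEachLine(Path file, Consumer<String> sink) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ);
             BufferedReader reader =
                     new BufferedReader(Channels.newReader(channel, "UTF-8"))) {
            long count = 0;
            for (String line; (line = reader.readLine()) != null; count++) {
                sink.accept(line);
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, Arrays.asList("first", "second"));
        long n = forEachLine(tmp, System.out::println);
        System.out.println(n + " lines");
        Files.delete(tmp);
    }
}
```

    Only one line is held at a time, so the sink could be the step that hands Line instances to the processing threads.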
