I want to access a large file (file size may vary from 30 MB to 1 GB) through 10 threads and then process each line in the file and write them to another file through 10 threads
The class shouldn't dispatch strings, it should wrap them in a Line
class that contains meta information, e. g. The line number, since you want to keep the original sequence.
You need a processing class, that does the actual work on the collected data. In your case there is no work to do. The class just stores the information, you can extend it someday to do additional stuff (E.g. reverse the string. Append some other strings, ...)
Then you need a merger class, that does some kind of multiway merge sort on the processing threads and collects all the references to the Line
instances in sequence.
The merger class could also write the data back to a file, but to keep the code clean...
Of course you need much memory for this approach, if you are short on main memory. You'd need a stream based approach that kind of works inplace to keep the memory overhead small.
UPDATE Stream-based approach
Everthing stays the same except:
The Reader
thread pumps the read data into a Balloon
. This balloon has a certain number of Line
instances it can hold (The bigger the number, the more main memory you consume).
The processing threads take Line
s from the balloon, the reader pumps more lines into the balloon as it gets emptier.
The merger class takes the lines from the processing threads as above and the writer writes the data back to a file.
Maybe you should use FileChannel
in the I/O threads, since it's more suited for reading big files and probably consumes less memory while handling the file (but that's just an estimated guess).