in this web page:
http://web.eecs.utk.edu/~huangj/CS302S04/notes/external-sorting2.html
Merge the resulting runs together into successively
Imagine you have the numbers 1 through 9, in this order:
9 7 2 6 3 4 8 5 1
And let's suppose that only 3 fit in memory at a time.
So you'd break them into chunks of 3 and sort each, storing each result in a separate file:
2 7 9
3 4 6
1 5 8
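The chunking step could be sketched roughly like this in Python. The function names (`write_sorted_runs`, `read_run`), the `run0.txt`-style file names, and the one-integer-per-line file format are all assumptions made for illustration, not anything the linked page prescribes.

```python
import os
import tempfile
from itertools import islice


def write_sorted_runs(values, chunk_size, directory):
    """Sort `values` one chunk at a time, writing each sorted chunk to its own file."""
    paths = []
    stream = iter(values)
    while True:
        # Only `chunk_size` values are held in memory at once.
        chunk = list(islice(stream, chunk_size))
        if not chunk:
            break
        chunk.sort()
        # Hypothetical naming scheme: run0.txt, run1.txt, ...
        path = os.path.join(directory, f"run{len(paths)}.txt")
        with open(path, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        paths.append(path)
    return paths


def read_run(path):
    """Read a run file back into a list (only for inspecting small examples)."""
    with open(path) as f:
        return [int(line) for line in f]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        runs = write_sorted_runs([9, 7, 2, 6, 3, 4, 8, 5, 1], 3, tmp)
        print([read_run(p) for p in runs])  # [[2, 7, 9], [3, 4, 6], [1, 5, 8]]
```

In a real job `values` would itself be a stream (for example, lines read lazily from the big input file), so the full input never sits in memory.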
Now you'd open each of the three files as streams and read the first value from each:
2 3 1
Output the lowest value, 1, and get the next value from that stream. Now you have:
2 3 5
Output the next lowest value, 2, and continue onwards until you've output the entire sorted list.
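Here is a minimal sketch of that merge loop in Python, assuming each stream is any sorted iterable (an open file wrapped as `(int(line) for line in f)` would work the same way). The name `k_way_merge` is made up for this example. It keeps exactly one pending value per stream in a small heap, which is the "read the first value from each, output the lowest, refill from that stream" process described above.

```python
import heapq

_DONE = object()  # sentinel meaning "this stream has no more values"


def k_way_merge(streams):
    """Yield all values from several sorted iterables in ascending order."""
    iterators = [iter(s) for s in streams]
    heap = []
    # Read the first value from each stream.
    for idx, it in enumerate(iterators):
        first = next(it, _DONE)
        if first is not _DONE:
            heap.append((first, idx))
    heapq.heapify(heap)
    while heap:
        # Output the lowest pending value...
        value, idx = heapq.heappop(heap)
        yield value
        # ...and get the next value from the stream it came from.
        nxt = next(iterators[idx], _DONE)
        if nxt is not _DONE:
            heapq.heappush(heap, (nxt, idx))


if __name__ == "__main__":
    print(list(k_way_merge([[2, 7, 9], [3, 4, 6], [1, 5, 8]])))
    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Python's standard library already ships this as `heapq.merge(*iterables)`, so in practice you would probably call that instead of writing the loop by hand.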
If you merge two runs A and B into some larger run C, you can do this line by line, generating progressively larger runs while still reading at most 2 lines at a time. Because the process is iterative and you're working on streams of data rather than loading the full data set, you don't need to worry about memory usage. On the other hand, disk access might make the whole process slow, but it sure beats not being able to do the work in the first place.