This is a Google interview question: given 2 machines, each having 64 GB of RAM and containing all of the integers (8 bytes each), sort the entire 128 GB of data. You may assume a small amount of additional RAM.
There are already answers for the two-machine case.
I'm assuming that the 128 GB of data to be sorted is stored as a single file on a single hard drive (or any external device). No matter how many machines or hard drives are used, the time it takes to read the original 128 GB file and write the sorted 128 GB file remains the same. The only savings occur during the internal, RAM-based sorts that create the chunks of sorted data. The time it takes to merge with n+1 hard drives, doing an n-way merge onto the remaining hard drive, likewise remains the same, limited by the time it takes to write the sorted 128 GB file onto that remaining drive.
For n machines, the data would be split into chunks of 128 GB / n. Each machine could alternate reading sub-chunks, perhaps 64 MB at a time, to reduce random-access overhead, so that the "last" machine isn't waiting for all of the prior machines to finish reading their entire chunks before it even starts.
For n machines (64 GB of RAM each) and n+1 hard drives with n >= 4, each machine could use a radix sort, which runs in time linear in the number of elements, to create chunks of 32 GB or smaller on the n working hard drives at the same time, followed by an n-way merge onto the destination hard drive.
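For the per-chunk, in-RAM sort, an LSD radix sort over the 8 bytes of each key runs in time linear in the number of elements. Here is a minimal C++ sketch, assuming unsigned 64-bit keys already loaded into a vector; the function name and the byte-at-a-time digit width are illustrative choices, not part of the answer above:

    #include <cstdint>
    #include <vector>

    // LSD radix sort on 64-bit keys: 8 passes, one byte (256 buckets) per pass.
    // Linear time for a fixed key width; needs one scratch buffer of equal size.
    // Signed 64-bit values would additionally need the sign bit flipped before
    // and after so that negative numbers order correctly.
    void radix_sort_u64(std::vector<uint64_t>& a) {
        std::vector<uint64_t> buf(a.size());
        for (int pass = 0; pass < 8; ++pass) {
            const int shift = pass * 8;
            size_t count[256] = {0};
            for (uint64_t v : a) ++count[(v >> shift) & 0xFF];        // histogram of this byte
            size_t pos[256], running = 0;
            for (int b = 0; b < 256; ++b) { pos[b] = running; running += count[b]; }  // prefix sums
            for (uint64_t v : a) buf[pos[(v >> shift) & 0xFF]++] = v; // stable scatter into buckets
            a.swap(buf);                                              // output becomes next pass's input
        }
    }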
There's a point of diminishing returns limiting the benefit of larger n. Somewhere beyond n > 16, the internal merge throughput could exceed the disk I/O bandwidth. If the merge process becomes CPU-bound rather than I/O-bound, there's a trade-off between the CPU time saved by creating chunks in parallel and the merge overhead that exceeds the I/O time.
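The final n-way merge is typically done with a min-heap holding the current head of each sorted run. Here is a sketch over in-memory runs, assuming C++17; in the setup described above, each run would really be a buffered reader over a chunk file on one of the working drives, and the output a buffered writer on the destination drive:

    #include <cstdint>
    #include <functional>
    #include <queue>
    #include <tuple>
    #include <vector>

    // n-way merge of pre-sorted runs via a min-heap keyed on each run's current head.
    std::vector<uint64_t> merge_runs(const std::vector<std::vector<uint64_t>>& runs) {
        using Entry = std::tuple<uint64_t, size_t, size_t>;  // (value, run index, position in run)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

        size_t total = 0;
        for (size_t r = 0; r < runs.size(); ++r) {
            total += runs[r].size();
            if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);
        }

        std::vector<uint64_t> out;
        out.reserve(total);
        while (!heap.empty()) {
            auto [value, r, i] = heap.top();
            heap.pop();
            out.push_back(value);                        // emit the smallest remaining head
            if (i + 1 < runs[r].size())
                heap.emplace(runs[r][i + 1], r, i + 1);  // refill from the same run
        }
        return out;
    }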
Each 64 GB chunk can be sorted separately using quicksort. Then keep pointers at the heads of both 64 GB arrays; say we want RAM1 and RAM2, in that order, to hold the entire sorted data. Keep incrementing the pointer into RAM1 while its value is smaller than the value at RAM2's pointer; otherwise swap the two values, continuing until the pointer reaches the end of RAM1.
The same concept extends to sorting all N RAMs: take pairs of them and sort each pair using the method above, leaving N/2 sorted RAMs, then apply the same concept recursively.
ChingPing proposes an O(n log n) sort for each subset, followed by a linear merge (by swapping the elements). The problem with quicksort (and most of the O(n log n) sorts) is that they can require O(n) extra memory. I'd recommend instead using SmoothSort, which uses constant memory and still runs in O(n log n).
The worst-case scenario is where you have something like:
setA = [maxInt .. 1]
setB = [0..minInt]
where both sets are ordered in reverse, and the merged result then comes out in the reverse order.
The (IMO clearer) explanation of ChingPing's solution is:
Have pointers 'pointerA' and 'pointerB', initialized at the beginning of each array
While pointerA is not at the end of setA
    if (setA[pointerA] < setB[pointerB])
    then { pointerA++; }
    else { swap(setA[pointerA], setB[pointerB]); pointerB++; }
The sets should both now be sorted.
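For concreteness, here is a literal C++ transcription of that pseudocode, only adding a bounds check so pointerB cannot run past the end of setB (the pseudocode leaves that case implicit):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Direct transcription of the pseudocode above: walk through setA, swapping
    // with setB's current head whenever setA's element is not smaller.
    void swap_merge(std::vector<long long>& setA, std::vector<long long>& setB) {
        std::size_t pointerA = 0, pointerB = 0;
        while (pointerA < setA.size() && pointerB < setB.size()) {
            if (setA[pointerA] < setB[pointerB]) {
                ++pointerA;
            } else {
                std::swap(setA[pointerA], setB[pointerB]);
                ++pointerB;
            }
        }
    }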