Sorting Data larger than the RAM size

后端 未结 3 2065
小鲜肉
小鲜肉 2021-01-31 22:25

This is a Google interview question: Given 2 machines, each having 64 GB RAM, containing all integers (8 byte), sort the entire 128 GB data. You may assume a small amount of add

3条回答
  •  余生分开走
    2021-01-31 23:10

    ChingPing proposes a O(n log n) sort for each subset, followed by a linear merge (by swapping the elements). The problem with Quicksort (and most of the n log n sorts, is that they require n memory. I'd recommend instead using a SmoothSort which uses constant memory, still runs in O(n log n).

    The worst case scenario is where you have something like:

    setA = [maxInt .. 1]
    setB = [0..minInt]
    

    where both sets are ordered in reverse, but then the merger is in the reverse order.

    The (IMO - more clear) explanation of ChingPing's solution is:

    Have a pointers 'pointerA', 'pointerB' initialized at the beginning of each array
    While setA's pointer is not at the end
      if (setA[pointerA] < setB[pointerB])
        then { pointerA++; }
        else { swap(setA[pointerA], setB[pointerB]); pointerB++; }
    

    The sets should both now be sorted.

提交回复
热议问题