Question

Hello, I would like to know whether the space complexity of these two mergesort algorithms is the same.

Algo 1:

def mergeSort(alist, l, r):
    if r - l >= 1:
        mid = l + (r - l)//2

        mergeSort(alist, l, mid)
        mergeSort(alist, mid+1, r)

        i = l
        j = mid+1
        k = 0
        temp_list = [None]*(r-l+1)
        while i < mid+1 and j < r+1:
            if alist[i] <= alist[j]:
                temp_list[k] = alist[i]
                i=i+1
            else:
                temp_list[k] = alist[j]
                j=j+1
            k=k+1

        while i < mid+1:
            temp_list[k] = alist[i]
            i=i+1
            k=k+1

        while j < r+1:
            temp_list[k] = alist[j]
            j=j+1
            k=k+1

        n = 0
        for index in range(l,r+1):
            alist[index] = temp_list[n]
            n += 1

Algo 2:

def mergeSort2(alist):
    if len(alist)>1:
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        mergeSort2(lefthalf)
        mergeSort2(righthalf)

        i=0
        j=0
        k=0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] <= righthalf[j]:
                alist[k]=lefthalf[i]
                i=i+1
            else:
                alist[k]=righthalf[j]
                j=j+1
            k=k+1

        while i < len(lefthalf):
            alist[k]=lefthalf[i]
            i=i+1
            k=k+1

        while j < len(righthalf):
            alist[k]=righthalf[j]
            j=j+1
            k=k+1

Intuitively, Algo 2 seems to me to have worse space complexity, since the copied lists lefthalf and righthalf get pushed onto the stack with each mergeSort2 call that receives them.

Algo 1, by contrast, does not allocate extra space until it is time to merge (temp_list = [None]*(r-l+1)), so the execution stack holds only the extra array for the mergeSort call currently being executed.

Is this correct?

Answer 1:

First, let's assume that we have perfect garbage collection and every list is deallocated immediately after it falls out of use.

With this assumption, the algorithms have the same big O space complexity.

Algorithm 2

Take a look at Algorithm 2 first and consider the following example: Imagine you're sorting a list of length 16.

[15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0]

You compute the first and the second half of the list:

[15,14,13,12,11,10,9,8]  [7,6,5,4,3,2,1,0]

Then you sort the first half; in particular, you divide it into two new sublists:

[15,14,13,12]  [11,10,9,8]

And you do the same again:

[15,14]  [13,12]

And again:

[15]  [14]

Only then do you begin to merge the lists.

What is the total length of lists allocated by the algorithm at that point?

It is 16 + 2*8 + 2*4 + 2*2 + 2*1 = 46. In general, it's N + 2N/2 + 2N/4 + 2N/8 + ... + 2. That's a simple geometric progression that sums to 3*N - 2 when N is a power of two, so roughly 3*N.
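For readers who want the sum spelled out, here is a short derivation. It assumes N is a power of two, N = 2^m, which is an assumption made only to keep the halves even.

```latex
\begin{aligned}
N + 2\cdot\frac{N}{2} + 2\cdot\frac{N}{4} + \dots + 2\cdot 1
  &= N + 2\sum_{k=1}^{m} \frac{N}{2^{k}} \\
  &= N + 2\,(N - 1) \\
  &= 3N - 2 .
\end{aligned}
```

For N = 16 this gives 16 + 16 + 8 + 4 + 2 = 46, which is O(N).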

The algorithm also needs O(log(N)) space for the call stack, but that term is dominated by the lists, so the total is O(N).

It is easy to see that this is the maximum the algorithm will use at any given point: the total length of allocated lists that will still be used in the future (and therefore cannot be deallocated) stays at about 3*N. When N is not a power of two, uneven splits can add at most a logarithmic number of extra elements.
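The peak can also be checked empirically. The sketch below is not the asker's code as posted: it is mergeSort2 with a hypothetical `stats` dictionary added to count how many list elements are alive at once, under the same assumption that a list is freed as soon as the call that created it returns. The names `merge_sort2_tracked` and `peak_elements_algo2` are made up for this illustration.

```python
def merge_sort2_tracked(alist, stats):
    """mergeSort2 from the question, plus bookkeeping of live list elements."""
    if len(alist) > 1:
        mid = len(alist) // 2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        # Both copies stay alive until this call returns.
        stats["live"] += len(lefthalf) + len(righthalf)
        stats["peak"] = max(stats["peak"], stats["live"])

        merge_sort2_tracked(lefthalf, stats)
        merge_sort2_tracked(righthalf, stats)

        i = j = k = 0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] <= righthalf[j]:
                alist[k] = lefthalf[i]
                i += 1
            else:
                alist[k] = righthalf[j]
                j += 1
            k += 1
        while i < len(lefthalf):
            alist[k] = lefthalf[i]
            i += 1
            k += 1
        while j < len(righthalf):
            alist[k] = righthalf[j]
            j += 1
            k += 1

        # Assumption: the halves are freed as soon as this call ends.
        stats["live"] -= len(lefthalf) + len(righthalf)


def peak_elements_algo2(n):
    """Peak number of list elements alive while sorting n reversed items."""
    data = list(range(n - 1, -1, -1))
    stats = {"live": n, "peak": n}  # the input list itself counts as N
    merge_sort2_tracked(data, stats)
    assert data == list(range(n))
    return stats["peak"]


for n in (16, 1024):
    print(n, peak_elements_algo2(n), 3 * n - 2)
```

For the power-of-two sizes used here, the measured peak should match 3*N - 2 (46 for N = 16).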

Algorithm 1

Consider the same example again. We're to sort the following list.

[15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0]

Imagine that we have already sorted its first and second half:

[8,9,10,11,12,13,14,15, 0,1,2,3,4,5,6,7]

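Now, we need to allocate a temporary list of length N to perform the merge. The temporary lists used by the recursive calls have already been released by then, so at that moment we actively use two lists of length N, which gives us 2*N = O(N) (plus O(log(N)) for the call stack, as before).

Again, it is easy to see that we'll never use more memory: sorting a half of the list needs a temporary list of at most half the length, so it cannot cost more than the final merge of the whole list.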
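The same kind of check works for Algorithm 1. This is a sketch rather than the asker's exact code: it is mergeSort with a hypothetical `stats` dictionary that counts live list elements, assuming `temp_list` is freed when the call returns, and with the final copy-back loop written as a slice assignment. The names `merge_sort_tracked` and `peak_elements_algo1` are invented for the illustration.

```python
def merge_sort_tracked(alist, l, r, stats):
    """mergeSort from the question, plus bookkeeping of live list elements."""
    if r - l >= 1:
        mid = l + (r - l) // 2

        # No extra list exists yet while the halves are being sorted.
        merge_sort_tracked(alist, l, mid, stats)
        merge_sort_tracked(alist, mid + 1, r, stats)

        temp_list = [None] * (r - l + 1)
        stats["live"] += len(temp_list)
        stats["peak"] = max(stats["peak"], stats["live"])

        i, j, k = l, mid + 1, 0
        while i < mid + 1 and j < r + 1:
            if alist[i] <= alist[j]:
                temp_list[k] = alist[i]
                i += 1
            else:
                temp_list[k] = alist[j]
                j += 1
            k += 1
        while i < mid + 1:
            temp_list[k] = alist[i]
            i += 1
            k += 1
        while j < r + 1:
            temp_list[k] = alist[j]
            j += 1
            k += 1

        # Copy the merged run back into place (same length, so in place).
        alist[l:r + 1] = temp_list

        # Assumption: temp_list is freed as soon as this call ends.
        stats["live"] -= len(temp_list)


def peak_elements_algo1(n):
    """Peak number of list elements alive while sorting n reversed items."""
    data = list(range(n - 1, -1, -1))
    stats = {"live": n, "peak": n}  # the input list itself counts as N
    merge_sort_tracked(data, 0, n - 1, stats)
    assert data == list(range(n))
    return stats["peak"]


for n in (16, 1024):
    print(n, peak_elements_algo1(n), 2 * n)
```

The peak is reached during the top-level merge, when the input list and one temporary list of length N coexist, so the count should come out as 2*N.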