Python implementation of the mergeSort algorithm

Question

I came across the following implementation of the mergeSort algorithm:

def merge_sort(x):
    merge_sort2(x,0,len(x)-1)


def merge_sort2(x,first,last):
    if first < last:
        middle = (first + last) // 2
        merge_sort2(x,first,middle)
        merge_sort2(x,middle+1,last)
        merge(x,first,middle,last)


def merge(x,first,middle,last):
    L = x[first:middle+1]
    R = x[middle+1:last+1]
    L.append(999999999)
    R.append(999999999)
    i=j=0
    for k in range(first,last+1):
        if L[i] <= R[j]:
            x[k] = L[i]
            i += 1
        else:
            x[k] = R[j]
            j += 1


x = [17, 87, 6, 22, 41, 3, 13, 54]
x_sorted = merge_sort(x)
print(x)

I get most of it. However, what I don't understand are the following four lines of the merge function:

    L = x[first:middle+1]
    R = x[middle+1:last+1]
    L.append(999999999)
    R.append(999999999)

First of all: why does the slice end with middle+1? Slicing a list in Python includes the last element, right? So shouldn't it be sufficient to slice first:middle? What is the +1 there for?

Secondly: why do I have to append the huge number to the lists? Why doesn't it work without it? It doesn't, I checked, but I don't know why.

Answer 1:

Q1: Slicing an array in Python includes the last element, right?

No. Like the range function, Python slicing doesn't include the last element: the stop index is excluded.

>>> a = [1, 2, 3, 4, 5]
>>> a[1:4]
[2, 3, 4]
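To tie this back to the merge function in the question: first, middle and last are inclusive indices there, so the +1 is what keeps x[middle] in L and x[last] in R. A small sketch using the list from the question (the names left, right and without_plus_one are only for illustration):

```python
x = [17, 87, 6, 22, 41, 3, 13, 54]
first, middle, last = 0, 3, 7   # inclusive indices, as merge_sort2 passes them

# The stop index of a slice is excluded, so +1 is needed to include x[middle].
left = x[first:middle + 1]
right = x[middle + 1:last + 1]
without_plus_one = x[first:middle]

print(left)              # [17, 87, 6, 22]
print(right)             # [41, 3, 13, 54]
print(without_plus_one)  # [17, 87, 6], so x[middle] is lost
```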

Q2: Regarding the snippet below:

    L = x[first:middle+1]
    R = x[middle+1:last+1]
    L.append(999999999)
    R.append(999999999)

Without appending those large numbers to the lists, the merge code would have to look something like this:

    # Merge the temp lists L and R back into x[first..last]
    i = j = 0
    k = first
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            x[k] = L[i]
            i += 1
        else:
            x[k] = R[j]
            j += 1
        k += 1
    # Copy any elements left over in L
    while i < len(L):
        x[k] = L[i]
        i += 1
        k += 1
    # Copy any elements left over in R
    while j < len(R):
        x[k] = R[j]
        j += 1
        k += 1

As @Cedced_Bro pointed out in the comments, those large numbers act as sentinels that signal when the end of one side has been reached. In the snippet above, once one list runs out of elements we leave the first while loop, and the two remaining loops copy whatever is left in the other list into x.

Appending those large numbers is a neat way to avoid those two extra while loops, at the cost of some unnecessary comparisons between 999999999 and the remaining elements of the other list.
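As for why the question's version fails without the sentinels: its for loop always runs last - first + 1 times, so once one side is used up, L[i] or R[j] indexes past the end of the list and raises IndexError. The trick also relies on the sentinel being larger than every real element. Below is a sketch that swaps 999999999 for float("inf"); that substitution and the single-function layout are my assumptions, not part of the original code, and it presumes the data are finite numbers.

```python
def merge(x, first, middle, last):
    """Merge the sorted runs x[first..middle] and x[middle+1..last] in place."""
    sentinel = float("inf")  # assumption: every element is a finite number
    L = x[first:middle + 1] + [sentinel]
    R = x[middle + 1:last + 1] + [sentinel]
    i = j = 0
    for k in range(first, last + 1):
        # Once a side is exhausted it exposes its sentinel,
        # so the remaining elements of the other side are taken.
        if L[i] <= R[j]:
            x[k] = L[i]
            i += 1
        else:
            x[k] = R[j]
            j += 1


def merge_sort(x, first=0, last=None):
    """Sort x in place; first and last are inclusive indices."""
    if last is None:
        last = len(x) - 1
    if first < last:
        middle = (first + last) // 2
        merge_sort(x, first, middle)
        merge_sort(x, middle + 1, last)
        merge(x, first, middle, last)


# This list contains a value above 999999999, which would break the original sentinel.
data = [17, 87, 6, 2_000_000_000, 41, 3, 13, 54]
merge_sort(data)
print(data)
```

Like the code in the question, this sorts the list in place and returns None, so the result is read from data rather than from the return value.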

Answer 2:

You don't really need the extra helper function; a single recursive function will do. From https://rosettacode.org/wiki/Sorting_algorithms/Merge_sort#Python:

from heapq import merge

def merge_sort(m):
    if len(m) <= 1:
        return m

    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]

    left = merge_sort(left)
    right = merge_sort(right)
    return list(merge(left, right))

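In this version the slices need no +1: m[:middle] and m[middle:] split the list at the same index without overlapping, because the stop index of a slice is excluded. (In the question's code, middle and last are inclusive indices, which is why the +1 is needed there.) For example: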

>>> x = [1,2,3,4,5,6]
>>> middle = 4
>>> x[:middle]
[1, 2, 3, 4]
>>> x[middle:]
[5, 6]

Moreover, heapq.merge from the standard library already implements the merge step, so you don't have to write and debug it yourself =)
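For reference, heapq.merge takes already-sorted iterables and returns a lazy iterator over their merged contents, which is why the merge_sort above wraps the call in list(). A short sketch with two hand-sorted halves (the variable names are only for illustration):

```python
from heapq import merge

left = [6, 17, 22, 87]    # both inputs must already be sorted
right = [3, 13, 41, 54]

merged = merge(left, right)   # an iterator; nothing has been merged yet
result = list(merged)         # consuming it yields the sorted sequence
print(result)                 # [3, 6, 13, 17, 22, 41, 54, 87]
```

Note that this merge_sort returns a new list and leaves its argument unchanged, unlike the in-place version in the question.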

Source: https://stackoverflow.com/questions/59372242/python-implementation-of-the-mergesort-algorithm

Tags

python

mergesort