Python implementation of the mergeSort algorithm

Submitted by 浪尽此生 on 2020-02-25 13:13:05

Question


I came across the following implementation of the mergeSort algorithm:

def merge_sort(x):
    merge_sort2(x,0,len(x)-1)


def merge_sort2(x,first,last):
    if first < last:
        middle = (first + last) // 2
        merge_sort2(x,first,middle)
        merge_sort2(x,middle+1,last)
        merge(x,first,middle,last)


def merge(x,first,middle,last):
    L = x[first:middle+1]
    R = x[middle+1:last+1]
    L.append(999999999)
    R.append(999999999)
    i=j=0
    for k in range(first,last+1):
        if L[i] <= R[j]:
            x[k] = L[i]
            i += 1
        else:
            x[k] = R[j]
            j += 1


x = [17, 87, 6, 22, 41, 3, 13, 54]
x_sorted = merge_sort(x)
print(x)

I get most of it. However, what I don't understand are the following four lines of the merge function:

    L = x[first:middle+1]
    R = x[middle+1:last+1]
    L.append(999999999)
    R.append(999999999)

First of all: why does the slicing end with middle+1? Slicing an array in Python includes the last element, right? So shouldn't it be sufficient to slice first:middle? What is the +1 there for?

Secondly: why do I have to append the huge number to the lists? It doesn't work without it (I checked), but I don't know why.


Answer 1:


Q1: Slicing an array in Python includes the last element, right?

No. Like the range function, Python slicing excludes the end index, so x[first:middle+1] is what you need in order to include x[middle].

>>> a = [1, 2, 3, 4, 5]
>>> a[1:4]
[2, 3, 4]
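
To connect this to the question's merge function, here is a small sketch that reuses its index names (first, middle, last) on the question's sample list. It assumes, as in merge_sort2, that middle and last are inclusive positions; the variable too_short is only there for illustration.

```python
x = [17, 87, 6, 22, 41, 3, 13, 54]
first, last = 0, len(x) - 1       # inclusive bounds, as in merge_sort2
middle = (first + last) // 2      # 3

left = x[first:middle + 1]        # indices 0..3, includes x[middle]
right = x[middle + 1:last + 1]    # indices 4..7, includes x[last]
too_short = x[first:middle]       # indices 0..2, x[middle] is lost

print(left, right, too_short)
```

Without the +1, the element at index middle would belong to neither half and would be overwritten during the merge.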

Q2: Regarding the below snippet.

    L = x[first:middle+1]
    R = x[middle+1:last+1]
    L.append(999999999)
    R.append(999999999)

Without appending those large numbers to the lists, the rest of your merge code would have to look something like this:

    i = j = 0
    k = first
    # Take the smaller head element until one list runs out
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            x[k] = L[i]
            i += 1
        else:
            x[k] = R[j]
            j += 1
        k += 1
    # Copy whatever is left in L (at most one of these two loops runs)
    while i < len(L):
        x[k] = L[i]
        i += 1
        k += 1
    # Copy whatever is left in R
    while j < len(R):
        x[k] = R[j]
        j += 1
        k += 1

As @Cedced_Bro pointed out in the comments, the large numbers act as sentinels that signal when the end of one side has been reached. In the snippet above, once one list runs out we leave the first while loop and copy the remaining elements of the other list, if any, back into x.

Appending the large numbers is a neat way to avoid those two extra while loops. The cost is some unnecessary comparisons of 999999999 with the remaining elements of the other list.
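
One caveat: the sentinel only works while every value in the list is smaller than 999999999. A possible variant, sketched here under the assumption that the list holds only numbers, uses math.inf as the sentinel. The name merge_with_sentinel and the sample data are made up for this illustration and are not part of the original code.

```python
import math

def merge_with_sentinel(x, first, middle, last):
    """Merge the sorted runs x[first..middle] and x[middle+1..last] in place."""
    # math.inf compares greater than any finite number, so it can never
    # be chosen before the real elements of the other run.
    left = x[first:middle + 1] + [math.inf]
    right = x[middle + 1:last + 1] + [math.inf]
    i = j = 0
    for k in range(first, last + 1):
        if left[i] <= right[j]:
            x[k] = left[i]
            i += 1
        else:
            x[k] = right[j]
            j += 1

# Two sorted runs, [5, 2_000_000_000] and [1, 3_000_000_000],
# both containing values above the original sentinel 999999999.
data = [5, 2_000_000_000, 1, 3_000_000_000]
merge_with_sentinel(data, 0, 1, 3)
print(data)
```

With 999999999 as the sentinel, the same input would end with the sentinel itself written into the list in place of 3_000_000_000.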




Answer 2:


You don't really need the separate helper function with all its index bookkeeping; plain recursion on slices will do. From https://rosettacode.org/wiki/Sorting_algorithms/Merge_sort#Python:

from heapq import merge

def merge_sort(m):
    if len(m) <= 1:
        return m

    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]

    left = merge_sort(left)
    right = merge_sort(right)
    return list(merge(left, right))

The indexing in this version doesn't need +1, because two slices that stop and start at the same index don't overlap, i.e.

>>> x = [1,2,3,4,5,6]
>>> middle = 4
>>> x[:middle]
[1, 2, 3, 4]
>>> x[middle:]
[5, 6]

Moreover, heapq.merge is a standard-library function that merges already-sorted inputs, so you don't have to write and debug the merge step yourself =)
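
As a quick illustration of what heapq.merge does on its own, here is a minimal sketch with two made-up sorted lists. Note that it returns a lazy iterator, which is why the merge_sort above wraps the call in list().

```python
from heapq import merge

left = [3, 17, 22]
right = [6, 13, 41, 54]

merged = merge(left, right)   # an iterator; elements are produced on demand
result = list(merged)         # consume it to get an ordinary list

print(result)
```

Both inputs must already be sorted, which the recursive calls guarantee. Unlike the code in the question, this merge_sort returns a new list and leaves its argument unchanged.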



Source: https://stackoverflow.com/questions/59372242/python-implementation-of-the-mergesort-algorithm
