Merge Sort in R

心已入冬 提交于 2019-12-06 12:00:17

问题


I am self studying the book "Introduction to Algorithms" by Cormen et alli. In their book, they use pseudo-code which assumes that arrays are passed by pointer (by reference). This is different from R (where objects are passed by value), so I am having some difficulties trying to translate their pseudo-code as close as possible, especially when recursion is involved. Most of the time, I have to implement things a lot differently.

For example, with the Merge Sort algorithm, they define the Merge Function (which I think I have translated correctly) and the recursive MergeSort function (where direct translation to R does not work).

The merge function in pseudo-code is as follows where: A is an array and p, q, and r are indices into the array such that p < q < r. The procedure assumes that the subarrays A[p:q] and A[q+1:r] are in sorted order. It merges them to form a single sorted subarray that replaces the current subarray A[p:r]

Merge(A, p, q, r)
n1 = q - p + 1
n2 = r - q
let L[1...n1+1] and R[1...n2+1] be new arrays
for i = 1 to n1
    L[i] = A[p+i-1]
for j = 1 to n2
    R[j] = A[q+j]
L[n1+1] = infinite
R[n2+1] = infinite
i=1
j=1
for k = p to r
    if L[i] <= R[j]
        A[j] = L[i]
        i = i + 1
    else
        A[k] = R[j]
        j = j + 1

Which I've translated to R as:

Merge <- function(a, p, q, r){
  n1 <- q - p + 1
  n2 <- r - q
  L <- numeric(n1+1)
  R <- numeric(n2+1)
  for(i in 1:n1){
    L[i] <- a[p+i-1]
  }
  for(j in 1:n2){
    R[j] <- a[q+j]
  }
  L[n1+1] <- Inf
  R[n2+1] <- Inf
  i=1
  j=1
  for(k in p:r){
    if(L[i] <= R[j]){
      a[k] <- L[i]
      i <- i +1
    }else{
      a[k] <- R[j]
      j <- j+1
    }
  }
  a
}

And it seems to work fine.

Merge(c(1,3,5, 2,4,6), 1, 3, 6)
[1] 1 2 3 4 5 6

Now the MergeSort function is defined in pseudo-code as follows:

MergeSort(A, p, r)
if p < r
   q = (p+r)/2
   MergeSort(A, p, q)
   MergeSort(A, q+1, r)
   Merge(A, p, q, r)

This assumes that A is passed by reference and that every change is visible to every recursive call, which is not true in R.

So, given the Merge function defined above, how you would implement the MergeSort function in R to obtain the correct results? (if possible, and preferable, but not necessary, somewhat similar to the pseudo-code)


回答1:


Trying to do a literal translation of pseudocode that is written for a language that allows for pass-by-reference in a language that does not support it is a terrible idea. R's not meant to work on slices of an array within a function. That's just not an appropriate translation. The pseudocode is supposed to communicate the spirit of the algorithm which you then translate into the appropriate language. Here's one possible translation of the spirit of mergesort into R.

mmerge<-function(a,b) {
    r<-numeric(length(a)+length(b))
    ai<-1; bi<-1; j<-1;
    for(j in 1:length(r)) {
        if((ai<=length(a) && a[ai]<b[bi]) || bi>length(b)) {
            r[j] <- a[ai]
            ai <- ai+1
        } else {
            r[j] <- b[bi]
            bi <- bi+1          
        }
    }
    r
}
mmergesort<-function(A) {
    if(length(A)>1) {
        q <- ceiling(length(A)/2)
        a <- mmergesort(A[1:q])
        b <- mmergesort(A[(q+1):length(A)])
        mmerge(a,b)
    } else {
        A
    }
}

You can run it with

x<-c(18, 16, 8, 7, 6, 3, 11, 9, 15, 1)
mmergesort(x)

In this version thing is replaced via reference: all functions return new values. Additional, rather than passing in slide indexes, we simply subset vectors and pass them whole to the functions.

Of course the performance of this version is likely to suffer because of all the memory reallocations that occur at the intermediate steps. There's not much you can do about that in base R because of how the language was designed. If you like, you can write C/C++ code and call that via the foreign language interfaces.

If you want to leave your Merge as-is (and ignore the R-way to do things), then you could do...

MergeSort<-function(A, p, r) {
    if(p < r) {
        q <- floor((p+r)/2)
        A <- MergeSort(A, p, q)
        A <- MergeSort(A, q+1, r)
        Merge(A, p, q, r)
    } else {
        A
    }
}
x <- c(18, 16, 8, 7, 6, 3, 11, 9, 15, 1)
MergeSort(x, 1, length(x))

UPDATE:

Including benchmarking harness

m1<-function() {
    x<-sample(1000, 250);
    mmergesort(x)
}

m2<-function() {
    x<-sample(1000, 250);
    MergeSort(x, 1, length(x))
}

microbenchmark(m1(), m2())


来源:https://stackoverflow.com/questions/26080716/merge-sort-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!