Merging K- Sorted Lists using Priority Queue

后端未结

关注

 7  1556

I have been asked in my algorithm class to make a K-way merge algorithm which is of O(nlogk) After searching i found it could be done via making a k length prio

相关标签:

7条回答

长发绾君心

2021-01-06 18:28

I wrote a series of articles about this some years ago when discussing sorting a large text file. The idea is that the items you put on the heap contain not just the value but also the list the value came from. Or, you could just put the list reference on the heap and have the comparison function compare against the first item in the particular list.

See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=676 and http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=677 for an explanation of the basic algorithm using a sequential list in place of a heap. See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=680 for an improved version that uses a heap.

As I said in my comment, you can also do the merge without a heap. See https://stackoverflow.com/a/18984961/56778 for a description.

0 讨论(0)
发布评论:

提交评论
- 加载中...
一向

2021-01-06 18:28
Paul Hankin's solution is right, but it's a little bit difficult to read, especially you wanna implement in c++ or java. My solution is similar to Paul's. If you write in c++ or java, you may need an extra data structure to store the value of the element, index of the element in k-th array and index of the array in lists.
```
Element{
    int value;
    int idInArray,
    int idInList
}
```
But in python, I just store in a tuple (value, idInArray, idInList)
```
def mergeKArray(*lists):
    # implemented by min heap
    h = []
    r = []
    for k, arr in enumerate(lists):
        heapq.heappush(h, (arr[0], 0, k))
    while h:
        # min is the minimum element
        # i is the index of the min in the k-th array
        # k is the index of array in the list
        min, i, k = heapq.heappop(h)
        r.append(min)
        if i < len(lists[k]) - 1:
            i += 1
            heapq.heappush(h, (lists[k][i], i, k))
    return r
```
Because I just need to maintain a min heap with k elements, the time of eject or inject heap is O(log k). I also need to scan all n elements, and each element cost 2*log(k) time to inject and eject. Therefore, the big-O is O(n*log k).
0 讨论(0)
发布评论:

提交评论
- 加载中...
走了就别回头了

2021-01-06 18:29
Instead of simply storing the first element of each list in the priority queue, wrap it in a structure like this;
```
struct wrapper
{
    int list_number;
    int element;
}
```
Then, when you are pushing an element onto the priority queue, just add the list number form where it came. This way, when the minimum element gets popped, you will know from which list you should push the next element to push on the queue by examining popped_element.list_number.

In order to find if your list is empty you should add a function empty to it that returns true if the list does not have any more elements and false otherwise. The function would be very easy to implement. Just check if the size is zero then the list is empty otherwise it has one or more elements.

From your question I assume that a binary heap is used to implement the priority queue. The insertion operation in a binary heap takes O(lg k) time and the extract-min operation also takes O(lg k) times where k is the the size of the heap (number of lists in your case). Now if the total number of elements you have is n, the total time to process all of them will be O(n lg k).
0 讨论(0)
发布评论:

提交评论
- 加载中...
攒了一身酷

2021-01-06 18:34

You should not here that "n" the total number of nodes over all the list not just one list. In this case the solution is O(n logk). If we mean that we have n node on an average in every list(total of k lists) then it will be O(nk logk)

Here is an in depth explanation with some code

0 讨论(0)
发布评论:

提交评论
- 加载中...

深忆病人

2021-01-06 18:35

here is my code using c++ stl

#include<iostream>
#include<vector>
#include<climits>
#include<queue>
#define ROWS 4
#define COLS 8
using namespace std;

struct node
{
    int ele;
    int arr_no;
    int next_index; 
};

void printVector(vector<int> v)
{
    for(unsigned int i=0;i<v.size();i++)
        cout<<v[i]<<" ";
}

// THIS IS THE BASIS ON WHICH THE ELEMENTS OF A PRIORITY QUEUE ARE SORTED AND 
// KEPT IN THE QUEUE, HERE THE CRITERIA IS THAT THE NODE WITH SMALLER ELEMENT SHOULD
// COME ABOVE THE ONE WITH LARGER ELEMENT

class compare
{
    public:
        bool operator()(node& n1, node& n2)
        {
           if (n1.ele > n2.ele) 
                return true;
           else
                return false;
        }
};

vector<int> mergeKArrays(vector< vector<int> > v)
{
    int k = v.size();       // NUMBER OF LISTS
    int n = v.at(0).size(); //SIZE OF EACH LIST

    vector<int> result;
    //result.resize( n*k );

    priority_queue<node, vector<node>, compare> minHeap;
    for (int i = 0; i < k; i++)
    {
        node temp;
        temp.ele = v[i][0]; //STORE THE FIRST ELEMENT
        temp.arr_no = i;        //INDEX OF ARRAY
        temp.next_index = 1;    //INDEX OF NEXT ELEMENT TO BE STORED FROM ARRAY
        minHeap.push(temp);
    }

    // NOW ONE BY ONE GET THE MINIMUM ELEMENT FROM MIN
    // HEAP AND REPLACE IT WITH NEXT ELEMENT OF ITS ARRAY
    for (int count = 0; count < n*k; count++)
    {
        // GET THE MINIMUM ELEMENT AND STORE IT IN OUTPUT
        node min_ele_node = minHeap.top();
        minHeap.pop();      
        result.push_back(min_ele_node.ele);

        // FIND THE NEXT ELELEMENT THAT WILL REPLACE CURRENT
        // ROOT OF HEAP. THE NEXT ELEMENT BELONGS TO SAME
        // ARRAY AS THE CURRENT ROOT.
        node new_node;
        new_node.arr_no = min_ele_node.arr_no;
        if (min_ele_node.next_index < n)
        {
            new_node.ele = v.at(min_ele_node.arr_no)[min_ele_node.next_index];
            new_node.next_index = min_ele_node.next_index + 1;
        }
        // IF ROOT WAS THE LAST ELEMENT OF ITS ARRAY
        else 
        {
            new_node.ele =  INT_MAX; //INT_MAX IS FOR INFINITE
        }

        // REPLACE ROOT WITH NEXT ELEMENT OF ARRAY
        minHeap.push(new_node);
    }
    return result;
}


int main()
{
    int arr[ROWS][COLS] = 
                    { 
                        {10, 20, 30, 40, 50, 60, 71, 86},
                        {15, 25, 35, 45, 60, 69, 77, 78},
                        {27, 29, 37, 48, 50, 65, 75, 78},
                        {32, 33, 39, 50, 80, 133, 139, 150},
                    }; 

    vector< vector<int> > matrix ;

    for( int i=0 ; i < ROWS; i++)
    {
        vector<int> vec;
        for(int j=0; j < COLS; j++)
            vec.push_back(arr[i][j]);
        matrix.push_back(vec);
    }

    vector<int> result = mergeKArrays(matrix);
    printVector(result);
    return 0;
}

0 讨论(0)

温柔的废话

2021-01-06 18:38

Here's some Python 2 code that does the merging.

import heapq

def addtoheap(h, i, it):
    try:
        heapq.heappush(h, (next(it), i))
    except StopIteration:
        pass

def mergek(*lists):
    its = map(iter, lists)
    h = []
    for i, it in enumerate(its):
        addtoheap(h, i, it)
    while h:
        v, i = heapq.heappop(h)
        addtoheap(h, i, its[i])
        yield v

for x in mergek([1, 3, 5], [2, 4, 6], [7, 8, 9], [10]):
    print x

Why is it O(n log k)? Well for each value removed, there's a heap pop and possibly a heap push (both of which are O(log k)). Since we remove n elements, it's O(n log k).

0 讨论(0)

1 2 下一页