Optimal bubble sorting algorithm for an array of arrays of numbers

不知归路 2021-02-01 03:59

Fix positive integers n and k.

Let A be an array of length n with A[i] an array of length k

3 answers
  • 2021-02-01 04:38

    I know it's rather tacky to answer one's own question, but I've just figured this out and it is closer to an answer than it is to part of the question. However, this is not a complete answer and will not get accepted, so please post thoughts if anyone can improve this.

    The minimum number of swaps, say m, for k=2 is bounded by:

    2 * (n choose 2) >= m >= (2n choose 2) / 3
    

    Why does this work?

    The upper bound comes from doing a bubble sort on the first elements of the arrays, followed by a bubble sort on the second elements of the arrays. That part isn't so tricky.

    The lower bound is trickier, but here's how I arrived at it. Count the number of passes, where a pass happens when a larger number moves from the left of a smaller number to its right. This can happen in one swap of a and b, where a is larger and sits in the array to the left of b. It can also take two swaps: a moves into the array containing b with one swap, and moves on past it in a later swap. To keep track of things correctly, count each of those two steps as half a pass. To make counting easier, it also counts as a pass when two copies of the same number split up and then recombine.

    The array is fully sorted after (2n choose 2) passes, so the only question is how many passes can happen with one swap. Here's a simple example where a and c are swapped:

    ... [a,b] , [c,d] ... 
    ... [c,b] , [a,d] ... 
    

    Now let's count the maximum number of passes that can have happened:

    • Since a > c, we definitely get 1 full pass.
    • If a > b, then we get 1/2 pass because a must have been left of b at some point.
    • If a > d, then we get 1/2 pass because a will be right of d at some point.
    • If c < d, then we get 1/2 pass because d must have been left of c at some point.
    • If c < b, then we get 1/2 pass because b will be right of c at some point.

    Therefore the best you can do on a swap is to get 3 passes (1 full and 4 halves).

    Why is this not a complete answer?

    I have no idea if the lower bound is always attainable! I don't think it is, and, despite several failed attempts, I can't code up an algorithm that achieves it.

  • 2021-02-01 04:43

    Here is an intuitive algorithm I thought of. I think it gives a constructive proof of the optimal solution.

    I tried it for n = 4, 5, 6, 7 and 9, and it gave the same results as the one from badawi.

    The idea is the following:

    1: Choose an extreme value that is not in its final place (1 or n to start).

    2: Find the extreme value that is closest to its final position (marked with an arrow in the examples below).

    3: If it is one of the large elements, move it to the other side and shift the smaller element of each pair it passes to the left; otherwise, move it to the other side and shift the larger element of each pair it passes to the right.

    Note: shifting is equivalent to "bubbling" this value with the smaller (resp. larger) element of each pair.

    4: Go back to step 2, but if you just chose one of the large elements, take one of the small ones next, and vice versa.

    It's pretty intuitive and it seems to work:

    Example n=5:

    11 22 33 44 55 
    ^
    |
    12 23 34 45 51 (4 moves) // shifted all larger numbers to the left
              ^
              |
    52 13 24 43 51 (3 moves) // shifted all smaller numbers to the right
       ^
       |
    52 34 24 35 11 (3 moves) // shifted all larger numbers to the left
              ^
              |
    55 24 34 32 11 (3 moves) // smaller to the right
       ^
       |
    55 44 33 22 11 (2 moves) // larger to left
    

    Total 15 moves.

    Second example, n=7:

    11 22 33 44 55 66 77 // 6 moves
     ^
    12 23 34 45 56 67 71 //5 moves
                    ^
    72 13 24 35 46 56 71 //5 moves
       ^
    72 34 25 36 46 57 11 // 4 moves
                    ^
    77 24 35 26 36 45 11 //4 moves
       ^
    77 45 36 26 35 42 11 //1 move
           ^       
    77 65 34 26 35 42 11 //2 moves
             ^
    77 65 34 56 34 22 11 //2 moves
              ^
    77 66 54 53 34 22 11 //1 move
              ^
    77 66 54 45 33 22 11 //1 move
              ^
    77 66 55 44 33 22 11
    

    total: 31

    Don't hesitate to ask me questions if I'm not clear.

    It's pretty easy to do it by hand. You can try it yourself with 6 or 7 or write an algorithm.

    I tried it with 6 and it gave 23, with 7 it gave 31, and with 9 it gave 53. It takes about a minute to work out by hand, without a computer.

    Why this solution is optimal:

    Each time you move one large element to the opposite side, you shift the smaller element of each pair it passes one step to the left.

    So moving all the large elements does not waste any of the moves the small elements need anyway.

    You always move each element in the right direction.

    Moreover, you move the extreme elements with the minimum number of moves (because the algorithm always takes the extreme value closest to its final position, no move is lost).

    The reasoning is the same for the small elements.

    This algorithm gives an optimal number of moves, since it never makes an unnecessary move.

    I hope I didn't make any mistakes.

    It proves that badawi's results were optimal, as you expected.

  • 2021-02-01 04:57

    This is not an optimal answer, but I would like to share my attempt in case someone can improve it. I did not think about finding a formula for the minimum number of swaps, but rather about an optimal algorithm. The algorithm assumes k = 2.

    The basic idea is information gain. Let us assume that A = {[i,j] : 1<=i<=n, 1<=j<=n} represents a configuration. In each step, there are 4 * (n-1) possible swaps to move from one configuration to another. For example, if n = 2 (i.e. A = [ {2,2}, {1,1} ]), then there are 4 possible swaps: A[0][0] <-> A[1][0], A[0][0] <-> A[1][1], A[0][1] <-> A[1][0], and A[0][1] <-> A[1][1]. Our objective is therefore to select, at each step, the swap with the highest information gain.

    The tricky part is how to calculate the information gain. In my solution (below), the information gain is based on the distance of a value from its correct position. Here is my code (written in C++) to show what I mean:

    #include <iostream>
    #include <utility>
    using namespace std;

    const int n = 5;
    const int k = 2;

    // Gain of moving `item` from position `from` to the adjacent position `to`
    // (positions are 1-based, and value v belongs at position v). It is the
    // distance the item would still have to travel in the direction of the
    // move; a negative value means the move overshoots its final position.
    int gain(int item, int from, int to)
    {
        if (to > from)
            return item - to;
        else
            return to - item;
    }

    // Print a configuration as [ [a, b],  [c, d], ... ] without relying on
    // backspace characters.
    void print_config(int A[][k])
    {
        cout << "[";
        for (int i = 0; i < n; i++) {
            cout << (i == 0 ? " [" : ",  [");
            for (int j = 0; j < k; j++)
                cout << (j == 0 ? "" : ", ") << A[i][j];
            cout << "]";
        }
        cout << " ]" << endl;
    }

    // G[i][c] is the total gain of the c-th possible swap between arrays i and
    // i+1, which exchanges A[i][c / 2] with A[i+1][c % 2].
    void compute(int A[][k], int G[][4])
    {
        for (int i = 0; i < n-1; i++)
        {
            G[i][0] = gain(A[i][0], i+1, i+2) + gain(A[i+1][0], i+2, i+1);
            G[i][1] = gain(A[i][0], i+1, i+2) + gain(A[i+1][1], i+2, i+1);
            G[i][2] = gain(A[i][1], i+1, i+2) + gain(A[i+1][0], i+2, i+1);
            G[i][3] = gain(A[i][1], i+1, i+2) + gain(A[i+1][1], i+2, i+1);
        }
    }

    int main()
    {
        int A[n][k];
        int G[n-1][k*k];

        // construct initial configuration [n, n], [n-1, n-1], ..., [1, 1]
        for (int i = 0; i < n; i++)
            for (int j = 0; j < k; j++)
                A[i][j] = n-i;

        print_config(A);

        int num_swaps = 0;
        int r = 0, c = 0;
        int max_gain;

        do {
            compute(A, G);

            // find the swap with the highest information gain
            max_gain = -1;
            for (int i = 0; i < n-1; i++)
                for (int j = 0; j < k*k; j++)
                    if (G[i][j] > max_gain) {
                        r = i;
                        c = j;
                        max_gain = G[i][j];
                    }

            // no swap gains information: terminate
            if (max_gain < 0) break;

            // exchange A[r][c / 2] with A[r+1][c % 2] (the four cases 0..3)
            swap(A[r][c / 2], A[r+1][c % 2]);

            print_config(A);
            num_swaps++;

        } while (true);
        cout << "Number of swaps is " << num_swaps << endl;
    }
    

    I ran the above code for n = 1, 2, ..., 7. The resulting numbers of swaps are, respectively: 0, 2, 5, 10, 15, 23 (very close), and 31. I suspect that the function gain() does not work well when n is even. Can you confirm this by validating the number of swaps for n = 7? The lower bound from your equation is 31, so 31 is the optimal number of swaps when n = 7.

    I am printing here the output when n = 5 (since you are looking for a pattern):

    [ [5, 5],  [4, 4],  [3, 3],  [2, 2],  [1, 1] ]
    [ [4, 5],  [5, 4],  [3, 3],  [2, 2],  [1, 1] ]
    [ [4, 5],  [3, 4],  [5, 3],  [2, 2],  [1, 1] ]
    [ [4, 5],  [3, 4],  [2, 3],  [5, 2],  [1, 1] ]
    [ [4, 5],  [3, 4],  [2, 3],  [1, 2],  [5, 1] ]
    [ [4, 3],  [5, 4],  [2, 3],  [1, 2],  [5, 1] ]
    [ [4, 3],  [2, 4],  [5, 3],  [1, 2],  [5, 1] ]
    [ [4, 3],  [2, 4],  [1, 3],  [5, 2],  [5, 1] ]
    [ [4, 3],  [2, 4],  [1, 3],  [1, 2],  [5, 5] ]
    [ [4, 3],  [2, 1],  [4, 3],  [1, 2],  [5, 5] ]
    [ [1, 3],  [2, 4],  [4, 3],  [1, 2],  [5, 5] ]
    [ [1, 3],  [2, 4],  [1, 3],  [4, 2],  [5, 5] ]
    [ [1, 3],  [2, 1],  [4, 3],  [4, 2],  [5, 5] ]
    [ [1, 1],  [2, 3],  [4, 3],  [4, 2],  [5, 5] ]
    [ [1, 1],  [2, 3],  [2, 3],  [4, 4],  [5, 5] ]
    [ [1, 1],  [2, 2],  [3, 3],  [4, 4],  [5, 5] ]
    