Algorithm to evenly distribute values into containers?

一生所求 2021-01-04 11:03

Does anyone know a way to evenly distribute numbers into a set number of containers, making sure that the total values of the containers are as even as possible?

2 Answers
  • 2021-01-04 11:41

    Interesting. This C program seems to give the expected result so far. It starts by sorting the data; then, for n containers, it immediately puts the n highest numbers one into each container. (You can actually omit that step.) Then, from the largest remaining number to the smallest, it finds the container where adding that number makes the smallest difference to the optimal average. Because this runs from high to low, each number is placed into the optimal container at that point -- all other numbers are lower, so the difference for them would be even bigger.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <limits.h>
    
    //  qsort comparator, ascending; comparing avoids overflow of a plain subtraction
    int sort_numeric (const void *a, const void *b)
    {
        int ia = *(const int *)a, ib = *(const int *)b;
        return (ia > ib) - (ia < ib);
    }
    
    int main (void)
    {
        int list[] = { 10, 30, 503, 23, 1, 85, 355 };
        int i,j, nnumber, ncontainer, total, avgPerContainer, nextError, smallestError, containerForSmallest;
        int *containers, *errors;
    
        ncontainer = 3;
    
        nnumber = sizeof(list)/sizeof(list[0]);
    
        qsort (list, nnumber, sizeof(list[0]), sort_numeric);
    
    //  calloc returns zero-initialised memory, so no clearing loops are needed
        containers = (int *)calloc(ncontainer, sizeof(int));
        errors = (int *)calloc(ncontainer, sizeof(int));
    
    
        printf ("input:");
        for (i=0; i<nnumber; i++)
        {
            printf (" %d", list[i]);
        }
        printf ("\n");
    
    //  how much is to be in each container?
        total = 0;
        for (i=0; i<nnumber; i++)
            total += list[i];
    
    //  this will result in a fraction:
    //      avgPerContainer = total/ncontainer;
    //  so instead we'll use 'total', *keeping in mind*
    //  that the number needs to be divided by ncontainer
        avgPerContainer = total;
    
        printf ("per container: ~%d\n", (2*avgPerContainer+ncontainer)/(2*ncontainer) );
    
    //  start by putting highest values into each container
        for (i=0; i<ncontainer; i++)
            containers[i] += list[nnumber-ncontainer+i];
    //  .. remove from list ..
        nnumber -= ncontainer;
    
    //  print current totals (the error is kept scaled by ncontainer so everything stays integer)
        for (i=0; i<ncontainer; i++)
        {
            errors[i] = containers[i]*ncontainer - avgPerContainer;
            printf ("#%d: %d, error = %d/%d ~ %+d\n", i, containers[i], errors[i], ncontainer, (2*errors[i]+ncontainer)/(2*ncontainer) );
        }
    
        printf ("remaining:");
        for (i=0; i<nnumber; i++)
        {
            printf (" %d", list[i]);
        }
        printf ("\n");
    
    //  add the remainders
        for (i=nnumber-1; i>=0; i--)
        {
            smallestError = INT_MAX;
            containerForSmallest = 0;
            for (j=0; j<ncontainer; j++)
            {
    //          scale by ncontainer so the error is in the same units as everywhere else
                nextError = (containers[j] + list[i])*ncontainer - avgPerContainer;
                if (nextError < smallestError)
                {
                    containerForSmallest = j;
                    smallestError = nextError;
                    printf ("error for %d, %d + %d, is %+d\n", j, containers[j], list[i], smallestError);
                }
            }
            printf ("we put %d into #%d\n", list[i], containerForSmallest);
            containers[containerForSmallest] += list[i];
        }
    
        for (i=0; i<ncontainer; i++)
        {
            printf ("#%d: %d, error = %d/%d ~ %+d\n", i, containers[i], containers[i]*ncontainer - avgPerContainer, ncontainer, (2*(containers[i]*ncontainer - avgPerContainer)+ncontainer)/(2*ncontainer) );
        }
    
        free (containers);
        free (errors);

        return 0;
    }
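
    To try it, save the program under any name, e.g. distribute.c, and build it with a C compiler:

    cc distribute.c -o distribute && ./distribute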
    
  • 2021-01-04 12:00

    Do you have a large dataset, with much variance in the size of the objects, and a cast-iron requirement that you must find the very best solution? If so, finding the exact optimum is not realistic -- the problem is NP-complete.

    But the good news is that many problems that are NP-complete in theory are quite easy in the real world! If your number of datapoints is relatively small, then you can probably do an intelligent (but still thorough) search and find the globally optimal solution.
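
    As an illustration, a brute-force search might look like the sketch below (my own, not from the question): it tries every way of assigning each value to one of the containers and keeps the assignment whose spread (fullest minus emptiest container) is smallest. With k containers and n values this examines k^n assignments, so it is only feasible for small n.

    #include <stdio.h>
    #include <limits.h>

    #define NVALUES 7
    #define NCONTAINERS 3

    static int values[NVALUES] = { 10, 30, 503, 23, 1, 85, 355 };
    static int sums[NCONTAINERS];
    static int best[NCONTAINERS];
    static int bestSpread = INT_MAX;

    //  spread = difference between the fullest and the emptiest container
    static int spread (const int *s)
    {
        int i, lo = INT_MAX, hi = INT_MIN;
        for (i = 0; i < NCONTAINERS; i++)
        {
            if (s[i] < lo) lo = s[i];
            if (s[i] > hi) hi = s[i];
        }
        return hi - lo;
    }

    //  recursively try every container for values[k] and all later values
    static void search (int k)
    {
        int j;
        if (k == NVALUES)
        {
            if (spread(sums) < bestSpread)
            {
                bestSpread = spread(sums);
                for (j = 0; j < NCONTAINERS; j++)
                    best[j] = sums[j];
            }
            return;
        }
        for (j = 0; j < NCONTAINERS; j++)
        {
            sums[j] += values[k];
            search(k + 1);
            sums[j] -= values[k];
        }
    }

    int main (void)
    {
        int i;
        search(0);
        for (i = 0; i < NCONTAINERS; i++)
            printf("#%d: %d\n", i, best[i]);
        printf("spread: %d\n", bestSpread);
        return 0;
    }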

    Also, if the variance in the values is quite small, i.e. you have a nicely behaved dataset, you might quickly stumble across a solution that fills all the containers exactly evenly. If so, that is obviously the best possible answer. This can work well even on very large datasets. (I think what helps here is a dataset with lots of small values that can be used to tidy things up at the end.)

    So, don't give up! First, sort your data and consider the data points from the largest to the smallest. At each stage, assign the next value to the container that is currently smallest. This won't give you the optimal solution in all cases, but it can be quite reasonable in practice; a short C sketch follows the example below.

    Sorting 1000, 200, 20, 1000 gives you 1000, 1000, 200, 20. With three containers, the algorithm then produces:

    1000        = 1000
    1000        = 1000
    200   +20   =  220
    

    This happens to be the optimal solution, but it won't always be the case.
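
    For reference, a minimal C sketch of this greedy strategy (names and layout are my own) could look like this:

    #include <stdio.h>
    #include <stdlib.h>

    //  descending order for qsort
    static int cmp_desc (const void *a, const void *b)
    {
        int ia = *(const int *)a, ib = *(const int *)b;
        return (ia < ib) - (ia > ib);
    }

    int main (void)
    {
        int values[] = { 1000, 200, 20, 1000 };
        int n = sizeof values / sizeof values[0];
        int sums[3] = { 0, 0, 0 };
        int ncontainers = sizeof sums / sizeof sums[0];
        int i, j, smallest;

        qsort(values, n, sizeof values[0], cmp_desc);

        for (i = 0; i < n; i++)
        {
            //  find the container with the smallest running total
            smallest = 0;
            for (j = 1; j < ncontainers; j++)
                if (sums[j] < sums[smallest])
                    smallest = j;
            sums[smallest] += values[i];
        }

        for (i = 0; i < ncontainers; i++)
            printf("#%d: %d\n", i, sums[i]);
        return 0;
    }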

    ====

    If you are willing and able to try more complex algorithms, look up the partition problem:

    Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "The Easiest Hard Problem".

    There is an optimization version of the partition problem, which is to partition the multiset S into two subsets S1, S2 such that the difference between the sum of elements in S1 and the sum of elements in S2 is minimized.
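
    For the two-container version, the pseudo-polynomial dynamic program is a subset-sum table. A rough C sketch (my own illustration, not part of the quoted text) that minimises the difference between the two halves:

    #include <stdio.h>
    #include <stdlib.h>

    int main (void)
    {
        int values[] = { 10, 30, 503, 23, 1, 85, 355 };
        int n = sizeof values / sizeof values[0];
        int total = 0, i, s, best = 0;
        char *reachable;    //  reachable[s] != 0 if some subset sums to s

        for (i = 0; i < n; i++)
            total += values[i];

        reachable = (char *)calloc(total + 1, 1);
        reachable[0] = 1;   //  the empty subset sums to 0

        //  classic subset-sum DP; walking s downward uses each value at most once
        for (i = 0; i < n; i++)
            for (s = total; s >= values[i]; s--)
                if (reachable[s - values[i]])
                    reachable[s] = 1;

        //  the best split puts the subset closest to total/2 on one side
        for (s = 0; s <= total / 2; s++)
            if (reachable[s])
                best = s;

        printf("split: %d vs %d (difference %d)\n",
               best, total - best, total - 2 * best);

        free(reachable);
        return 0;
    }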
