Getting the lowest possible sum from numbers' difference

滥情空心 2020-12-23 22:19

I have to find the lowest possible sum of the numbers' differences.

Let's say I have 4 numbers: 1515, 1520, 1500 and 1535. The lowest sum of differences is 30, becaus

10 Answers
  • 2020-12-23 22:29

    Taking the edit into account:

    Start by sorting the list. Then use dynamic programming, where state (i, n) holds the minimum sum of n differences when considering only the first i numbers of the sorted sequence. Initial states: dp[*][0] = 0, everything else = infinity. Use two loops: the outer loop runs i from 1 to N, and the inner loop runs n from 1 to R (R = 3 in the example case from your edit: 3 pairs of numbers, which means 6 individual numbers). The recurrence relation is dp[i][n] = min(dp[i-1][n], dp[i-2][n-1] + seq[i] - seq[i-1]).

    You still have to handle the boundary cases, which I've ignored, but the general idea should work. It runs in O(N log N + NR) time and uses O(NR) space.
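
    The recurrence above, with the boundary cases filled in, can be sketched as follows. This is only an illustration under stated assumptions: the function name min_pair_difference_sum and the 1-based dp layout (dp[i][r] covers the first i sorted numbers) are choices made here, not part of the answer.

```python
import math

def min_pair_difference_sum(numbers, pairs):
    """Minimum total of |a - b| over `pairs` disjoint pairs drawn from `numbers`."""
    seq = sorted(numbers)
    n = len(seq)
    if 2 * pairs > n:
        raise ValueError("not enough numbers to form that many pairs")
    # dp[i][r]: best sum using r pairs among the first i sorted numbers.
    # Zero pairs always cost 0; every other state starts as "impossible".
    dp = [[math.inf] * (pairs + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = 0
    for i in range(2, n + 1):
        for r in range(1, pairs + 1):
            skip = dp[i - 1][r]  # the i-th number stays unpaired
            take = dp[i - 2][r - 1] + seq[i - 1] - seq[i - 2]  # pair it with its left neighbour
            dp[i][r] = min(skip, take)
    return dp[n][pairs]

print(min_pair_difference_sum([1515, 1520, 1500, 1535], 2))  # 30
```

    Starting the outer loop at i = 2 is what takes care of the boundary: with fewer than two numbers no pair can be formed, so those states keep their initial values.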

  • 2020-12-23 22:29

    Order the list, then do the difference calculation.

    EDIT: hi @hey

    You can solve the problem using dynamic programming.

    Say you have a list L of N integers and must form k pairs (with 2*k <= N).

    Build a function that finds the smallest difference within a list (it will be faster if the list is sorted ;) and call it smallest(list l).

    Build another one that finds the same for two pairs (can be tricky, but doable) and call it smallest2(list l).

    Let's define best(int i, list l) as the function that gives you the best result for i pairs within the list l.

    The algorithm goes as follows:

    1. best(1, L) = smallest(L)
    2. best(2, L) = smallest2(L)
    3. for i from 3 to k:

    loop

    compute min ( 
        stored_best(i-2) + smallest2( stored_remainder(i-2) ),
        stored_best(i-1) + smallest( stored_remainder(i-1) )
    ) and store as best(i)
    store the remainder as well for the chosen solution
    

    Now, the problem is that once you have chosen a pair, the two ints that form its boundaries are reserved and can't be used to form a better solution. But by looking two levels back you can guarantee that you have allowed switching candidates.

    (The switching work is done by smallest2)

  • 2020-12-23 22:29

    I think @marcog's approach can be simplified further.

    Take the basic approach that @jonas-kolker proved for finding the smallest differences. Take the resulting list and sort it. Take the R smallest entries from this list and use them as your differences. Proving that this is the smallest sum is trivial.

    @marcog's approach is effectively O(N^2) because R can grow in proportion to N. This approach should take 2*(N log N) + N steps, i.e. O(N log N).

    This requires a small data structure to hold a difference and the values it was derived from. But, that is constant per entry. Thus, space is O(N).

  • 2020-12-23 22:31

    Step 1: Calculate pair differences

    I think it is fairly obvious that the right approach is to sort the numbers and then take differences between each adjacent pair of numbers. These differences are the "candidate" differences contributing to the minimal difference sum. Using the numbers from your example would lead to:

    Number Diff
    ====== ====
    1561
            11
    1572
             0
    1572
            37
    1609
            73
    1682
            49
    1731
             0
    1731
           310
    2041
    

    Save the differences into an array or table or some other data structure where you can maintain the differences and the two numbers that contributed to each difference. Call this the DiffTable. It should look something like:

    Index Diff Number1 Number2
    ===== ==== ======= =======
      1     11    1561    1572
      2      0    1572    1572
      3     37    1572    1609
      4     73    1609    1682
      5     49    1682    1731
      6      0    1731    1731
      7    310    1731    2041
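
    Building the DiffTable can be sketched in a few lines of Python. The function name build_diff_table and the tuple layout (index, diff, number1, number2) are assumptions made for this illustration; the input is the example list in its unsorted order.

```python
def build_diff_table(numbers):
    """Return (index, diff, number1, number2) rows for adjacent sorted values."""
    seq = sorted(numbers)
    # zip(seq, seq[1:]) walks over each pair of neighbours in sorted order.
    return [
        (i, hi - lo, lo, hi)
        for i, (lo, hi) in enumerate(zip(seq, seq[1:]), start=1)
    ]

for row in build_diff_table([1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]):
    print(row)
```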
    

    Step 2: Choose minimal Differences

    If all numbers had to be chosen, we could have stopped at step 1 by choosing the number pairs at the odd-numbered indices: 1, 3, 5, 7. This is the correct answer. However, the problem states that only a subset of pairs is chosen, and this complicates the problem quite a bit. In your example 3 differences (6 numbers = 3 pairs = 3 differences) need to be chosen such that:

    • The sum of the differences is minimal
    • The numbers participating in any chosen difference are removed from the list.

    The second point means that if we chose Diff 11 (Index = 1 above), the numbers 1561 and 1572 are removed from the list, and consequently, the next Diff of 0 at index 2 cannot be used because only 1 instance of 1572 is left. Whenever a Diff is chosen the adjacent Diff values are removed. This is why there is only one way to choose 4 pairs of numbers from a list containing eight numbers.

    About the only method I can think of to minimize the sum of the Diff above is to generate and test.

    The following pseudo code outlines a process to generate all 'legal' sets of index values for a DiffTable of arbitrary size where an arbitrary number of number pairs are chosen. One (or more) of the generated index sets will contain the indices into the DiffTable yielding a minimum Diff sum.

    /* Global Variables */
    M = 7    /* Number of candidate pair differences in DiffTable */
    N = 3    /* Number of indices in each candidate pair set (3 pairs of numbers) */
    AllSets = [] /* Set of candidate index sets (set of sets) */
    
    call GenIdxSet(1, []) /* Call generator with seed values */
    
    /* AllSets now contains candidate index sets to perform min sum tests on */
    
    end
    
    procedure: GenIdxSet(i, IdxSet)
      /* Generate all the valid index values for current level */
      /* and subsequent levels until a complete index set is generated */
      do while i <= M
         if CountMembers(IdxSet) = N - 1 then  /* Set is complete */
            AllSets = AppendToSet(AllSets, AppendToSet(IdxSet, i))
         else                                  /* Add another index */
           call GenIdxSet(i + 2, AppendToSet(IdxSet, i))
         i = i + 1
         end
    return
    

    Function CountMembers returns the number of members in the given set, function AppendToSet returns a new set where the arguments are appended into a single ordered set. For example AppendToSet([a, b, c], d) returns the set: [a, b, c, d].

    For the given parameters, M = 7 and N = 3, AllSets becomes:

    [[1 3 5]
     [1 3 6]  <= Diffs = (11 + 37 + 0) = 48
     [1 3 7]
     [1 4 6]
     [1 4 7]
     [1 5 7]
     [2 4 6]
     [2 4 7]
     [2 5 7]
     [3 5 7]]
    

    Calculate the sum for each set of indices; the set with the minimum sum identifies the required number pairs in DiffTable. Above I show that the second set of indices gives the minimum you are looking for.
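
    For readers who prefer something runnable, the generate-and-test idea can be sketched in Python. This is an illustration rather than a translation of the pseudo code: gen_index_sets is a hypothetical generator that yields tuples instead of appending to a global AllSets, and the diffs list is the Diff column of the DiffTable.

```python
def gen_index_sets(m, n, start=1, chosen=()):
    """Yield every n-tuple of 1-based indices in 1..m with no two adjacent."""
    if len(chosen) == n:
        yield chosen
        return
    for i in range(start, m + 1):
        # Continuing from i + 2 skips the neighbouring difference,
        # which shares a number with difference i.
        yield from gen_index_sets(m, n, i + 2, chosen + (i,))

diffs = [11, 0, 37, 73, 49, 0, 310]  # Diff column of the DiffTable
all_sets = list(gen_index_sets(len(diffs), 3))

def diff_sum(index_set):
    return sum(diffs[i - 1] for i in index_set)

best = min(all_sets, key=diff_sum)
print(len(all_sets), best, diff_sum(best))  # 10 (1, 3, 6) 48
```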

    This is a simple brute force technique and it does not scale very well. If you had a list of 50 numbers (49 candidate differences) and wanted to choose 5 pairs, AllSets would contain 1,221,759 index sets to test.
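
    The size of AllSets can be computed without generating it: the number of ways to choose n indices from 1..M with no two adjacent is the binomial coefficient C(M - n + 1, n). A minimal sketch, assuming Python 3.8+ for math.comb; count_index_sets is a name made up for this illustration. The figure of 1,221,759 corresponds to M = 49 candidate differences, i.e. a list of 50 numbers.

```python
import math

def count_index_sets(m, n):
    """Number of ways to pick n of the indices 1..m with no two adjacent."""
    if m - n + 1 < n:
        return 0  # not enough room to keep the chosen indices apart
    return math.comb(m - n + 1, n)

print(count_index_sets(7, 3))   # 10
print(count_index_sets(49, 5))  # 1221759
```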

  • 2020-12-23 22:32

    I assume the general problem is this: given a list of 2n integers, output a list of n pairs, such that the sum of |x - y| over all pairs (x, y) is as small as possible.

    In that case, the idea would be:

    • sort the numbers
    • emit (numbers[2k], numbers[2k+1]) for k = 0, ..., n - 1.

    This works. Proof:

    Suppose you have x_1 < x_2 < x_3 < x_4 (possibly with other values between them) and output (x_1, x_3) and (x_2, x_4). Then

    |x_4 - x_2| + |x_3 - x_1| = |x_4 - x_3| + |x_3 - x_2| + |x_3 - x_2| + |x_2 - x_1| >= |x_4 - x_3| + |x_2 - x_1|.

    In other words, it's always better to output (x_1, x_2) and (x_3, x_4) because you don't redundantly cover the space between x_2 and x_3 twice. By induction, the smallest number of the 2n must be paired with the second smallest number; by induction on the rest of the list, pairing up smallest neighbours is always optimal, so the algorithm sketch I proposed is correct.
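
    As a small illustration of the two steps (sort, then pair neighbours), here is a sketch in Python. The name pair_adjacent is chosen here for the example, and the sketch assumes the all-numbers-are-paired version of the problem stated at the top of this answer.

```python
def pair_adjacent(numbers):
    """Sort 2n numbers and pair neighbours: (s[0], s[1]), (s[2], s[3]), ..."""
    if len(numbers) % 2:
        raise ValueError("expected an even count of numbers")
    s = sorted(numbers)
    return [(s[2 * k], s[2 * k + 1]) for k in range(len(s) // 2)]

pairs = pair_adjacent([1515, 1520, 1500, 1535])
total = sum(y - x for x, y in pairs)
print(pairs, total)  # [(1500, 1515), (1520, 1535)] 30
```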

  • 2020-12-23 22:40

    The solution by marcog is a correct, non-recursive, polynomial-time solution to the problem (it's a pretty standard DP problem), but, just for completeness, here's a proof that it works, and actual code for the problem. [@marcog: Feel free to copy any part of this answer into your own if you wish; I'll then delete this.]

    Proof

    Let the list be x_1, …, x_N. Assume wlog that the list is sorted. We're trying to find K (disjoint) pairs of elements from the list, such that the sum of their differences is minimised.

    Claim: An optimal solution always consists of the differences of consecutive elements.
    Proof: Suppose you fix the subset of elements whose differences are taken. Then by the proof given by Jonas Kölker, the optimal solution for just this subset consists of differences of consecutive elements from the list. Now suppose there is a solution corresponding to a subset that does not comprise pairs of consecutive elements, i.e. the solution involves a difference x_j - x_i where j > i+1. Then, we can replace x_j with x_{i+1} to get a smaller difference, since
    x_i ≤ x_{i+1} ≤ x_j ⇒ x_{i+1} - x_i ≤ x_j - x_i.
    (Needless to say, if x_{i+1} = x_j, then taking x_{i+1} is indistinguishable from taking x_j.) This proves the claim.

    The rest is just routine dynamic programming stuff: the optimal solution using k pairs from the first n elements either doesn't use the nth element at all (in which case it's just the optimal solution using k pairs from the first n-1), or it uses the nth element, in which case it's the difference x_n - x_{n-1} plus the optimal solution using k-1 pairs from the first n-2.

    The whole program runs in time O(N log N + NK), as marcog says. (Sorting + DP.)

    Code

    Here's a complete program. I was lazy with initializing arrays and wrote Python code using dicts; since dicts are hash tables, this costs only a small constant factor over using actual arrays.

    '''
    The minimum possible sum|x_i - x_j| using K pairs (2K numbers) from N numbers
    '''
    import sys
    
    def ints():
        # Read one line from stdin and parse it as whitespace-separated integers.
        return [int(s) for s in sys.stdin.readline().split()]
    
    N, K = ints()
    num = sorted(ints())
    
    best = {}  # best[(k, n)] = minimum sum using k pairs out of num[0..n]
    
    def b(k, n):
        # Stored value if present; otherwise the boundary cases:
        # zero pairs cost 0, anything else is impossible.
        if (k, n) in best: return best[(k, n)]
        if k == 0: return 0
        return float('inf')
    
    for n in range(1, N):
        for k in range(1, K + 1):
            best[(k, n)] = min(b(k, n - 1),                            # Not using num[n]
                               b(k - 1, n - 2) + num[n] - num[n - 1])  # Using num[n]
    
    print(best[(K, N - 1)])
    

    Test it:

    Input
    4 2
    1515 1520 1500 1535
    Output
    30
    
    Input
    8 3
    1731 1572 2041 1561 1682 1572 1609 1731
    Output
    48
    