Python given numpy array of weights, find indices which split array so that sum of each split is less than value

问题

I have an 1D array of weights, w and an array of capacities c of the same shape as w. I need to find the smallest array of indices such that when w is split by these indices, the cumsums of split arrays less than the corresponding capacities in c. Given an array of weights and capacities as follows:

w = [1,2,3,4,5,6]; c = [3, 12, 7, 6, 12]

I need to find the smallest number of indices 'i' so that the cumsums of split arrays less than the corresponding capacities in c. In this case,

i = [2, 3, 5]

The cumsums of split arrays of w formed by i are

[1, 1+2, 1+2+3, 4, 5, 5+6]

each element is clearly less than c. The cum sums are calculated as given here.

An approximation of the required indices is also fine. But the cumsums should be strictly less than corresponding elements in c.

w is a very large array (size 100000 elements). I need a vectorized solution for it to be efficient. As said before, approximations are fine as long as the cumsums are less than c

Here is what I've tried. I've assumed that the entire c matrix is just one element repeated multiple times (I'm trying to solve a simpler case first and then add complexities). In this case, I just have to ensure that each split array has to have sum less than a given value (somewhat similar to bin packing). I find the indices as follows.

weights = np.random.random_integers(1, 20, size=(20))
capacity = 100

# Find cumulative sums and divide by capacity. This gives an approximation of indices. All elements in first
# split array would have values between 0 and 1. Those in second array would have elements between 1 and 2,
# and so on. When ever the integer part changes, a new split array would be formed. Find indices from this.
# After taking the ceiling value of all elements, elements between 0 and 1 would become 1, elements between
# 1 and 2 become 2 and so on. The place where the elements change give the indices. Take diff to find the
# boundary (of change).
indices = np.diff(np.ceil(np.cumsum(weights[i]) / self.sleigh_capacity))
# 0s represent repeated elements, 1s represent values where values change. Find the indices
indices = np.where(indices != 0)[0] + 1

This gives me the indices. One thing to note is that this might give me wrong indices, because cumulative sums are calculated from the beginning. That is, cumsum of [1,2,3,2,3] is [1,2,6,8,9]. Now if my capacity is 5. dividing cumsum by 5 and taking ceil gives me [1, 1, 2, 2, 2] which would correspond to splitting indices of [1, 4]. But the actual splitting indices are [1, 3, 4]. I'm handling this by reducing the capacity. That is, if my actual capacity is 5, I'd take it as 4 and then do the above (The value 4 is gotten by pure guess. To be on the safer side I might decrease the capacity even further).

But I'm not able to extend this method to the case where the capacities are varying. That is, if I have a capacity array of shape (1,5) then I would have to use a different approach, as this approach wouldn't work.

回答1:

w = [1,2,3,1,6,6]; c = [1,3,5, 1, 6, 12]

The only solution to this is

i=[2,3,4,5]

The greedy solution (to my understanding is to take until you cannot take)

It starts off with a 2 to get the [1,2] < =[1, 1+2] in c. However, if the next split is at 4 (as the greedy solution leads to, you get into issues since nothing can satisfy the 1). We should have instead split it at 2 and 3.

I suggested using backtracking to look back when this happens, but the running time could spiral out of control. The limit with 100k seems to suggest a linear solution or nlogn solution at worst. I have ideas of how to do this with dynamic programming, but still figuring out some specifics. Will update hopefully, or discard answer after a while. :)

来源：https://stackoverflow.com/questions/34552304/python-given-numpy-array-of-weights-find-indices-which-split-array-so-that-sum

标签

python

arrays

numpy

split

indices