I'll try to explain the problem in mathematical language.
Assume I have a set of items X = {x_1, x_2, ..., x_n}. Each item of X belongs to one of the sets S_1, ..., S_5.
If we put aside the condition to take one x from each S_i, this problem is equivalent to Maximum Weight Independent Set in an interval graph (that is, finding a maximum-weight set of pairwise non-adjacent vertices in a graph where vertices represent intervals and two vertices are connected if the corresponding intervals overlap). This problem can be solved in polynomial time. The version here also has a color for each vertex, and the chosen vertices need to have all different colors. I am not sure how to solve this in polynomial time, but you can exploit the fact that there are not too many colors: make a dynamic programming table T[C, x], where C is a set of colors and x is the position of an endpoint of an interval. T[C, x] should contain the maximum weight you can get from |C| non-overlapping intervals, one of each color in C, that lie to the left of x. You can then fill in the table from left to right. This should be feasible since there are only 2^5 = 32 color sets.
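Below is a minimal Python sketch of this table, under assumptions that are not in the answer: intervals are (start, end, color) tuples, half-open [start, end) so that touching intervals don't overlap, the weight is the length, and colors are numbered 0..k-1 so that a color set C can be stored as a bitmask. The endpoint position x is replaced by "the first k intervals in order of right endpoint", which is one common way to fill such a table from left to right.

```python
from bisect import bisect_right

def best_one_per_color(intervals, num_colors):
    """Max total length of pairwise non-overlapping intervals, exactly one per color.

    Assumes (start, end, color) tuples, half-open [start, end), colors 0..num_colors-1.
    Returns None if no valid selection exists.
    """
    items = sorted(intervals, key=lambda t: t[1])  # left to right by right endpoint
    ends = [e for _, e, _ in items]
    n = len(items)
    full = (1 << num_colors) - 1
    NEG = float("-inf")
    # T[C][k]: best weight using exactly one interval of each color in bitmask C,
    # chosen among the first k intervals (in order of right endpoint).
    T = [[NEG] * (n + 1) for _ in range(full + 1)]
    for k in range(n + 1):
        T[0][k] = 0  # the empty color set needs no intervals
    for k in range(1, n + 1):
        s, e, c = items[k - 1]
        p = bisect_right(ends, s)  # number of intervals ending at or before s
        for C in range(1, full + 1):
            best = T[C][k - 1]  # option 1: do not use interval k
            if C >> c & 1:  # option 2: use interval k for its color c
                prev = T[C & ~(1 << c)][p]
                if prev != NEG:
                    best = max(best, prev + (e - s))
            T[C][k] = best
    ans = T[full][n]
    return None if ans == NEG else ans
```

The table has 2^k * (n + 1) entries, so with 5 colors it is 32 times the size of the uncolored DP.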
Let's think of x_i as an interval with integer endpoints. f1 returns true if 2 intervals do not intersect and false otherwise. f2 compares the total lengths of the intervals in two subsets.
If I understand correctly, this means we can assign a value (its length) to each x_i in X. There is then no need to evaluate f2 on every possible solution/subset.
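A small sketch of f1 and f2 under this interval reading, assuming half-open intervals given as (start, end) pairs. The three-way return value of f2 (1, 0 or -1) is an invented convention, since the question does not say what f2 returns. It shows why f2 reduces to comparing sums of per-item values.

```python
def f1(a, b):
    """True if intervals a and b do not intersect (half-open [start, end) assumed)."""
    return a[1] <= b[0] or b[1] <= a[0]

def value(x):
    """Value assigned to a single item: the length of its interval."""
    return x[1] - x[0]

def f2(subset_a, subset_b):
    """Compare two subsets by total length: 1 if a is better, -1 if b is, 0 on a tie."""
    total_a = sum(value(x) for x in subset_a)
    total_b = sum(value(x) for x in subset_b)
    return (total_a > total_b) - (total_a < total_b)
```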
It's very unlikely that the 5 smallest x_i form the best subset. Depending on the actual data, the best subset might be the 5 biggest intervals. So I'd suggest sorting X by value. The general idea is to start with the highest x and keep adding more x's (highest first) until you have 5 non-overlapping ones. Most likely you will find the best subset before generating even a fraction of all the possible subsets (depending on the specific problem, of course). But in the worst case this is not faster than your solution.
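One way to turn "highest first" into a search that still guarantees the best subset is a depth-first search over the intervals sorted by length, with a simple upper bound for pruning. This is a hedged sketch: the bound (current total plus the lengths of the next few longest intervals) and the (start, end, group) half-open tuple format are assumptions, and the worst case remains exponential, as the answer says.

```python
def search_longest_first(intervals, num_groups):
    """Try longer intervals first and prune with a simple upper bound.

    Assumes (start, end, group) tuples with half-open [start, end), groups
    0..num_groups-1, and one interval per group. Exponential in the worst case.
    Returns (best_total, best_selection), or (None, []) if nothing is valid.
    """
    items = sorted(intervals, key=lambda t: t[1] - t[0], reverse=True)
    best_total, best_sel = None, []

    def overlaps(x, chosen):
        return any(x[0] < y[1] and y[0] < x[1] for y in chosen)

    def dfs(i, chosen, used, total):
        nonlocal best_total, best_sel
        need = num_groups - len(chosen)
        if need == 0:
            if best_total is None or total > best_total:
                best_total, best_sel = total, list(chosen)
            return
        if i == len(items):
            return
        # Items are sorted by length, so the next `need` ones bound any completion.
        bound = total + sum(t[1] - t[0] for t in items[i:i + need])
        if best_total is not None and bound <= best_total:
            return
        x = items[i]
        if x[2] not in used and not overlaps(x, chosen):
            chosen.append(x)
            used.add(x[2])
            dfs(i + 1, chosen, used, total + x[1] - x[0])  # take x
            chosen.pop()
            used.remove(x[2])
        dfs(i + 1, chosen, used, total)  # skip x

    dfs(0, [], set(), 0)
    return best_total, best_sel
```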
Without further qualifying the domains and the evaluation function, this problem can easily be shown to be NP-complete by reducing SAT to it (i.e. let S_1, ..., S_5 be {true, false} and f2 = 1 if the formula is fulfilled and 0 if not). Hence in that case, even without taking f1 into account, you are out of luck.
If you know more about the actual structure of f1 and f2, you might have more luck. Have a look at Constraint Satisfaction Problems to find out what to look for in the structure of f1 and f2.
This problem is a variation of the weighted interval scheduling problem. The DP algorithm has polynomial complexity of O(N*log(N)) with O(N) space for the basic problem, and O(2^G * N * log(N)) complexity with O(2^G * N) space for this variation, where G and N represent the total number of groups/subsets (5 here) and of intervals, respectively.
If x_i doesn't represent intervals, then the problem is NP-complete in general, as other answers have shown.
First let me explain the dynamic programming solution for maximum weighted interval scheduling, and then solve the variation problem.
Let start(i), end(i) and weight(i) be the starting point, ending point and length of interval i, respectively, with the intervals numbered 1, 2, ..., N in order of their starting points. Let next(i) be the first interval after i that doesn't overlap with interval i. Define S(i) to be the maximum total weight of non-overlapping intervals chosen only from intervals (jobs) i, i+1, ..., N. S(1) is the solution: it considers all intervals 1, 2, ..., N and returns the maximum total weight. S(i) is defined recursively:
S(i) = weight(i)                              if i == N    // last interval
     = max(weight(i) + S(next(i)), S(i+1))    otherwise
(with S(N+1) = 0, used when every later interval overlaps interval i)
Complexity of this solution is O(N*log(N) + N): N*log(N) for finding next(i) for all intervals (sorting, plus a binary search per interval), and N for solving the subproblems. Space is O(N) for saving the subproblem solutions.
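Here is a minimal Python sketch of S(i), assuming half-open (start, end) pairs with weight equal to length. next(i) is found by binary search (`bisect_left`) over the sorted start points, and the table is filled from right to left instead of recursively; indices are 0-based, so S[n] plays the role of S(N+1) = 0.

```python
from bisect import bisect_left

def max_weight_schedule(intervals):
    """Weighted interval scheduling: max total length of non-overlapping intervals.

    intervals: list of (start, end) pairs, half-open [start, end); weight = end - start.
    """
    items = sorted(intervals)  # order by start point
    starts = [s for s, _ in items]
    n = len(items)
    # next_[i]: first interval j > i with start(j) >= end(i), i.e. not overlapping i
    next_ = [bisect_left(starts, e, lo=i + 1) for i, (_, e) in enumerate(items)]
    S = [0] * (n + 1)  # S[n] = 0: no intervals left
    for i in range(n - 1, -1, -1):
        s, e = items[i]
        S[i] = max((e - s) + S[next_[i]], S[i + 1])  # take interval i, or skip it
    return S[0]
```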
Now, let's solve the variation of this problem.
Let start(i), end(i), weight(i) and group(i) be the starting point, ending point, length and subset (one of S_1, ..., S_5) of interval i, respectively, with the intervals again numbered 1, 2, ..., N in order of their starting points. Let next(i) be the first interval after i that doesn't overlap with interval i. Define S(i, pending) to be the maximum total weight only considering intervals i, i+1, ..., N, where pending is the set of subsets from which we still have to choose one interval each. S(1, {S_1, ..., S_5}) is the solution: it considers all intervals 1, ..., N, chooses one interval from each of S_1, ..., S_5, and returns the maximum total weight. S(i, pending) is defined recursively as follows:
S(i, pending) = 0                   if pending == empty_set              // valid combination
              = -inf                if i > N and pending != empty_set    // incomplete combination
              = S(i+1, pending)     if group(i) not element of pending
              = max(weight(i) + S(next(i), pending - {group(i)}),
                    S(i+1, pending))    otherwise
Note that I may have missed some base cases.
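A Python sketch of S(i, pending), memoized with `functools.lru_cache`. Storing pending as a bitmask over group indices 0..G-1 is an encoding choice, not part of the answer, and intervals are assumed to be half-open (start, end, group) tuples. The base cases are an assumption as well: an empty pending set scores 0, and running past the last interval with groups still pending scores -inf.

```python
from bisect import bisect_left
from functools import lru_cache

def max_weight_one_per_group(intervals, num_groups):
    """Max total length choosing one non-overlapping interval from every group.

    intervals: (start, end, group) tuples, half-open, groups 0..num_groups-1.
    Returns None if no valid combination exists.
    """
    items = sorted(intervals)  # order by start point
    starts = [t[0] for t in items]
    n = len(items)
    next_ = [bisect_left(starts, t[1], lo=i + 1) for i, t in enumerate(items)]
    NEG = float("-inf")

    @lru_cache(maxsize=None)
    def S(i, pending):
        if pending == 0:
            return 0  # one interval chosen from every group
        if i == n:
            return NEG  # ran out of intervals with groups still pending
        s, e, g = items[i]
        skip = S(i + 1, pending)
        if not pending >> g & 1:
            return skip  # group g is already covered
        return max((e - s) + S(next_[i], pending & ~(1 << g)), skip)

    best = S(0, (1 << num_groups) - 1)
    return None if best == NEG else best
```

Because the recursion can go N levels deep, Python's default recursion limit makes this version suitable only for modest N; an iterative table over i from N down to 1 avoids that.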
Complexity of this algorithm is O(2^G * N * log(N)) with O(2^G * N) space; 2^G * N is the number of subproblems.
As a rough estimate, for small values of G (G <= 10) and large values of N (N >= 100000), this algorithm runs quickly. For medium values of G (G >= 20), N should be small as well (N <= 10000) for the algorithm to finish in reasonable time. For large values of G (G >= 40), the algorithm does not finish in practice.
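To see where these estimates come from, here is a quick count of the table entries 2^G * N, ignoring the log factor and constant factors (the sample (G, N) pairs are only illustrative):

```python
def subproblem_count(G, N):
    """Size of the S(i, pending) table: 2^G pending sets times N positions."""
    return 2 ** G * N

for G, N in [(5, 100_000), (10, 100_000), (20, 10_000), (40, 1)]:
    print(f"G={G:>2}, N={N:>6}: {subproblem_count(G, N):.2e} subproblems")
```

That is about 10^8 entries for G = 10, N = 100000, about 10^10 for G = 20, N = 10000, and over 10^12 for G = 40 even with a single interval.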
I don't have a complete answer because the question is very abstract, but I will give you an idea.
Try thinking in terms of multithreading. For instance, you can create a thread pool with a limited number of threads. Then find a recursive solution and submit a new task for each branch as you dive into the recursion.
The more finely you can split this problem into small tasks, the better your algorithm will be.
Think programmatically, not mathematically!
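One way to read this suggestion, as a hedged sketch: split a brute-force search into one task per choice from the first group and hand the tasks to a bounded pool from Python's `concurrent.futures`. Groups are assumed to be lists of half-open (start, end) pairs, and `best_for_first` and `parallel_best` are hypothetical names. In CPython, threads don't speed up CPU-bound pure-Python code because of the GIL; `ProcessPoolExecutor` is the usual swap when real parallelism is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def overlap_free(choice):
    """True if no two intervals in the tuple intersect (half-open assumed)."""
    ordered = sorted(choice)
    return all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))

def best_for_first(first, rest_groups):
    """Brute-force every completion of one fixed pick from the first group."""
    best = None
    for rest in product(*rest_groups):
        choice = (first,) + rest
        if overlap_free(choice):
            total = sum(e - s for s, e in choice)
            if best is None or total > best[0]:
                best = (total, choice)
    return best

def parallel_best(groups, max_workers=4):
    """One task per interval of groups[0]; the pool caps the number of threads."""
    first_group, rest = groups[0], groups[1:]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(best_for_first, first_group, [rest] * len(first_group))
        candidates = [r for r in results if r is not None]
    return max(candidates, default=None)
```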
I have a solution that should be good if I understand your question correctly. Here is what I understand:
each integer is actually an interval from I1 to I2, and a set is a
combination of such intervals. A set is valid if none of its intervals
intersect, and Set1 > Set2 if the sum of the interval lengths in Set1 is greater than the sum of the interval lengths in Set2.
So here is roughly what I would do in this situation.
While comparing the intervals to determine whether they intersect, do the following:
a) Sort the intervals in order of their start points.
b) Compare the end point of each interval with the start point of the next one to detect an overlap. Keep an integer named gap, and whenever two consecutive intervals do not overlap, increase gap by the distance between them.
For a valid set, this automatically gives you the total length of its intervals: Endpoint(lastI) - Startpoint(firstI) - gap.
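The two steps, sketched in Python under the assumption that intervals are half-open (start, end) pairs; the function returns None as soon as an overlap is found.

```python
def set_total(intervals):
    """Check a set for overlaps and compute its total length with the gap trick.

    intervals: list of (start, end) pairs, half-open; returns None on any overlap.
    """
    if not intervals:
        return 0
    ordered = sorted(intervals)  # a) sort by start point
    gap = 0
    for (_, e1), (s2, _) in zip(ordered, ordered[1:]):
        if s2 < e1:  # b) next interval starts before this one ends: overlap
            return None
        gap += s2 - e1  # space between two consecutive non-overlapping intervals
    # Endpoint(lastI) - Startpoint(firstI) - gap
    return ordered[-1][1] - ordered[0][0] - gap
```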
=> If you need just the best set, you can keep one variable max and compare each set against it as it comes.
=> If you need the top 5 or something similar, follow the steps below; otherwise skip them.
As soon as you have the sum and the set is valid, add the sum to a "MinHeap" of 5 elements. The first 5 elements go in as they are. Basically, you are keeping track of the top 5 elements. When a new set is smaller than the minimum of the heap, do nothing and ignore this set, as it is smaller than the top 5 sets. When the set is larger than the minimum (meaning it belongs in the top 5), replace the minimum and sift the new element down, keeping the minimum of the top 5 at the top. This way the heap always holds the top 5 elements.
Now that you have the top 5 elements, you can easily determine the best with 5 pops. :)
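A sketch of the size-5 min-heap with Python's `heapq`: `heapreplace` pops the smallest item and pushes the new one in a single step. The input is assumed to be an iterable of (total, set) pairs, and an insertion counter breaks ties so that the sets themselves never need to be compared.

```python
import heapq

def top_k_sets(scored_sets, k=5):
    """Keep the k largest (total, set) pairs in a size-k min-heap; best first."""
    heap = []
    for idx, (total, s) in enumerate(scored_sets):
        if len(heap) < k:
            heapq.heappush(heap, (total, idx, s))  # the first k sets go in as they are
        elif total > heap[0][0]:
            heapq.heapreplace(heap, (total, idx, s))  # drop the current minimum
        # otherwise the set is smaller than all of the top k: ignore it
    return [(total, s) for total, _, s in sorted(heap, reverse=True)]
```

Popping the heap instead of sorting returns the sets in ascending order, so the last of the 5 pops is the best set.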
Note: If the intervals are in random order, checking for overlaps leads to an O(n^2) solution, and each comparison needs 4 if statements to check the possible overlap positions. Instead, you can sort the intervals in O(n log n) and then go through the list once to detect overlaps (O(n log n + n) = O(n log n)), while simultaneously getting the top 5 sets. This should improve your performance.