Greedy algorithm: Interval coloring

Submitted by 有些话、适合烂在心里 on 2019-12-23 11:56:16

Question


In interval scheduling, the greedy algorithm picks the interval with the earliest finish time. But in interval colouring that rule does not work. Is there an example or an explanation of why picking the earliest finish time won't work for interval colouring?

The interval colouring problem is: given a set of intervals, we want to colour all intervals so that intervals given the same colour do not intersect, and the goal is to try to minimize the number of colours used. This can be thought of as the interval partitioning problem (if it makes more sense).

The interval scheduling problem that I'm referring to is: if you go to a theme park and there are many shows, the start and finish time of each show form an interval, and you are the resource. You want to attend as many shows as possible.


Answer 1:


If you only need a counterexample to the greedy algorithm for coloring, @btilly has already provided one.

I'll try to give the reasoning to make it more intuitive.

First, for the scheduling problem, you can indeed prove that the greedy algorithm works. The idea is this:

I CANNOT get a better result by NOT choosing the show with the earliest finish time. Let's see why.

If there are two intervals A and B, where A has the earlier finish time, then B either

  1. Starts after A's finish time: then there is no conflict at all, so why not pick both?
  2. Starts before A's finish time: there is a conflict, so I can only choose A OR B. However, A ends earlier, which leaves more room to pick more shows afterwards, no?
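
The earliest-finish-time rule argued for above can be sketched in a few lines of Python. This is only a minimal sketch: the function name `max_shows` is made up for illustration, shows are assumed to be `(start, finish)` pairs, and a show that starts exactly when another finishes is assumed not to conflict with it.

```python
def max_shows(shows):
    """Greedy interval scheduling: always take the show that finishes first.

    `shows` is a list of (start, finish) pairs (an assumed representation).
    """
    chosen = []
    last_finish = float("-inf")
    # Process shows in order of finish time, earliest first.
    for start, finish in sorted(shows, key=lambda s: s[1]):
        # Keep the show only if it starts after everything chosen so far has ended.
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen


if __name__ == "__main__":
    print(max_shows([(0, 2), (1, 4), (3, 7), (5, 6)]))
```

On the four intervals used elsewhere in this thread it keeps two shows, which is the most any single person can attend there.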

The coloring problem, however, is a different category of problem altogether.

You are forced to pick ALL intervals, and the answer to the problem is THE MAXIMUM # of INTERVALS THAT CONFLICT AT ANY ONE TIME.

Think of it like this: if at the busiest moment there are 5 exams happening at the same time, you have to use AT LEAST 5 classrooms (colors), right?

So we cannot find this by choosing the earliest finish time; the finish time alone does not tell you anything about it.

It may help you decide whether to PICK or NOT PICK an interval (as in the scheduling problem), but it cannot tell you the MINIMUM # of resources you need. They are simply different categories of problem.

EDITED:

After re-reading OP's question, here are more details, as far as I know, about the coloring problem.

Define depth to be the maximum # of intervals that conflict at any one time. Logically we know depth is a lower bound, but we have to prove it is an upper bound as well (by contradiction).

Proof

The proof needs the intervals SORTED BY START TIME IN ASCENDING ORDER or by FINISH TIME IN DESCENDING ORDER, as follows:

Assume the depth of the interval set is d, and the answer is greater than d. Let x be the first interval we process that uses resource d+1. It only gets resource d+1 because each of resources 1..d is already held by an interval that conflicts with x. Since the processing order is by start time ascending, those d intervals all start no later than x and are still running when x starts, so at x's start time there are d+1 intervals in conflict. Then the depth of the set is at least d+1, a contradiction. So d = depth is also an upper bound on the answer, and it is the optimal answer for interval coloring.

Note that if you sort by start time descending, or finish time ascending, then you cannot use the same reasoning.

Concepts / Goal

Now that we know depth is the answer, we have to find it. Conceptually, it DOES NOT matter whether you find it using start time or finish time, ascending or descending; all options can give you the depth of the interval set.
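
One way to compute the depth directly, without assigning any colors, is a sweep over the start and finish events. The sketch below is an illustration rather than part of the original answer: the name `depth` and the `(start, finish)` tuple representation are assumptions, and intervals that merely touch at an endpoint are assumed not to conflict.

```python
def depth(intervals):
    """Maximum number of intervals that overlap at any one time."""
    events = []
    for start, finish in intervals:
        events.append((start, 1))    # an interval opens
        events.append((finish, -1))  # an interval closes
    # Tuples sort by time first; at equal times -1 sorts before +1, so an
    # interval ending at t is closed before one starting at t is opened.
    events.sort()
    current = best = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best


if __name__ == "__main__":
    print(depth([(0, 2), (1, 4), (3, 7), (5, 6)]))
```

For the four intervals used elsewhere in this thread the depth is 2, so two colors are enough.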

Implementation Consideration

However, for the implementation, if you have to find it in O(n lg n), you will have to use the GREEDY METHOD + some DATA STRUCTURES, which probably need the intervals to be in some kind of order. But that's another story: it concerns the implementation. Conceptually it does not matter; you only want to find the depth of the interval set.
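
As a rough illustration of "greedy + data structure", here is a sketch that processes intervals by ascending start time and keeps a min-heap of `(finish, color)` pairs, which gives O(n lg n) overall. The function name `partition_by_start`, the 0-based color numbers, and the convention that touching intervals do not conflict are all assumptions of this sketch, not something stated in the answer.

```python
import heapq


def partition_by_start(intervals):
    """Color intervals greedily in start-time order.

    Returns a list of 0-based colors, one per interval, in input order.
    """
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    colors = [None] * len(intervals)
    busy = []        # min-heap of (finish time, color) for colors in use
    next_color = 0
    for i in order:
        start, finish = intervals[i]
        if busy and busy[0][0] <= start:
            # The color that frees up earliest is already free: reuse it.
            _, color = heapq.heappop(busy)
        else:
            # Every color is still occupied at `start`: open a new one.
            color = next_color
            next_color += 1
        colors[i] = color
        heapq.heappush(busy, (finish, color))
    return colors


if __name__ == "__main__":
    print(partition_by_start([(0, 2), (1, 4), (3, 7), (5, 6)]))
```

The number of distinct colors it hands out equals the depth of the set, in line with the proof above.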

TL;DR

For the interval scheduling problem, the greedy method itself is already the optimal strategy; for the interval coloring problem, the greedy method only helps to prove that depth is the answer, and can be used in the implementation to find the depth (but not in the way shown in @btilly's counterexample).




Answer 2:


This is just a case of playing around with pictures until you find an example. The first picture I drew that showed the problem had the following partitioning:

A: (0, 2) (3, 7)
B: (1, 4) (5, 6)

As a picture that looks like this:

-- ----
 --- -

But the earliest-stop-time rule produces the following coloring:

A: (0, 2) (5, 6)
B: (1, 4)
C: (3, 7)

Which is this partitioning:

--   -
 ---
   ----

So this greedy rule fails to be optimal on this example.
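
The failure can be reproduced with a short sketch. It assumes one particular reading of the rule: process intervals by ascending finish time and give each one the first (lowest-numbered) color it does not conflict with. The function name `color_by_finish_first_fit` and the tuple representation are hypothetical, chosen for illustration.

```python
def color_by_finish_first_fit(intervals):
    """Finish-time order, first color that fits. Returns one list per color."""
    groups = []  # groups[c] holds the intervals given color c
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        for group in groups:
            # The color fits if the new interval overlaps nothing in the group.
            if all(finish <= s or f <= start for s, f in group):
                group.append((start, finish))
                break
        else:
            # No existing color fits: open a new one.
            groups.append([(start, finish)])
    return groups


if __name__ == "__main__":
    for label, group in zip("ABC", color_by_finish_first_fit(
            [(0, 2), (3, 7), (1, 4), (5, 6)])):
        print(label, group)
```

On the four intervals above it uses three colors, A: (0, 2) (5, 6), B: (1, 4), C: (3, 7), although two are enough.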




Answer 3:


Actually, the version of the algorithm that sorts the inputs by start time may not work when sorting by finish time. The key point is that, in that algorithm, when more than one color is available, the rule is to assign one arbitrarily. The counterexample above illustrates this point.

If you want to order by finish time instead, one change needs to be made to the algorithm. For every existing color i, let F_i be the largest finish time among all the intervals colored i. Then, when more than one color is available, the rule is to color the interval with the available color that has the largest F_i.

If you make this change, the algorithm works when ordering by finish time. I ran a simulation and also wrote down a formal proof of this.

The intuition is simple: since the sorting is by finish time, you don't want the interval you are currently assigning to take too much of a resource. Although there may be two colors available, you want to use the one that is less "expensive", to save space for a later interval, which may actually start before the current one and end after it.

Let's use the example above: A: (0, 2) (5, 6), B: (1, 4), C: (3, 7). This does not work because when assigning (5, 6), A is chosen. However, following my version of the algorithm, we know the last finish time on A is 2 and on B is 4, so you choose B, the one with the largest finish time. This saves the spot for (3, 7), which starts earlier than (5, 6) but ends later.

We do not have this kind of problem when ordering by start time: since the order is by start time, you know that after the current job no further job will start earlier, so you don't need to save that resource.

Note that although this may be a way to make ordering by finish time work, the running time is larger than that of the original algorithm ordered by start time, since you have to run through all the colors to find the one that is not only available but also has the largest finish time.
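
The modified rule can be sketched as follows. This is a sketch of my reading of the answer, not the author's own code: the name `color_by_finish_best_fit` is hypothetical, a color is treated as available when its F_i is at most the new interval's start time, and the linear scan over all colors mirrors the extra running time mentioned above.

```python
def color_by_finish_best_fit(intervals):
    """Finish-time order; among available colors pick the largest F_i."""
    groups = []       # groups[c] holds the intervals given color c
    last_finish = []  # last_finish[c] is F_c, the largest finish time in groups[c]
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        best = None
        # Scan every color: it is available if F_c <= start, and among the
        # available ones we keep the one with the largest F_c.
        for c, f in enumerate(last_finish):
            if f <= start and (best is None or f > last_finish[best]):
                best = c
        if best is None:
            groups.append([(start, finish)])
            last_finish.append(finish)
        else:
            groups[best].append((start, finish))
            last_finish[best] = finish
    return groups


if __name__ == "__main__":
    for label, group in zip("AB", color_by_finish_best_fit(
            [(0, 2), (3, 7), (1, 4), (5, 6)])):
        print(label, group)
```

On the example it puts (5, 6) on B and leaves A free for (3, 7), so two colors suffice.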



Source: https://stackoverflow.com/questions/35421511/greedy-algorithm-interval-coloring
