Job scheduling with minimization by parallel grouping

Submitted by て烟熏妆下的殇ゞ on 2020-07-07 16:11:11

Question


I have a job scheduling problem with a twist: a minimization constraint. The task is this: I have many jobs, each with various dependencies on other jobs, and no cycles. These jobs also have categories, and jobs of the same category can be run together in parallel for free. So I want to order the jobs so that each job comes after its dependencies, but arranged in such a way that they are grouped by category (to run many in parallel) in order to minimize the number of serial jobs I run. That is, adjacent jobs of the same category count as a single serial job.
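To make the objective concrete, here is a minimal sketch (the job names and helper function are illustrative, not part of the original problem) of the quantity being minimized: the number of maximal runs of adjacent jobs that share a category.

# Sketch: count "serial jobs" in a given ordering, where a maximal run of
# adjacent jobs with the same category counts as one serial job.
def serial_cost(order, category):
    # order: list of job ids in execution order
    # category: dict mapping job id -> category
    cost = 0
    prev = None
    for job in order:
        if category[job] != prev:
            cost += 1
            prev = category[job]
    return cost

# Example: A and C share a category, so the order [A, C, B] costs 2 serial
# jobs, while [A, B, C] costs 3.
print(serial_cost(["A", "C", "B"], {"A": 1, "B": 2, "C": 1}))  # -> 2
print(serial_cost(["A", "B", "C"], {"A": 1, "B": 2, "C": 1}))  # -> 3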

I know I can sort topologically to handle the dependency ordering. I've tried using graph coloring on the subgraphs containing each category of jobs, but I run into problems with inter-category dependency conflicts, specifically when I have to decide which of two or more candidate pairs of jobs to group together. I can brute force this, and I can try random walks over the search space, but I'm hoping for something smarter: the former blows up exponentially in the worst case, and the latter is not guaranteed to be optimal.
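As a sketch of the kind of heuristic hinted at here (a greedy, category-aware variant of Kahn's topological sort; names are illustrative and the result is certainly not guaranteed to be optimal):

# Greedy heuristic sketch: build a topological order that keeps picking ready
# jobs of the current category before switching to a new one.
from collections import defaultdict

def greedy_grouped_order(deps, category):
    # deps: dict mapping every job id -> set of prerequisite job ids
    # category: dict mapping every job id -> category
    indegree = {j: len(d) for j, d in deps.items()}
    dependents = defaultdict(list)
    for j, d in deps.items():
        for p in d:
            dependents[p].append(j)
    ready = {j for j, n in indegree.items() if n == 0}
    order, current = [], None
    while ready:
        # Prefer a ready job of the current group's category; otherwise start a new group.
        same = [j for j in ready if category[j] == current]
        job = same[0] if same else next(iter(ready))
        current = category[job]
        ready.remove(job)
        order.append(job)
        for k in dependents[job]:
            indegree[k] -= 1
            if indegree[k] == 0:
                ready.add(k)
    return order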

To put things into scale: there can be as many as a couple hundred thousand jobs to schedule at once, with maybe a couple hundred categories of jobs.

I've stumbled upon several optimizations, such as building the dependency graph, splitting it into connected components, and solving each subproblem independently before merging. I also realize there's a lower bound given by the number of colors needed to color each category, but I'm not sure how to use that beyond an early exit condition.
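As an aside, the connected-component decomposition mentioned above is cheap to compute; a sketch using networkx (assumed here, not part of the original question) on a toy dependency graph:

# Sketch: split the dependency DAG into weakly connected components.
# Jobs in different components share no precedence constraints, so each
# component can at least be analyzed independently; merging the per-component
# schedules so that same-category groups line up is the harder part.
import networkx as nx

G = nx.DiGraph()
G.add_nodes_from(["A", "B", "C", "D"])        # illustrative job ids
G.add_edges_from([("A", "B"), ("C", "D")])    # edge u -> v means u must run before v

subproblems = [G.subgraph(c).copy() for c in nx.weakly_connected_components(G)]
print([sorted(sp.nodes) for sp in subproblems])  # e.g. [['A', 'B'], ['C', 'D']]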

Is there a better way to find an ordering of jobs that maximizes this "grouping" of jobs by category, in order to minimize the total number of serial jobs?


Answer 1:


Here is a CP Optimizer model which solves very quickly (a couple of seconds) using the most recent version, 12.10. The model is quite natural, using precedence constraints and a "state function" to model the batching constraint (no two tasks from different categories can execute concurrently).

DURATION = [
 11611, 12558, 11274, 7839, 5864, 6025, 11413, 10453, 5315, 12924,
 5728, 6757, 10256, 12502, 6781, 5341, 10851, 11212, 8894, 8587,
 7430, 7464, 6305, 14334, 8799, 12834, 8000, 6255, 12489, 5692,
 7020, 5051, 7696, 9999, 6513, 6742, 8306, 8169, 8221, 14640,
 14936, 8699, 8729, 12720, 8967, 14131, 6196, 12355, 5554, 10763
]

CATEGORY = [
1, 5, 3, 2, 2, 2, 2, 5, 1, 3,
5, 3, 5, 4, 1, 4, 1, 2, 4, 3,
2, 2, 1, 1, 3, 5, 2, 4, 4, 2,
1, 3, 1, 5, 2, 2, 3, 4, 4, 3,
3, 1, 2, 1, 2, 1, 4, 3, 4, 2
]

PREC = [
  (0, 2), (2, 8), (3, 12), (7, 26), (8, 20), (8, 22), (11, 22),
  (13, 40), (20, 26), (25, 41), (30, 31), (9, 45), (9, 47), (10, 42)
]

DEADLINE = [ (15, 50756), (18, 57757), (19, 58797),
             (24, 74443), (28, 65605), (31, 55928), (49, 58012) ]

assert(len(CATEGORY) == len(DURATION))

# ===========================================================================

from docplex.cp.model import CpoModel

mdl = CpoModel()

TASKS = range(len(DURATION))

# Decision variables - interval variables with duration (length) and name
itv = [
  mdl.interval_var(length=DURATION[j], name="ITV_{}".format(j+1))
  for j in TASKS
]

# Deadlines - constrain the end of the interval.
for j,d in DEADLINE :
    mdl.add(mdl.end_of(itv[j]) <= d)

# Precedences - use end_before_start
for b, a in PREC :
    mdl.add(mdl.end_before_start(itv[b], itv[a]))

# Batching.  This uses a "state function" which is an unknown function of
# time which needs to be decided by CP Optimizer.  We say that this function
# must take the value of the category of the interval during the interval
# (using always_equal meaning the state function is always equal to a value
# over the extent of the interval).  This means that only tasks of a particular
# category can execute at the same time.
r = mdl.state_function()
for j in TASKS :
    mdl.add(mdl.always_equal(r, itv[j], CATEGORY[j]))

# Objective.  Minimize the latest task end.
makespan = mdl.max(mdl.end_of(itv[j]) for j in TASKS)
mdl.add(mdl.minimize(makespan))

# Solve it, making sure we get the absolute optimal (0 tolerance)
# and limiting the log a bit. 's' contains the solution.
s = mdl.solve(OptimalityTolerance=0, LogVerbosity="Terse")

# Get the final makespan
sol_makespan = s.get_objective_values()[0]
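# Report the optimal makespan found by the solver
print("Makespan: {}".format(sol_makespan))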

# Print the solution by zone
# s[X] gets the value of unknown X in the solution s
# s[r] gets the value of the state function in the solution
# this is a list of triples (start, end, value) representing
# the full extent of the state function over the whole time line.
zones = s[r]

# Iterate over the zones, ignoring the first and last ones, which
# are the zones before the first and after the last task.
for (start, end, value) in zones[1:-1] :
    print("Category is {} in window [{},{})".format(value, start, end))
    for j in TASKS:
        (istart, iend, ilength) = s[itv[j]] # intervals are start/end/length
        if istart >= start and iend <= end:
            print("\t{} @ {} -- {} --> {}".format(
                  itv[j].get_name(), istart, ilength, iend))
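As a usage note: this is the docplex Python API, so running it requires CP Optimizer itself (shipped with IBM ILOG CPLEX Optimization Studio, 12.10 or later for the timings quoted above) to be installed alongside the docplex package.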



Answer 2:


Not sure if this is helpful, but instead of aiming for an algorithm, it is also possible to develop an optimization model and let a solver do the work.

The idea is that we minimize the total makespan, i.e. the finish time of the latest job. This will automatically try to group together jobs of the same category (to allow parallel processing). A Mixed Integer Programming model along these lines is sketched below.
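A minimal sketch of one possible formulation (the notation and names here are illustrative, not taken verbatim from the model that produced the output below): continuous start/end times per job, a big-M disjunction between every pair of jobs from different categories, and a makespan variable to minimize.

    minimize    makespan
    subject to  end(j)    = start(j) + length(j)          for every job j
                end(j)   <= makespan                      for every job j
                end(j)   <= due(j)                        for every job j with a due date
                end(i)   <= start(j)                      for every precedence i -> j
                start(j) >= end(i) - M * (1 - x(i,j))     for every pair i, j with cat(i) != cat(j)
                start(i) >= end(j) - M * x(i,j)
                x(i,j) in {0, 1}

Here M is a sufficiently large constant and x(i,j) = 1 means job i finishes before job j starts. Jobs of the same category are not constrained against each other and may overlap in time (run in parallel); jobs of different categories must be ordered one way or the other, which enforces the batching rule. Minimizing the makespan then rewards packing same-category jobs into the same time window.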

I created some random data for 50 jobs and 5 categories. The data set includes some due dates and some precedence constraints.

----     28 SET j  jobs

job1 ,    job2 ,    job3 ,    job4 ,    job5 ,    job6 ,    job7 ,    job8 ,    job9 ,    job10,    job11,    job12
job13,    job14,    job15,    job16,    job17,    job18,    job19,    job20,    job21,    job22,    job23,    job24
job25,    job26,    job27,    job28,    job29,    job30,    job31,    job32,    job33,    job34,    job35,    job36
job37,    job38,    job39,    job40,    job41,    job42,    job43,    job44,    job45,    job46,    job47,    job48
job49,    job50


----     28 SET c  category

cat1,    cat2,    cat3,    cat4,    cat5


----     28 SET jc  job-category mapping

             cat1        cat2        cat3        cat4        cat5

job1          YES
job2                                                          YES
job3                                  YES
job4                      YES
job5                      YES
job6                      YES
job7                      YES
job8                                                          YES
job9          YES
job10                                 YES
job11                                                         YES
job12                                 YES
job13                                                         YES
job14                                             YES
job15         YES
job16                                             YES
job17         YES
job18                     YES
job19                                             YES
job20                                 YES
job21                     YES
job22                     YES
job23         YES
job24         YES
job25                                 YES
job26                                                         YES
job27                     YES
job28                                             YES
job29                                             YES
job30                     YES
job31         YES
job32                                 YES
job33         YES
job34                                                         YES
job35                     YES
job36                     YES
job37                                 YES
job38                                             YES
job39                                             YES
job40                                 YES
job41                                 YES
job42         YES
job43                     YES
job44         YES
job45                     YES
job46         YES
job47                                             YES
job48                                 YES
job49                                             YES
job50                     YES


----     28 PARAMETER length  job duration

job1  11.611,    job2  12.558,    job3  11.274,    job4   7.839,    job5   5.864,    job6   6.025,    job7  11.413
job8  10.453,    job9   5.315,    job10 12.924,    job11  5.728,    job12  6.757,    job13 10.256,    job14 12.502
job15  6.781,    job16  5.341,    job17 10.851,    job18 11.212,    job19  8.894,    job20  8.587,    job21  7.430
job22  7.464,    job23  6.305,    job24 14.334,    job25  8.799,    job26 12.834,    job27  8.000,    job28  6.255
job29 12.489,    job30  5.692,    job31  7.020,    job32  5.051,    job33  7.696,    job34  9.999,    job35  6.513
job36  6.742,    job37  8.306,    job38  8.169,    job39  8.221,    job40 14.640,    job41 14.936,    job42  8.699
job43  8.729,    job44 12.720,    job45  8.967,    job46 14.131,    job47  6.196,    job48 12.355,    job49  5.554
job50 10.763


----     28 SET before  dependencies

             job3        job9       job13       job21       job23       job27       job32       job41       job42

job1          YES
job3                      YES
job4                                  YES
job8                                                                      YES
job9                                              YES         YES
job12                                                         YES
job14                                                                                             YES
job21                                                                     YES
job26                                                                                                         YES
job31                                                                                 YES

    +       job43       job46       job48

job10                     YES         YES
job11         YES


----     28 PARAMETER due  some jobs have a due date

job16 50.756,    job19 57.757,    job20 58.797,    job25 74.443,    job29 65.605,    job32 55.928,    job50 58.012

The resulting solution is a schedule in which jobs of the same category are batched so they can run in parallel. This model (with this particular data set) solved in about 30 seconds (using Cplex). Of course, it should be noted that, in general, these models can be difficult to solve to optimality.




Answer 3:


For job scheduling I encourage you to have a look at CP Optimizer, which ships with CPLEX (see the introduction to CP Optimizer).

A basic job-shop model, written in OPL, looks like this:

using CP;

int nbJobs = ...;
int nbMchs = ...;

range Jobs = 0..nbJobs-1;
range Mchs = 0..nbMchs-1; 
// Mchs is used both to index machines and operation position in job

tuple Operation {
  int mch; // Machine
  int pt;  // Processing time
};

Operation Ops[j in Jobs][m in Mchs] = ...;

dvar interval itvs[j in Jobs][o in Mchs] size Ops[j][o].pt;
dvar sequence mchs[m in Mchs] in all(j in Jobs, o in Mchs : Ops[j][o].mch == m) itvs[j][o];



minimize max(j in Jobs) endOf(itvs[j][nbMchs-1]);
subject to {
  forall (m in Mchs)
    noOverlap(mchs[m]);
  forall (j in Jobs, o in 0..nbMchs-2)
    endBeforeStart(itvs[j][o], itvs[j][o+1]);
}

as can be seen in the sched_jobshop example shipped with CP Optimizer.



Source: https://stackoverflow.com/questions/59722383/job-scheduling-with-minimization-by-parallel-grouping
