1 Quartz threads

Quartz has two kinds of threads: the scheduling thread (red in the original figure) and the execution threads (green).

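2 QuartzSchedulerThread analysis

QuartzSchedulerThread is the core thread of Quartz. It is responsible for scheduling: it reads the triggers the user has configured and hands them to execution threads so that each trigger fires at its scheduled time.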
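
The split between one scheduling thread and a pool of execution threads can be illustrated with plain `java.util.concurrent`. This is only a sketch of the division of labour: the names `TwoThreadKindsSketch` and `dispatch` are made up for illustration, and Quartz uses its own QuartzSchedulerThread and thread pool rather than an `ExecutorService`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class TwoThreadKindsSketch {

    /**
     * Starts one "scheduling" thread that hands jobCount jobs to a worker pool
     * and returns the names of the threads that actually ran the jobs.
     */
    static List<String> dispatch(int jobCount, int workerCount) {
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        List<String> ranOn = Collections.synchronizedList(new ArrayList<>());

        // The scheduling thread only decides what runs; it never runs a job itself.
        Thread scheduler = new Thread(() -> {
            for (int i = 0; i < jobCount; i++) {
                workers.execute(() -> ranOn.add(Thread.currentThread().getName()));
            }
        }, "scheduler-thread");

        try {
            scheduler.start();
            scheduler.join();
            workers.shutdown();
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
        return ranOn;
    }

    public static void main(String[] args) {
        // Prints the worker thread names; "scheduler-thread" never appears in the list.
        System.out.println(dispatch(3, 2));
    }
}
```

The point of the design is that a slow job only occupies a worker, so the scheduling thread stays free to keep track of fire times.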

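2.1 Execution steps

1. Loop: check whether the scheduler should stop. If so, release resources and end scheduling.
2. Loop: check whether the scheduler is paused. If so, call wait to block this thread until it is woken up from outside.
3. (Once awake) ask the thread pool for available execution threads. If none is free, block this thread until at least one becomes available.
4. Following the rules below, find in the job store (JobStore) the batch of triggers that are about to fire.
- idleWaitTime (default 30000): triggers due to fire within 30 s after the current time are acquired.
- misfireThreshold (default 60000): triggers that should have fired within the last 60 s but have not fired yet are not treated as misfired and are acquired as well. Triggers overdue by more than that are handled by the MisfireHandler thread according to their misfire policy.
- batchTriggerAcquisitionMaxCount (default 1): the number of triggers acquired per batch is Math.min(this value, available threads), which is 1 by default.
  - When this value is greater than 1 in a clustered setup, make sure acquireTriggersWithinLock is true, so that several nodes do not acquire the same trigger and fire it more than once.
- batchTriggerAcquisitionFireAheadTimeWindow (default 0): only triggers whose fire times differ by 0 ms are acquired together. Raising this value lets triggers whose fire times are close to each other be acquired and fired in the same batch.
  - Example: two triggers are due at 10:00:00 and 10:00:05. With batchTriggerAcquisitionFireAheadTimeWindow = 5000, batchTriggerAcquisitionMaxCount = 2 and enough free threads, both may be fired together at 10:00:00, the second one up to 5 s ahead of schedule.
5. If step 4 finds no trigger to fire, the thread blocks for a while (a random 23 to 30 s) and then returns to step 1.
6. The acquired triggers wait in memory until it is time to fire, but while they wait the outside world may modify triggers and thereby change what should fire next. The thread therefore blocks until the scheduled fire time. If the wait is not interrupted, the triggers are fired. If the schedule was changed in the meantime, the thread decides, based on certain conditions, whether to acquire a fresh batch; if so, it releases the current triggers and returns to step 1.
7. Once step 6 is done, the thread fetches the job for each trigger, records that the trigger fired, and updates the trigger's next fire time so that it can be acquired again for its next firing.
8. Fire all acquired triggers.
9. After all acquired triggers have been fired, return to step 1 and acquire the next batch.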
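
The selection rules in step 4 can be sketched as a small, self-contained model. It is not the JobStore implementation: `AcquireWindowSketch` and `acquire` are hypothetical names, triggers are reduced to their next fire times, misfire handling is reduced to a skip, and anchoring the time window to the first acquired trigger is an assumption about how a store might apply it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AcquireWindowSketch {

    /**
     * Simplified model of one batch acquisition.
     * sortedFireTimes: next fire times in epoch millis, ascending.
     * maxCount: already Math.min(batchTriggerAcquisitionMaxCount, available threads).
     */
    static List<Long> acquire(List<Long> sortedFireTimes, long now, long idleWaitTime,
                              long misfireThreshold, int maxCount, long timeWindow) {
        List<Long> batch = new ArrayList<>();
        long batchEnd = now + idleWaitTime;          // latest fire time the first trigger may have
        long noEarlierThan = now - misfireThreshold; // anything older counts as misfired
        for (long fireTime : sortedFireTimes) {
            if (fireTime < noEarlierThan) {
                continue; // left to misfire handling, not acquired here
            }
            if (fireTime > batchEnd) {
                break; // too far in the future for this batch
            }
            if (batch.isEmpty()) {
                // After the first trigger, the rest must fall within timeWindow of it.
                batchEnd = Math.max(fireTime, now) + timeWindow;
            }
            batch.add(fireTime);
            if (batch.size() == maxCount) {
                break;
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        List<Long> due = Arrays.asList(now + 1_000, now + 6_000); // 5 s apart
        System.out.println(acquire(due, now, 30_000, 60_000, 2, 5_000).size()); // both acquired
        System.out.println(acquire(due, now, 30_000, 60_000, 2, 0).size());     // only the first
    }
}
```

With the defaults (maximum count 1, window 0) the model always returns at most one trigger, which matches the description of the default behaviour above.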

2.2 Source code analysis

@Override
public void run() {
    int acquiresFailed = 0;
    // 1. Loop: check whether the scheduler should stop; if so, release resources and end scheduling
    while (!halted.get()) {
        try {
            synchronized (sigLock) {
                // 2. Loop: check whether the scheduler is paused; if so, wait on sigLock until woken up from outside
                while (paused && !halted.get()) {
                    try {
                        // wait until togglePause(false) is called...
                        sigLock.wait(1000L);
                    } catch (InterruptedException ignore) {
                    }

                    // reset failure counter when paused, so that we don't
                    // wait again after unpausing
                    acquiresFailed = 0;
                }

                if (halted.get()) {
                    break;
                }
            }

            // If acquiring triggers failed in earlier iterations (for example because the database was restarting), back off before retrying
            if (acquiresFailed > 1) {
                try {
                    long delay = computeDelayForRepeatedErrors(qsRsrcs.getJobStore(), acquiresFailed);
                    Thread.sleep(delay);
                } catch (Exception ignore) {
                }
            }

            // 3. Ask the thread pool for available execution threads; block until at least one is free
            int availThreadCount = qsRsrcs.getThreadPool().blockForAvailableThreads();
            if(availThreadCount > 0) { // will always be true, due to semantics of blockForAvailableThreads...
                List<OperableTrigger> triggers;

                long now = System.currentTimeMillis();

                clearSignaledSchedulingChange();
                try {
                    // 4. Following certain rules, find in the JobStore (in memory or in a database) the batch of triggers about to fire
                    triggers = qsRsrcs.getJobStore().acquireNextTriggers(
                            now + idleWaitTime, Math.min(availThreadCount, qsRsrcs.getMaxBatchSize()), qsRsrcs.getBatchTimeWindow());
                    acquiresFailed = 0;
                    if (log.isDebugEnabled())
                        log.debug("batch acquisition of " + (triggers == null ? 0 : triggers.size()) + " triggers");
                } catch (JobPersistenceException jpe) {
                    if (acquiresFailed == 0) {
                        qs.notifySchedulerListenersError(
                                "An error occurred while scanning for the next triggers to fire.",
                                jpe);
                    }
                    if (acquiresFailed < Integer.MAX_VALUE)
                        acquiresFailed++;
                    continue;
                } catch (RuntimeException e) {
                    if (acquiresFailed == 0) {
                        getLog().error("quartzSchedulerThreadLoop: RuntimeException "
                                +e.getMessage(), e);
                    }
                    if (acquiresFailed < Integer.MAX_VALUE)
                        acquiresFailed++;
                    continue;
                }

                if (triggers != null && !triggers.isEmpty()) {

                    now = System.currentTimeMillis();
                    long triggerTime = triggers.get(0).getNextFireTime().getTime();
                    long timeUntilTrigger = triggerTime - now;

                    while(timeUntilTrigger > 2) {
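                        // 6. Inside this while loop the acquired triggers wait until their scheduled fire time.
                        //    A blocking wait is used because the thread is woken up when triggers are modified or added from outside.
                        //    After waking up, the logic below decides whether a fresh batch of triggers should be acquired.
                        //    For example, at 10:00:00 a trigger due at 10:00:05 has already been acquired;
                        //    if a trigger due at 10:00:03 is then added, the 10:00:05 one may have to be released and the 10:00:03 one acquired instead.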
                        synchronized (sigLock) {
                            if (halted.get()) {
                                break;
                            }
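                            // Check whether a trigger with an earlier fire time was added in the meantime.
                            // There is a trade-off: is it worth discarding the batch already acquired for one new trigger?
                            // Releasing the batch and acquiring again takes time, and how long depends on the JobStore implementation.
                            // So a heuristic is used: a query is assumed to take 70 ms with a database store and 7 ms with an in-memory store.
                            // If the first acquired trigger is due in less than that, discarding the batch is not considered worthwhile.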
                            if (!isCandidateNewTimeEarlierWithinReason(triggerTime, false)) {
                                try {
                                    // we could have blocked a long while
                                    // on 'synchronize', so we must recompute
                                    now = System.currentTimeMillis();
                                    timeUntilTrigger = triggerTime - now;
                                    if(timeUntilTrigger >= 1)
                                        sigLock.wait(timeUntilTrigger);
                                } catch (InterruptedException ignore) {
                                }
                            }
                        }
                        if(releaseIfScheduleChangedSignificantly(triggers, triggerTime)) {
                            break;
                        }
                        now = System.currentTimeMillis();
                        timeUntilTrigger = triggerTime - now;
                    }

                    // this happens if releaseIfScheduleChangedSignificantly decided to release triggers
                    if(triggers.isEmpty())
                        continue;

                    // set triggers to 'executing'
                    List<TriggerFiredResult> bndles = new ArrayList<TriggerFiredResult>();

                    boolean goAhead = true;
                    synchronized(sigLock) {
                        goAhead = !halted.get();
                    }
                    if(goAhead) {
                        try {
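                            // 7. After step 6, fetch the job for each trigger, record that the trigger fired, and update its next fire time so it can be acquired again next time
                            // triggersFired mainly does the following:
                            // a. looks up the job each trigger should run
                            // b. records the firing and updates the trigger state; if the job is a StatefulJob (@DisallowConcurrentExecution in Quartz 2.x), its other triggers are blocked
                            // c. updates the trigger's next fire time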
                            List<TriggerFiredResult> res = qsRsrcs.getJobStore().triggersFired(triggers);
                            if(res != null)
                                bndles = res;
                        } catch (SchedulerException se) {
                            qs.notifySchedulerListenersError(
                                    "An error occurred while firing triggers '"
                                            + triggers + "'", se);
                            //QTZ-179 : a problem occurred interacting with the triggers from the db
                            //we release them and loop again
                            for (int i = 0; i < triggers.size(); i++) {
                                qsRsrcs.getJobStore().releaseAcquiredTrigger(triggers.get(i));
                            }
                            continue;
                        }

                    }

                    // 8. Fire all acquired triggers
                    for (int i = 0; i < bndles.size(); i++) {
                        TriggerFiredResult result =  bndles.get(i);
                        TriggerFiredBundle bndle =  result.getTriggerFiredBundle();
                        Exception exception = result.getException();

                        if (exception instanceof RuntimeException) {
                            getLog().error("RuntimeException while firing trigger " + triggers.get(i), exception);
                            qsRsrcs.getJobStore().releaseAcquiredTrigger(triggers.get(i));
                            continue;
                        }

                        // it's possible to get 'null' if the triggers was paused,
                        // blocked, or other similar occurrences that prevent it being
                        // fired at this time...  or if the scheduler was shutdown (halted)
                        if (bndle == null) {
                            qsRsrcs.getJobStore().releaseAcquiredTrigger(triggers.get(i));
                            continue;
                        }

                        JobRunShell shell = null;
                        try {
                            shell = qsRsrcs.getJobRunShellFactory().createJobRunShell(bndle);
                            shell.initialize(qs);
                        } catch (SchedulerException se) {
                            qsRsrcs.getJobStore().triggeredJobComplete(triggers.get(i), bndle.getJobDetail(), CompletedExecutionInstruction.SET_ALL_JOB_TRIGGERS_ERROR);
                            continue;
                        }
                        if (qsRsrcs.getThreadPool().runInThread(shell) == false) {
                            // this case should never happen, as it is indicative of the
                            // scheduler being shutdown or a bug in the thread pool or
                            // a thread pool being used concurrently - which the docs
                            // say not to do...
                            getLog().error("ThreadPool.runInThread() return false!");
                            qsRsrcs.getJobStore().triggeredJobComplete(triggers.get(i), bndle.getJobDetail(), CompletedExecutionInstruction.SET_ALL_JOB_TRIGGERS_ERROR);
                        }

                    }

                    // 9. After all acquired triggers have been fired, return to step 1 and acquire the next batch
                    continue; // while (!halted)
                }
            } else { // if(availThreadCount > 0)
                // should never happen, if threadPool.blockForAvailableThreads() follows contract
                continue; // while (!halted)
            }
            // 5. If step 4 found no trigger to fire, block for a randomized time, then return to step 1
            long now = System.currentTimeMillis();
            long waitTime = now + getRandomizedIdleWaitTime();
            long timeUntilContinue = waitTime - now;
            synchronized(sigLock) {
                try {
                    if(!halted.get()) {
                        // QTZ-336 A job might have been completed in the mean time and we might have
                        // missed the scheduled changed signal by not waiting for the notify() yet
                        // Check that before waiting for too long in case this very job needs to be
                        // scheduled very soon
                        if (!isScheduleChanged()) {
                            sigLock.wait(timeUntilContinue);
                        }
                    }
                } catch (InterruptedException ignore) {
                }
            }

        } catch(RuntimeException re) {
            getLog().error("Runtime error occurred in main trigger firing loop.", re);
        }
    } // while (!halted)

    // drop references to scheduler stuff to aid garbage collection...
    qs = null;
    qsRsrcs = null;
}
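
Two timing heuristics in the loop above can be isolated in a few lines. This is a sketch under assumptions, not the Quartz source: `SchedulerTimingSketch`, `randomizedIdleWait` and `worthReacquiring` are hypothetical names, the 7000 ms jitter is inferred from the "23 to 30 s" range given for step 5, and `worthReacquiring` models only the cost comparison described in the comments (70 ms for a database store, 7 ms for an in-memory one), not the check for whether an earlier trigger was actually signalled.

```java
import java.util.Random;

class SchedulerTimingSketch {

    /** Idle wait with jitter: a value in (idleWaitTime - jitter, idleWaitTime]. */
    static long randomizedIdleWait(long idleWaitTime, int jitter, Random random) {
        return idleWaitTime - random.nextInt(jitter);
    }

    /**
     * Whether discarding the acquired batch in favour of an earlier trigger pays off:
     * only if the acquired trigger is still further away than one assumed store query.
     */
    static boolean worthReacquiring(long acquiredFireTime, long now, boolean persistentStore) {
        long assumedQueryCost = persistentStore ? 70L : 7L; // milliseconds, as in the comments above
        return acquiredFireTime - now >= assumedQueryCost;
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(randomizedIdleWait(30_000, 7_000, random)); // between 23001 and 30000
        System.out.println(worthReacquiring(1_050, 1_000, true));      // 50 ms left, database store
        System.out.println(worthReacquiring(1_050, 1_000, false));     // 50 ms left, in-memory store
    }
}
```

The jitter presumably keeps several idle scheduler instances from polling the store at exactly the same moments; the source text does not state the motivation, so treat that as a guess.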

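2.3 Flowchart

Rules for acquiring a batch of triggers:

- idleWaitTime (default 30000): triggers due to fire within 30 s after the current time are acquired.
- misfireThreshold (default 60000): triggers that should have fired within the last 60 s but have not fired yet are not treated as misfired and are acquired as well. Triggers overdue by more than that are handled by the MisfireHandler thread according to their misfire policy.
- batchTriggerAcquisitionMaxCount (default 1): the number of triggers acquired per batch is Math.min(this value, available threads), which is 1 by default. When this value is greater than 1 in a clustered setup, make sure acquireTriggersWithinLock is true, so that several nodes do not acquire the same trigger and fire it more than once.
- batchTriggerAcquisitionFireAheadTimeWindow (default 0): only triggers whose fire times differ by 0 ms are acquired together. Raising this value lets triggers whose fire times are close to each other be acquired and fired in the same batch. Example: two triggers are due at 10:00:00 and 10:00:05. With batchTriggerAcquisitionFireAheadTimeWindow = 5000, batchTriggerAcquisitionMaxCount = 2 and enough free threads, both may be fired together at 10:00:00.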
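
As a configuration sketch, the four parameters map onto the property keys below. The keys are the ones used in quartz.properties; the class `QuartzBatchConfigSketch` and its method are hypothetical, and the code only builds a `java.util.Properties` object (which one would typically pass to a StdSchedulerFactory) without touching Quartz itself. Note that acquireTriggersWithinLock is a JDBC job store setting.

```java
import java.util.Properties;

class QuartzBatchConfigSketch {

    /** Builds scheduler properties for batch acquisition; values other than the arguments are the defaults. */
    static Properties batchProperties(int maxCount, long fireAheadWindowMillis) {
        Properties p = new Properties();
        p.setProperty("org.quartz.scheduler.idleWaitTime", "30000");
        p.setProperty("org.quartz.jobStore.misfireThreshold", "60000");
        p.setProperty("org.quartz.scheduler.batchTriggerAcquisitionMaxCount",
                String.valueOf(maxCount));
        p.setProperty("org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow",
                String.valueOf(fireAheadWindowMillis));
        if (maxCount > 1) {
            // Acquiring more than one trigger at a time should happen under the store lock.
            p.setProperty("org.quartz.jobStore.acquireTriggersWithinLock", "true");
        }
        return p;
    }

    public static void main(String[] args) {
        // Settings from the example: batches of up to 2 triggers, 5 s fire-ahead window.
        batchProperties(2, 5_000).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```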

Source: oschina

Link: https://my.oschina.net/pengranxiang/blog/4313732
