Trying to determine the Processor Queue Length (the number of processes that ready to run but currently aren\'t) on a linux machine. There is a WMI call in Windows for this metr
The metrics you seek exist in /proc/schedstat
.
The format of this file is described in sched-stats.txt in the kernel source. Specifically, the cpu
lines are what you want:
CPU statistics
--------------
cpu 1 2 3 4 5 6 7 8 9
First field is a sched_yield() statistic:
1) # of times sched_yield() was called
Next three are schedule() statistics:
2) This field is a legacy array expiration count field used in the O(1)
scheduler. We kept it for ABI compatibility, but it is always set to zero.
3) # of times schedule() was called
4) # of times schedule() left the processor idle
Next two are try_to_wake_up() statistics:
5) # of times try_to_wake_up() was called
6) # of times try_to_wake_up() was called to wake up the local cpu
Next three are statistics describing scheduling latency:
7) sum of all time spent running by tasks on this processor (in jiffies)
8) sum of all time spent waiting to run by tasks on this processor (in
jiffies)
9) # of timeslices run on this cpu
In particular, field 8. To find the run queue length, you would:
Unfortunately, I'm not aware of any utility to automate this process which is usually installed or even packaged in a Linux distribution. I've not used it, but the kernel documentation suggests http://eaglet.rain.com/rick/linux/schedstat/v12/latency.c, which unfortunately refers to a domain that is no longer resolvable. Fortunately, it's available on the wayback machine.
Why not sar
or vmstat
?
These tools report the number of currently runnable processes. Certainly if this number is greater than the number of CPUs, some of them must be waiting. However, processes can still be waiting even when the number of processes is less than the number of CPUs, for a variety of reasons:
Moreover, the number of runnable processes is only sampled at an instant in time. In many cases this number may fluctuate rapidly, and the contention may be occurring between the times the metric is being sampled.
These things mean the number of runnable processes minus the number of CPUs is not a reliable indicator of CPU contention.