I've read in multiple places that Linux's default scheduler is hyperthreading aware on multi-core machines, meaning that if you have a machine with 2 real cor
I think it's time to summarize some knowledge from comments.
The Linux scheduler is aware of HyperThreading: the topology information should be read from the ACPI SRAT/SLIT tables, which are provided by the BIOS/UEFI, and Linux then builds scheduler domains from it.
Domains form a hierarchy: for example, on a 2-CPU server you will get three layers of domains: all-cpus, per-cpu-package, and per-cpu-core. You can check this in /proc/schedstat:
$ awk '/^domain/ { print $1, $2; } /^cpu/ { print $1; }' /proc/schedstat
cpu0
domain0 0000,00001001 <-- all cpus from core 0
domain1 0000,00555555 <-- all cpus from package 0
domain2 0000,00ffffff <-- all cpus in the system
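The masks in the second column are hexadecimal CPU bitmaps, so they are easier to read once decoded into CPU numbers. Here is a minimal sketch, assuming the comma only separates 32-bit chunks with the most significant chunk first; the helper name decode_cpumask is mine, not part of any kernel tool:

```python
def decode_cpumask(mask):
    """Turn a hex cpumask such as '0000,00001001' into a sorted list of CPU ids.

    Assumption: commas only separate 32-bit chunks, most significant first,
    so dropping them leaves a single hexadecimal number.
    """
    value = int(mask.replace(",", ""), 16)
    # Bit N set in the mask means CPU N belongs to the domain.
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# The three masks from the output above.
for name, mask in [
    ("domain0", "0000,00001001"),
    ("domain1", "0000,00555555"),
    ("domain2", "0000,00ffffff"),
]:
    print(name, decode_cpumask(mask))
```

For the output above, domain0 decodes to CPUs 0 and 12 (the two hardware threads of one core), domain1 to the even-numbered CPUs 0 through 22, and domain2 to all 24 CPUs.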
Part of the CFS scheduler is the load balancer, the beast that should steal tasks from your busy core and move them to another core. Here is its description from the kernel documentation:
While doing that, it checks to see if the current domain has exhausted its rebalance interval. If so, it runs load_balance() on that domain. It then checks the parent sched_domain (if it exists), and the parent of the parent and so forth.
Initially, load_balance() finds the busiest group in the current sched domain. If it succeeds, it looks for the busiest runqueue of all the CPUs' runqueues in that group. If it manages to find such a runqueue, it locks both our initial CPU's runqueue and the newly found busiest one and starts moving tasks from it to our runqueue. The exact number of tasks amounts to an imbalance previously computed while iterating over this sched domain's groups.
From: https://www.kernel.org/doc/Documentation/scheduler/sched-domains.txt
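To make the quoted steps more concrete, here is a toy model of one balancing pass. It is only a sketch and not the kernel's algorithm: it assumes load is simply the number of queued tasks and that the imbalance is half the difference between the two queue lengths, whereas the real code uses weighted load and many more checks. All names are hypothetical.

```python
def load_balance(this_cpu, groups, runqueues):
    """Toy model of one load_balance() pass; returns the number of tasks moved.

    groups    -- list of CPU lists, the groups of one sched domain
    runqueues -- dict mapping a CPU id to its list of runnable tasks
    Assumption: load is just the number of queued tasks.
    """
    def group_load(group):
        return sum(len(runqueues[cpu]) for cpu in group)

    # Step 1: find the busiest group in the current domain.
    busiest_group = max(groups, key=group_load)
    if this_cpu in busiest_group:
        return 0  # our own group is the busiest, nothing to pull

    # Step 2: find the busiest runqueue inside that group.
    busiest_cpu = max(busiest_group, key=lambda cpu: len(runqueues[cpu]))

    # Step 3: move tasks to our runqueue until the (assumed) imbalance is gone.
    imbalance = (len(runqueues[busiest_cpu]) - len(runqueues[this_cpu])) // 2
    moved = 0
    for _ in range(max(imbalance, 0)):
        runqueues[this_cpu].append(runqueues[busiest_cpu].pop())
        moved += 1
    return moved


runqueues = {0: [], 1: ["a"], 2: ["b", "c", "d", "e"], 3: ["f"]}
moved = load_balance(0, [[0, 1], [2, 3]], runqueues)
print(moved, runqueues)
```

In this example the idle CPU 0 pulls two tasks from CPU 2, leaving both with two tasks each.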
You can monitor the load balancer's activity by comparing the counters in /proc/schedstat over time. I wrote a script for doing that: schedstat.py
The alb_pushed counter shows that the load balancer successfully moved a task out:
Sun Apr 12 14:15:52 2015 cpu0 cpu1 ... cpu6 cpu7 cpu8 cpu9 cpu10 ...
.domain1.alb_count ... 1 1 1
.domain1.alb_pushed ... 1 1 1
.domain2.alb_count 1 ...
.domain2.alb_pushed 1 ...
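If you just want to see which per-domain counters change between two readings, a much smaller sketch than schedstat.py is enough. This is not that script: the function names are hypothetical, and it assumes the domain line layout shown earlier (domain name, cpumask, then only integer counters). The number and meaning of the counters depend on the schedstat version, so the sketch reports changes by field position instead of by name.

```python
def parse_schedstat(text):
    """Map (cpu, domain) to the list of integer counters on that domain line.

    Assumption: each domain line is 'domainN <cpumask> <counter> ...'.
    """
    stats = {}
    cpu = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("cpu"):
            cpu = fields[0]  # the domain lines that follow belong to this CPU
        elif fields[0].startswith("domain") and cpu is not None:
            stats[(cpu, fields[0])] = [int(x) for x in fields[2:]]
    return stats


def changed_counters(before, after):
    """Return {(cpu, domain): {field_index: delta}} for counters that changed."""
    changes = {}
    for key, new in after.items():
        old = before.get(key)
        if old is None or len(old) != len(new):
            continue  # layout differs between snapshots, skip this domain
        delta = {i: n - o for i, (o, n) in enumerate(zip(old, new)) if n != o}
        if delta:
            changes[key] = delta
    return changes
```

To use it on a live system, read /proc/schedstat twice a few seconds apart, pass both texts through parse_schedstat(), and hand the two results to changed_counters().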
However, the load balancer's logic is complex, so it is hard to determine what can stop it from doing its work well and how those reasons relate to the schedstat counters. Neither I nor @thatotherguy can reproduce your issue.
I see two possibilities for that behavior:
mpstat and schedstat data)