定位性能问题常用方法：系统负载，CPU与进程

实验环境

OS：Ubuntu 18.04
CPU：2 Cores
Memory：4G

查看OS版本

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

查看CPU个数

$ cat /proc/cpuinfo | grep "model name"
model name      : Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
model name      : Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz

查看内存信息

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3944         128        3309           0         506        3601
Swap:          2047

查看OS信息

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

查看系统信息

$ uname -a
Linux aaa_ubuntu 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

查看平均负载

显示的数值为最近1分钟，5分钟，15分钟内的变化，从中可以看出趋势。

$ uptime
 17:03:14 up 13 min,  1 user,  load average: 0.00, 0.02, 0.03

或者用 top 命令：
top
持续观察变化（或者使用 top 命令）：

$ watch -d uptime

平均负载越高，说明系统越繁忙，但系统繁忙不代表CPU繁忙。

说明

平均负载：单位时间内，系统处于可运行状态和不可中断状态的平均活跃进程数。
可运行状态：正在使用CPU或等待使用CPU（R状态，Running or Runnable）
不可中断状态：处于内核关键流程且不可被中断，最常见为等待硬件响应（D状态，Uninterruptible sleep）

理想状态下，系统多少CPU就同时运行多少进程，平均负载 <= CPU Core数（一般在平均负载到达70%时要留意）。（但我不清楚在开启了超线程之后，这个公式应该进行怎样的调整）。
当平均负载 > CPU Core数时，就会发生超载。

平均负载升高的原因

存在CPU密集型进程

特征：CPU使用率很高
在这里插入图片描述
上图显示系统负载接近3, 2个 cpu 使用率都为100%，其中有3个进程占用了大量 cpu

$ mpstat -P ALL 1
Linux 4.15.0-112-generic (aaa_ubuntu)   08/15/2020      _x86_64_        (2 CPU)

06:02:55 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:02:56 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:02:56 PM    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:02:56 PM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

06:02:56 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:02:57 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:02:57 PM    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:02:57 PM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

06:02:57 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:02:58 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:02:58 PM    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:02:58 PM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

上图显示每个 cpu 的使用率，跟 top 显示的一致。

$ pidstat -u 1
Linux 4.15.0-112-generic (aaa_ubuntu)   08/15/2020      _x86_64_        (2 CPU)

06:03:21 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
06:03:22 PM  1000      2902   70.30    0.00    0.00   28.71   70.30     0  stress
06:03:22 PM  1000      2903   59.41    0.00    0.00   40.59   59.41     1  stress
06:03:22 PM  1000      2904   69.31    0.00    0.00   31.68   69.31     1  stress

06:03:22 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
06:03:23 PM  1000      2902   72.00    0.00    0.00   29.00   72.00     1  stress
06:03:23 PM  1000      2903   61.00    0.00    0.00   38.00   61.00     0  stress
06:03:23 PM  1000      2904   64.00    0.00    0.00   34.00   64.00     1  stress

06:03:23 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
06:03:24 PM  1000      2727    0.00    1.00    0.00    0.00    1.00     0  sshd
06:03:24 PM  1000      2902   55.00    0.00    0.00   45.00   55.00     1  stress
06:03:24 PM  1000      2903   79.00    0.00    0.00   21.00   79.00     0  stress
06:03:24 PM  1000      2904   65.00    0.00    0.00   36.00   65.00     1  stress
06:03:24 PM  1000      2911    0.00    1.00    0.00    1.00    1.00     0  pidstat

上图显示共有3个进程占用了 cpu，由于 cpu 只有2个，任一时刻最多只有2个进程处于 running 状态，其余可运行进程需要等待 cpu，因此 %wait 不为0。

存在 I/O 密集型进程

在这里插入图片描述
上图可知，系统负载超过6, 2个 cpu 的 us 使用率极低，sy 使用率不高，但是 iowait（wa）很高。
只有一个进程 stress-ng-hdd 占用了最高的 cpu，但也只有 23.6%

$ mpstat -P ALL 1
Linux 4.15.0-112-generic (aaa_ubuntu)   08/15/2020      _x86_64_        (2 CPU)

07:02:52 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:02:53 PM  all    0.00    0.00   19.05   76.19    0.00    4.76    0.00    0.00    0.00    0.00
07:02:53 PM    0    1.11    0.00    7.78   84.44    0.00    6.67    0.00    0.00    0.00    0.00
07:02:53 PM    1    0.00    0.00   30.86   64.20    0.00    3.70    0.00    0.00    0.00    1.23

07:02:53 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:02:54 PM  all    0.56    0.00   13.41   82.68    0.00    2.79    0.00    0.00    0.00    0.56
07:02:54 PM    0    0.00    0.00   17.71   82.29    0.00    0.00    0.00    0.00    0.00    0.00
07:02:54 PM    1    0.00    0.00    8.64   85.19    0.00    6.17    0.00    0.00    0.00    0.00

07:02:54 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:02:55 PM  all    0.00    0.00   16.48   72.16    0.00    1.70    0.00    0.00    0.00    9.66
07:02:55 PM    0    1.05    0.00   26.32   72.63    0.00    0.00    0.00    0.00    0.00    0.00
07:02:55 PM    1    0.00    0.00    3.75   72.50    0.00    2.50    0.00    0.00    0.00   21.25

上图证实大部分时候，cpu 处于 iowait 状态。
iowait：当系统出现大量 I/O 请求时， cpu(s) 处于空闲状态的占比
换言之，iowait 导致的系统负载升高，说明存在 I/O 密集型进程

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1      0 1649524  98996 2143636    0    0    62 22164  204  395 14  3 74  9  0
 1  3      0 2177632  98996 1615300    0    0     0 346208  847 1510  0 33 25 42  0
 1  3      0 1981540  98996 1811240    0    0     0 452688  652 2349  0 20  7 73  0
 1  1      0 1763932  98996 2029080    0    0     0 249856  737 3307  1  9 16 75  0
 1  2      0 1834548  98996 1958472    0    0     0 305152  853  800  0 27 13 60  0
 1  2      0 1981648  98996 1811244    0    0     0 424104 1469 2600  1 25  5 69  0
 1  1      0 1820168  98996 1972576    0    0     0 319488 1023 4004  0 10  7 84  0

上图中的 bo列表示每秒发送到块设备的块数。这里 bo 的数值很高，说明系统在不停对设备进行写入操作。

iowait 的一个误区：认为 iowait 高就代表系统有问题。
实际上，iowait 高，只不过是表示系统有大量 I/O，但是 cpu 相当空闲。

以下在相同的 I/O 压力下，加入了 cpu 密集型进程后，可以使得 iowait 归 0。
在这里插入图片描述
上图可知，系统负载超高，2个 cpu 占满，但 iowait (wa) 为0.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 7  4      0 2600296 100532 1190628    0    0    57 32733  310  618 15  5 69 11  0
 6  5      0 2280636 100540 1510400    0    0     0 297180 7457 13156 64 37  0  0  0
 8  0      0 1979456 100540 1811496    0    0     0 308292 8827 16074 63 37  0  0  0
 6  5      0 2487384 100540 1303244    0    0     0  4136 6161 18759 75 26  0  0  0
 6  1      0 2831760 100540 959420    0    0     0 200224 5428 11782 67 33  0  0  0
 7  0      0 2493116 100544 1297968    0    0     0 348316 4865 8087 68 33  0  0  0
 6  1      0 2129112 100556 1662036    0    0     0 336748 4807 7636 67 33  0  0  0
 6  5      0 1979488 100564 1811568    0    0     0 166660 6162 14022 65 35  0  0  0
10  0      0 2951312 100564 839928    0    0     0 81340 5637 15043 67 34  0  0  0
10  0      0 2653460 100592 1137528    0    0     0 288684 6314 11247 69 31  0  0  0
 7  1      0 2336184 100600 1454700    0    0     0 333828 3902 6083 71 30  0  0  0

上图可知，bo 列值很高，证明仍然在进行大量的写设备操作，us列值很高，证明有程序使用了大量 cpu。
在高频 I/O 时间段内，cpu 并不空闲，因此 iowait 为0

存在大量处于可运行状态的进程

特征：running 队列过长（或R状态进程过多），CPU使用率高，但每个进程的CPU使用率不高
在这里插入图片描述
上图可见系统负载 > 9，2个 cpu 占满，11个 running 进程，其中10个进程占用 cpu 的总和接近了 2个 cpu。

$ pidstat -u 1
Linux 4.15.0-112-generic (aaa_ubuntu)   08/15/2020      _x86_64_        (2 CPU)

06:22:17 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
06:22:18 PM  1000      2547   18.27    0.00    0.00   75.00   18.27     1  stress
06:22:18 PM  1000      2548   19.23    0.00    0.00   75.96   19.23     0  stress
06:22:18 PM  1000      2549   20.19    0.00    0.00   78.85   20.19     1  stress
06:22:18 PM  1000      2550   19.23    0.00    0.00   76.92   19.23     1  stress
06:22:18 PM  1000      2551   19.23    0.00    0.00   76.92   19.23     1  stress
06:22:18 PM  1000      2552   19.23    0.00    0.00   76.92   19.23     0  stress
06:22:18 PM  1000      2553   19.23    0.00    0.00   76.92   19.23     0  stress
06:22:18 PM  1000      2554   19.23    0.00    0.00   77.88   19.23     0  stress
06:22:18 PM  1000      2555   19.23    0.00    0.00   77.88   19.23     0  stress
06:22:18 PM  1000      2556   19.23    0.00    0.00   76.92   19.23     1  stress

06:22:18 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
06:22:19 PM  1000      2547   21.00    0.00    0.00   82.00   21.00     1  stress
06:22:19 PM  1000      2548   20.00    0.00    0.00   80.00   20.00     0  stress
06:22:19 PM  1000      2549   19.00    0.00    0.00   80.00   19.00     1  stress
06:22:19 PM  1000      2550   20.00    0.00    0.00   80.00   20.00     1  stress
06:22:19 PM  1000      2551   20.00    0.00    0.00   80.00   20.00     1  stress
06:22:19 PM  1000      2552   20.00    0.00    0.00   81.00   20.00     0  stress
06:22:19 PM  1000      2553   19.00    0.00    0.00   80.00   19.00     0  stress
06:22:19 PM  1000      2554   20.00    0.00    0.00   79.00   20.00     0  stress
06:22:19 PM  1000      2555   20.00    0.00    0.00   80.00   20.00     0  stress
06:22:19 PM  1000      2556   20.00    0.00    0.00   80.00   20.00     1  stress
06:22:19 PM  1000      2609    0.00    1.00    0.00    0.00    1.00     1  pidstat

%wait 高，表示进程等待CPU的时间长。（注意，pidstat 中的 %wait 跟 top命令中的 wait 不一样，top的 wait 指的是 iowait）
由此可知，这些进程大部分时候都在等待 cpu。

存在大量处于可运行状态的线程

当系统存在大量可运行线程时，就算可运行进程很少，也会造成 system load 升高。
在这里插入图片描述
上图可看出 load average 很高，当前只有一个进程处于 running 状态，cpu sy 比 cpu us 要高，这个 sysbench 的进程占用了最多的 cpu。

$ pidstat -u 1 -p 2806
Linux 4.15.0-112-generic (aaa_ubuntu)   08/15/2020      _x86_64_        (2 CPU)

05:16:32 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
05:16:33 PM  1000      2806   60.00  100.00    0.00    0.00  100.00     0  sysbench
05:16:34 PM  1000      2806   61.00  100.00    0.00    0.00  100.00     0  sysbench
05:16:35 PM  1000      2806   54.00  100.00    0.00    0.00  100.00     0  sysbench
05:16:36 PM  1000      2806   56.00  100.00    0.00    0.00  100.00     0  sysbench
05:16:37 PM  1000      2806   58.00  100.00    0.00    0.00  100.00     0  sysbench
05:16:38 PM  1000      2806   69.00  100.00    0.00    0.00  100.00     0  sysbench

上图每秒打印一次 sysbench 进程的 cpu 使用率，可看出确实占用了大量 cpu。

$ pidstat -ut 1 -p 2806
Linux 4.15.0-112-generic (aaa_ubuntu)   08/15/2020      _x86_64_        (2 CPU)

05:17:04 PM   UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
05:17:05 PM  1000      2806         -   64.00  100.00    0.00    0.00  100.00     0  sysbench
05:17:05 PM  1000         -      2806    0.00    0.00    0.00    0.00    0.00     0  |__sysbench
05:17:05 PM  1000         -      2807    8.00   14.00    0.00   49.00   22.00     1  |__sysbench
05:17:05 PM  1000         -      2808    3.00   19.00    0.00   51.00   22.00     1  |__sysbench
05:17:05 PM  1000         -      2809    7.00    9.00    0.00   64.00   16.00     0  |__sysbench
05:17:05 PM  1000         -      2810    9.00    8.00    0.00   63.00   17.00     0  |__sysbench
05:17:05 PM  1000         -      2811    6.00   11.00    0.00   68.00   17.00     1  |__sysbench
05:17:05 PM  1000         -      2812    8.00   16.00    0.00   42.00   24.00     1  |__sysbench
05:17:05 PM  1000         -      2813    8.00   10.00    0.00   63.00   18.00     0  |__sysbench
05:17:05 PM  1000         -      2814    3.00   20.00    0.00   56.00   23.00     1  |__sysbench
05:17:05 PM  1000         -      2815    5.00   11.00    0.00   67.00   16.00     0  |__sysbench
05:17:05 PM  1000         -      2816    8.00   10.00    0.00   65.00   18.00     0  |__sysbench

05:17:05 PM   UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
05:17:06 PM  1000      2806         -   58.00  100.00    0.00    0.00  100.00     0  sysbench
05:17:06 PM  1000         -      2806    0.00    0.00    0.00    0.00    0.00     0  |__sysbench
05:17:06 PM  1000         -      2807    6.00   12.00    0.00   57.00   18.00     1  |__sysbench
05:17:06 PM  1000         -      2808    4.00   15.00    0.00   61.00   19.00     1  |__sysbench
05:17:06 PM  1000         -      2809    6.00   14.00    0.00   53.00   20.00     0  |__sysbench
05:17:06 PM  1000         -      2810    6.00   14.00    0.00   60.00   20.00     0  |__sysbench
05:17:06 PM  1000         -      2811    6.00   13.00    0.00   65.00   19.00     1  |__sysbench
05:17:06 PM  1000         -      2812    4.00   14.00    0.00   58.00   18.00     1  |__sysbench
05:17:06 PM  1000         -      2813    5.00   14.00    0.00   57.00   19.00     0  |__sysbench
05:17:06 PM  1000         -      2814    6.00   13.00    0.00   66.00   19.00     1  |__sysbench
05:17:06 PM  1000         -      2815    6.00   15.00    0.00   63.00   21.00     0  |__sysbench
05:17:06 PM  1000         -      2816    9.00   11.00    0.00   55.00   20.00     0  |__sysbench

上图加上 -t 选项后，打印了 sysbench 内部线程的 cpu 使用率，可以看出 %wait 很高，这代表各个线程等待 cpu 的时间很长。

查看线程上下文切换

在这里插入图片描述
上图可知存在大量可运行的线程导致系统负载升高

$ mpstat -P ALL 1
Linux 4.15.0-112-generic (aaa_ubuntu)   08/15/2020      _x86_64_        (2 CPU)

06:42:13 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:42:14 PM  all   36.00    0.00   64.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:42:14 PM    0   31.68    0.00   68.32    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:42:14 PM    1   40.00    0.00   60.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

06:42:14 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:42:15 PM  all   34.00    0.00   66.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:42:15 PM    0   32.00    0.00   68.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:42:15 PM    1   36.00    0.00   64.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

06:42:15 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:42:16 PM  all   30.50    0.00   69.50    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:42:16 PM    0   31.00    0.00   69.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:42:16 PM    1   30.00    0.00   70.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

上图可知，大部分 cpu 用于 kernel （%sys 比 %usr 高很多）

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 8  0      0 3266356  96552 536796    0    0    36   393 3394 3068 38 13 47  1  0
 7  0      0 3266376  96552 536796    0    0     0     0 39782 943369 33 67  1  0  0
 8  0      0 3266376  96552 536796    0    0     0     0 38650 909726 30 70  0  0  0
10  0      0 3266376  96552 536796    0    0     0     0 38048 971882 31 69  0  0  0
 7  0      0 3266376  96552 536796    0    0     0     0 39544 962102 30 70  0  0  0
 8  0      0 3266376  96552 536796    0    0     0     0 39163 917229 28 72  0  0  0
 8  0      0 3266376  96552 536796    0    0     0     0 39437 1003420 30 70  0  0  0
10  0      0 3266376  96552 536796    0    0     0     0 42390 939651 27 73  0  0  0
 8  0      0 3266376  96552 536796    0    0     0     0 43100 977931 34 66  0  0  0
 9  0      0 3266376  96552 536796    0    0     0     0 40550 971362 36 64  0  0  0
 6  0      0 3266376  96552 536796    0    0     0     0 39410 947462 29 71  0  0  0
 8  0      0 3266376  96552 536796    0    0     0     0 40276 944977 28 72  0  0  0

上图可知，上下文切换次数超高（cs 对应列）, 中断数也很高（in 对应列）

$ pidstat -wt 1 -p 2938
Linux 4.15.0-112-generic (aaa_ubuntu)   08/15/2020      _x86_64_        (2 CPU)

05:17:44 PM   UID      TGID       TID   cswch/s nvcswch/s  Command
05:17:45 PM  1000      2938         -      0.00      0.00  sysbench
05:17:45 PM  1000         -      2806      0.00      0.00  |__sysbench
05:17:45 PM  1000         -      2807  10785.00  77586.00  |__sysbench
05:17:45 PM  1000         -      2808   8773.00  72765.00  |__sysbench
05:17:45 PM  1000         -      2809  10349.00  82741.00  |__sysbench
05:17:45 PM  1000         -      2810   9048.00  77654.00  |__sysbench
05:17:45 PM  1000         -      2811  11500.00  81969.00  |__sysbench
05:17:45 PM  1000         -      2812  10458.00  81610.00  |__sysbench
05:17:45 PM  1000         -      2813   6844.00  85061.00  |__sysbench
05:17:45 PM  1000         -      2814  11279.00  82209.00  |__sysbench
05:17:45 PM  1000         -      2815  11345.00  75471.00  |__sysbench
05:17:45 PM  1000         -      2816   9282.00  82881.00  |__sysbench

05:17:45 PM   UID      TGID       TID   cswch/s nvcswch/s  Command
05:17:46 PM  1000      2938         -      0.00      0.00  sysbench
05:17:46 PM  1000         -      2806      0.00      0.00  |__sysbench
05:17:46 PM  1000         -      2807   8328.00  82343.00  |__sysbench
05:17:46 PM  1000         -      2808   7356.00  84236.00  |__sysbench
05:17:46 PM  1000         -      2809   8468.00  77616.00  |__sysbench
05:17:46 PM  1000         -      2810  10228.00  82942.00  |__sysbench
05:17:46 PM  1000         -      2811  10503.00  82221.00  |__sysbench
05:17:46 PM  1000         -      2812  12488.00  75746.00  |__sysbench
05:17:46 PM  1000         -      2813  10551.00  84009.00  |__sysbench
05:17:46 PM  1000         -      2814  11311.00  81515.00  |__sysbench
05:17:46 PM  1000         -      2815   6739.00  74858.00  |__sysbench
05:17:46 PM  1000         -      2816   9154.00  84515.00  |__sysbench

上图打印的是 sysbench 进程内线程的上下文切换情况。
cswch：自愿上下文切换次数，当所需资源不可用时就会出现。
nvcswch：非自愿上下文切换次数，当任务所分派的 cpu 时间分片用完后，就会被要求强制放弃 cpu。（或者被更高优先级的抢占）
上图显示 nvcwsch 很大，说明同一时间，系统存在大量可运行的线程，但是 cpu 不足，导致频繁切换线程上下文。线程上下文切换也是需要消耗系统 cpu 的，所以 cpu sy 很高。
在这里插入图片描述
之前通过 vmstat 查看到中断数也很高，可以通过命令 watch -d cat /proc/interrupts 来确定是哪种硬中断导致。
上图可知，RES 重调度中断增长很快，RES 中断用于不停唤醒空闲 CPU 来处理任务（只在多 CPU 系统才存在），说明了上下文频繁切换（这里是线程上下文）。

查看短进程导致CPU升高

在这里插入图片描述
上图可知，cpu 使用率颇高，但是 top 列表中没有出现高 cpu 的进程。

$ mpstat -P ALL 1
Linux 4.15.0-112-generic (aaa_ubuntu)   08/16/2020      _x86_64_        (2 CPU)

04:35:37 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
04:35:38 PM  all   19.10    0.00    0.50    5.03    0.00    0.50    0.00    0.00    0.00   74.87
04:35:38 PM    0   36.63    0.00    0.00    2.97    0.00    0.99    0.00    0.00    0.00   59.41
04:35:38 PM    1    2.04    0.00    1.02    6.12    0.00    0.00    0.00    0.00    0.00   90.82

04:35:38 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
04:35:39 PM  all   34.18    0.00    1.02    4.08    0.00    0.00    0.00    0.00    0.00   60.71
04:35:39 PM    0   22.45    0.00    1.02    4.08    0.00    1.02    0.00    0.00    0.00   71.43
04:35:39 PM    1   45.36    0.00    1.03    4.12    0.00    0.00    0.00    0.00    0.00   49.48

04:35:39 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
04:35:40 PM  all   50.75    0.00    1.51    2.51    0.00    0.50    0.00    0.00    0.00   44.72
04:35:40 PM    0   10.20    0.00    1.02    5.10    0.00    0.00    0.00    0.00    0.00   83.67
04:35:40 PM    1   91.00    0.00    1.00    0.00    0.00    1.00    0.00    0.00    0.00    7.00

04:35:40 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
04:35:41 PM  all   51.53    0.00    1.02    3.57    0.00    0.00    0.00    0.00    0.00   43.88
04:35:41 PM    0   62.63    0.00    2.02    1.01    0.00    0.00    0.00    0.00    0.00   34.34
04:35:41 PM    1   40.00    0.00    1.00    6.00    0.00    0.00    0.00    0.00    0.00   53.00

04:35:41 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
04:35:42 PM  all   51.01    0.00    1.52    3.03    0.00    0.00    0.00    0.00    0.00   44.44
04:35:42 PM    0   67.33    0.00    1.98    1.98    0.00    0.00    0.00    0.00    0.00   28.71
04:35:42 PM    1   34.38    0.00    1.04    4.17    0.00    0.00    0.00    0.00    0.00   60.42

上图可知，2个 cpu 大部分时间确实是花费在用户态程序的执行上。

$ pidstat -u 1
Linux 4.15.0-112-generic (aaa_ubuntu)   08/16/2020      _x86_64_        (2 CPU)

04:38:20 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
04:38:21 PM     0       217    0.00    1.00    0.00    0.00    1.00     0  kworker/0:1H
04:38:21 PM     0       337    0.00    1.00    0.00    0.00    1.00     0  jbd2/sda2-8
04:38:21 PM     0       819    5.00    1.00    0.00    0.00    6.00     0  snapd
04:38:21 PM  1000      3821    1.00    0.00    0.00    0.00    1.00     0  pidstat
04:38:21 PM     0      3825    2.00    0.00    0.00    0.00    2.00     0  stress

04:38:21 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
04:38:22 PM     0       819    5.94    0.00    0.00    0.00    5.94     0  snapd
04:38:22 PM     0      1951    0.00    0.99    0.00    0.00    0.99     0  kworker/u4:0
04:38:22 PM     0      3829    1.98    0.00    0.00    0.00    1.98     0  stress

04:38:22 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
04:38:23 PM     0       819    7.00    0.00    0.00    0.00    7.00     0  snapd
04:38:23 PM  1000      2446    0.00    1.00    0.00    0.00    1.00     0  sshd

04:38:23 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
04:38:24 PM     0         8    0.00    1.00    0.00    0.00    1.00     0  rcu_sched
04:38:24 PM     0       215    0.00    1.00    0.00    0.00    1.00     1  kworker/1:1H
04:38:24 PM     0       819    5.00    1.00    0.00    0.00    6.00     0  snapd
04:38:24 PM     0      3833    2.00    0.00    0.00    0.00    2.00     1  stress

04:38:24 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
04:38:25 PM     0       819    6.00    0.00    0.00    0.00    6.00     0  snapd
04:38:25 PM  1000      2446    0.00    1.00    0.00    0.00    1.00     0  sshd
04:38:25 PM  1000      3821    0.00    1.00    0.00    1.00    1.00     0  pidstat
04:38:25 PM     0      3837    1.00    0.00    0.00    0.00    1.00     0  stress

04:38:25 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
04:38:26 PM     0       337    0.00    1.00    0.00    1.00    1.00     1  jbd2/sda2-8
04:38:26 PM     0       819    6.00    1.00    0.00    0.00    7.00     0  snapd
04:38:26 PM     0      3841    1.00    0.00    0.00    0.00    1.00     1  stress

上图没有发现使用大量 cpu 的进程。

这时候就需要怀疑一下是不是短进程在作怪。可以使用以下工具来查看。
这个脚本可以直接使用，不需要其他依赖或者安装。
perf-tools/execsnoop

# ./execsnoop -t
Tracing exec()s. Ctrl-C to end.
Instrumenting sys_execve
TIMEs               PID   PPID ARGS
2770.354823        7359   7355 gawk -v o=1 -v opt_name=0 -v name= -v opt_duration=0 [...]
2770.356679        7360   7358 cat -v trace_pipe
2771.141217        7362   7361   new_child_proc-7362  [001] ....  2771.141217: execsnoop_sys_execve: (SyS_execve+0x0/0x40)
2771.142194        7363   7362 /usr/bin/stress -c 1 -t 1
2772.141682        7366   7365   new_child_proc-7366  [001] ....  2772.141682: execsnoop_sys_execve: (SyS_execve+0x0/0x40)
2772.142493        7367   7366 /usr/bin/stress -c 1 -t 1
2773.143181        7370   7369   new_child_proc-7370  [001] ....  2773.143181: execsnoop_sys_execve: (SyS_execve+0x0/0x40)
2773.144564        7371   7370 /usr/bin/stress -c 1 -t 1
2774.143527        7374   7373   new_child_proc-7374  [001] ....  2774.143527: execsnoop_sys_execve: (SyS_execve+0x0/0x40)
2774.144244        7375   7374 /usr/bin/stress -c 1 -t 1
2775.143685        7378   7377   new_child_proc-7378  [000] ....  2775.143685: execsnoop_sys_execve: (SyS_execve+0x0/0x40)
2775.144391        7379   7378 /usr/bin/stress -c 1 -t 1
2777.147780        7382   7381   new_child_proc-7382  [001] ....  2777.147780: execsnoop_sys_execve: (SyS_execve+0x0/0x40)
2777.148508        7383   7382 /usr/bin/stress -c 1 -t 1
2778.148406        7386   7385   new_child_proc-7386  [001] ....  2778.148406: execsnoop_sys_execve: (SyS_execve+0x0/0x40)
2778.149217        7387   7386 /usr/bin/stress -c 1 -t 1
2779.148549        7390   7389   new_child_proc-7390  [001] ....  2779.148549: execsnoop_sys_execve: (SyS_execve+0x0/0x40)
2779.149486        7391   7390 /usr/bin/stress -c 1 -t 1

上图可知，大约每隔1s，就会产生一个新的进程，这个进程又调用了 system 方法来产生另外一个子进程
"/usr/bin/stress -c 1 -t 1"

$ ps aux | grep 7355
root      8210  0.0  0.0  13136  1028 pts/1    S+   16:57   0:00 grep --color=auto 7355
$ ps aux | grep 7359
root      8224  0.0  0.0  13136  1008 pts/1    S+   16:57   0:00 grep --color=auto 7359

但如果使用 ps 来查找这些进程的话，又会发现它们都不存在。很明显，这些都是短时进程。
但是，从上图可以看到这些短进程，都包含 new_child_proc 。
在这里插入图片描述上图通过 watch -d “pstree -p | grep new_child_proc” 可以看到这些短时进程，都是由父进程 pid=2502 产生的。

$ ps aux | grep 2502
root      2502  0.0  0.0   4376  1108 pts/0    S+   16:32   0:00 ./new_child_proc
root     10527  0.0  0.0  13136  1060 pts/1    S+   17:05   0:00 grep --color=auto 2502

上图可以确定这个进程确实存在。kill 了它后就正常了。

查看僵尸进程

在这里插入图片描述
上图可知，当前系统存在大量僵尸进程。

$ ps aux | grep Z+
root     10962  0.0  0.0      0     0 pts/0    Z+   17:17   0:00 [new_zombie_proc] <defunct>
root     10965  0.0  0.0      0     0 pts/0    Z+   17:17   0:00 [new_zombie_proc] <defunct>
root     10966  0.0  0.0      0     0 pts/0    Z+   17:17   0:00 [new_zombie_proc] <defunct>
root     10967  0.0  0.0      0     0 pts/0    Z+   17:17   0:00 [new_zombie_proc] <defunct>
root     10968  0.0  0.0      0     0 pts/0    Z+   17:17   0:00 [new_zombie_proc] <defunct>
root     10969  0.0  0.0      0     0 pts/0    Z+   17:17   0:00 [new_zombie_proc] <defunct>
...

$ ps aux | grep Z+ | wc -l
685

Z+ 表示僵尸进程。
上图可知，僵尸进程数量很多。

$ pstree -p | grep new_zombie_proc | head -n 10
           |-sshd(942)-+-sshd(1979)---sshd(2180)---bash(2182)---sudo(2241)---bash(2243)---new_zombie_proc(10961)-+-new_zombie_proc(+
           |           |                                                                                         |-new_zombie_proc(+
           |           |                                                                                         |-new_zombie_proc(+
           |           |                                                                                         |-new_zombie_proc(+
           |           |                                                                                         |-new_zombie_proc(+
           |           |                                                                                         |-new_zombie_proc(+
           |           |                                                                                         |-new_zombie_proc(+
           |           |                                                                                         |-new_zombie_proc(+
           |           |                                                                                         |-new_zombie_proc(+
           |           |                                                                                         |-new_zombie_proc(+

上图可以看到僵尸进程的父进程为 pid=10961。

# strace -p 10961
strace: Process 10961 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f851507d790) = 12020
nanosleep({tv_sec=1, tv_nsec=0}, {tv_sec=0, tv_nsec=999992096}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12020, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f851507d790) = 12021
nanosleep({tv_sec=1, tv_nsec=0}, {tv_sec=0, tv_nsec=999992340}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12021, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f851507d790) = 12022
nanosleep({tv_sec=1, tv_nsec=0}, {tv_sec=0, tv_nsec=999957790}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12022, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
restart_syscall(<... resuming interrupted nanosleep ...>^Cstrace: Process 10961 detached
 <detached ...>

上图使用 strace 可以探查到 pid=10961 的进程使用的 syscall。

# strace -c -p 10961
strace: Process 10961 attached
^Cstrace: Process 10961 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 81.59    0.000195          49         4           clone
 13.39    0.000032           8         4           restart_syscall
  5.02    0.000012           3         4         4 nanosleep
------ ----------- ----------- --------- --------- ----------------
100.00    0.000239                    12         4 total

这里显示的是一段时间内 syscall 的统计。kill 掉 pid=10961 的进程后，所有僵尸进程都消失了。
产生僵尸进程是因为子进程退出后，父进程没有调用 wait(…)，导致子进程的信息还一直残留。当父进程被 kill 后，这些子进程会被当成孤立进程，然后挂接到 init 进程下，由 init 进程进行收割。

来源：oschina

链接：https://my.oschina.net/u/4343285/blog/4514181

标签

sysbench