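I. Commands used
taskset: sets or changes the CPUs a process is allowed to run on (CPU affinity)
perf top: shows, in real time, the functions with the highest performance overhead across the system and its software
top: monitors CPU, memory, and other resource usage
strace: traces what a process is doing; used here with -f (follow child processes) and -p (attach to a process ID)
ps: used here with -m (show threads), -p (select by PID), and -o (user-defined output fields)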
II. Diagnosis procedure
1. The management plane runs on CPU 0. The top command shows that CPU 0 has no idle time left.
2. Check which functions are consuming CPU:
taskset -c 0 perf top -C 0
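The output shows that functions in libc are being called continuously.
3. Check the process execution times. The exec time is far greater than the sleep time, which means the process is holding the CPU for long stretches: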
cat /proc/sched_debug
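4. Look more closely at the execution time and CPU usage of the process and its threads, using ps with the -m, -p, and -o options listed above. CPU usage is very high.
5. Trace the process. It is stuck in an endless loop:
strace -fp 3686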
msgsnd(1966081, {0, "type=\"security\" time=\"2020-06-02"...}, 1296, IPC_NOWAIT) = -1 EINVAL (Invalid argument)
stat("/var/log/tos_alarmd.log", {st_mode=S_IFREG|0600, st_size=1506021, ...}) = 0
write(19, "errno = 22\n", 11) = 11
msgget(0x9999, IPC_CREAT|0666) = 1966081
msgsnd(1966081, {0, "type=\"security\" time=\"2020-06-02"...}, 1296, IPC_NOWAIT) = -1 EINVAL (Invalid argument)
stat("/var/log/tos_alarmd.log", {st_mode=S_IFREG|0600, st_size=1506032, ...}) = 0
write(19, "errno = 22\n", 11) = 11
msgget(0x9999, IPC_CREAT|0666) = 1966081
msgsnd(1966081, {0, "type=\"security\" time=\"2020-06-02"...}, 1296, IPC_NOWAIT) = -1 EINVAL (Invalid argument)
stat("/var/log/tos_alarmd.log", {st_mode=S_IFREG|0600, st_size=1506043, ...}) = 0
write(19, "errno = 22\n", 11) = 11
msgget(0x9999, IPC_CREAT|0666) = 1966081
msgsnd(1966081, {0, "type=\"security\" time=\"2020-06-02"...}, 1296, IPC_NOWAIT) = -1 EINVAL (Invalid argument)
stat("/var/log/tos_alarmd.log", {st_mode=S_IFREG|0600, st_size=1506054, ...}) = 0
write(19, "errno = 22\n", 11) = 11
msgget(0x9999, IPC_CREAT|0666) = 1966081
msgsnd(1966081, {0, "type=\"security\" time=\"2020-06-02"...}, 1296, IPC_NOWAIT) = -1 EINVAL (Invalid argument)
stat("/var/log/tos_alarmd.log", {st_mode=S_IFREG|0600, st_size=1506065, ...}) = 0
write(19, "errno = 22\n", 11) = 11
msgget(0x9999, IPC_CREAT|0666) = 1966081
msgsnd(1966081, {0, "type=\"security\" time=\"2020-06-02"...}, 1296, IPC_NOWAIT) = -1 EINVAL (Invalid argument)
stat("/var/log/tos_alarmd.log", {st_mode=S_IFREG|0600, st_size=1506076, ...}) = 0
write(19, "errno = 22\n", 11) = 11
msgget(0x9999, IPC_CREAT|0666) = 1966081
msgsnd(1966081, {0, "type=\"security\" time=\"2020-06-02"...}, 1296, IPC_NOWAIT) = -1 EINVAL (Invalid argument)
stat("/var/log/tos_alarmd.log", {st_mode=S_IFREG|0600, st_size=1506087, ...}) = 0
write(19, "errno = 22\n", 11) = 11
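6. Inspect the module that the process belongs to and find where the endless loop comes from:
The cause is struct mymsg buff.mtype = 0; with mtype set to 0, every msgsnd call fails with EINVAL, and the code immediately tries again.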
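The trace in step 5 suggests a loop that treats every msgsnd failure as retryable. As a minimal sketch of one way to avoid this kind of spin (not the original module's code), the helper below retries only on EAGAIN or EINTR, pauses between attempts, and gives up at once on a permanent error such as EINVAL. The names retry_msg and send_with_bounded_retry, the 10 ms pause, and the message layout are assumptions made for illustration; the real struct mymsg is not shown in the source.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose msgsnd() and nanosleep() under strict -std modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout; the original struct mymsg is not shown. */
struct retry_msg {
    long mtype;        /* must be >= 1, otherwise msgsnd() fails with EINVAL */
    char mtext[128];
};

/*
 * Send a message, retrying only on transient errors.
 * Returns 0 on success, otherwise the errno of the failure that ended
 * the attempt. Permanent errors (for example EINVAL) are returned
 * immediately instead of being retried.
 */
int send_with_bounded_retry(int qid, const struct retry_msg *msg,
                            size_t len, int max_attempts)
{
    int last_err = EINVAL;   /* reported when max_attempts <= 0 */

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (msgsnd(qid, msg, len, IPC_NOWAIT) == 0)
            return 0;

        last_err = errno;
        if (last_err != EAGAIN && last_err != EINTR)
            return last_err;             /* retrying cannot help */

        /* Transient failure: back off for 10 ms instead of spinning. */
        struct timespec pause = { 0, 10 * 1000 * 1000 };
        nanosleep(&pause, NULL);
    }
    return last_err;
}
```

With a helper of this shape, a message whose mtype is 0 would produce one EINVAL and return to the caller, rather than keeping CPU 0 busy.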
7. The kernel requires mtype to be greater than or equal to 1; msgsnd rejects any smaller value with EINVAL.
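The rule can be checked in isolation. The following is a minimal sketch, assuming a Linux system with System V message queues enabled. It creates a throwaway private queue, sends one message with the given mtype, and reports the resulting errno. The names probe_msg and try_mtype are hypothetical and do not come from the original module.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout used only for this probe. */
struct probe_msg {
    long mtype;
    char mtext[64];
};

/*
 * Send one message with the given mtype on a temporary private queue.
 * Returns 0 if msgsnd() succeeded, the errno value if it failed,
 * or -1 if the queue itself could not be created.
 */
int try_mtype(long mtype)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0)
        return -1;

    struct probe_msg m;
    memset(&m, 0, sizeof m);
    m.mtype = mtype;
    strcpy(m.mtext, "hello");

    int rc = msgsnd(qid, &m, strlen(m.mtext) + 1, IPC_NOWAIT);
    int err = (rc == 0) ? 0 : errno;

    msgctl(qid, IPC_RMID, NULL);   /* always remove the temporary queue */
    return err;
}
```

Called from a small main, try_mtype(0) should return EINVAL (22 on Linux, matching the "errno = 22" lines in the trace), while try_mtype(1) should return 0.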
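The sluggish device and its high CPU utilization were essentially tracked down with the procedure above; this note records it for reference.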
Source: oschina
Link: https://my.oschina.net/u/4273264/blog/4306502