1. top shows that about 90% of the host CPU is consumed, mainly by the mysqld process:
top - 14:46:26 up 266 days, 20:41, 4 users, load average: 17.14, 15.68, 10.69
Tasks: 264 total, 1 running, 263 sleeping, 0 stopped, 0 zombie
Cpu(s): 69.5%us, 21.2%sy, 0.0%ni, 9.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16333448k total, 9920660k used, 6412788k free, 488896k buffers
Swap: 2097148k total, 24448k used, 2072700k free, 3965104k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9386 mysql 20 0 14.6g 4.5g 9164 S 723.8 28.9 416:42.07 /db/mysql/app/mysql/bin/mysqld --defaults-file=/db/mysql/app/mysql/my.cnf --basedi
11846 root 20 0 15168 1380 940 R 0
2. Using a MySQL inspection script, check the current processlist.
The two dominant wait states are Sending data and Waiting for table metadata lock:
Id User Host db Command Time State Info
473 resync 56.16.3.21:56636 NULL Binlog Dump GTID 52765 Master has sent all binlog to slave; waiting for binlog to be updated NULL
47414 db 10.200.210.61:36735 user Sleep 142 NULL
79608 db 56.16.3.26:38200 cbrc Query 2694 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
79748 db 10.200.210.66:33393 cbrc Query 2563 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
79928 db 10.200.210.66:33529 cbrc Query 2457 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
80492 db 56.16.3.26:38395 cbrc Query 2211 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
80562 db 56.16.3.26:38410 cbrc Query 2182 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
80645 db 56.16.3.26:38425 cbrc Query 2151 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
80658 db 56.16.3.26:38434 cbrc Query 2130 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
80716 db 56.16.3.26:38449 cbrc Query 2095 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
80996 db 56.16.3.26:38496 cbrc Query 2070 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
81043 db 56.16.3.26:38501 cbrc Query 1895 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
81530 db 56.16.3.26:38621 cbrc Query 1508 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
81736 db 56.16.3.26:38685 cbrc Query 1463 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
82182 db 56.16.3.26:38757 cbrc Query 1448 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
82783 db 56.16.3.26:38901 cbrc Query 1156 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
82788 db 56.16.3.26:38902 cbrc Query 1116 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
82789 db 56.16.3.26:38903 cbrc Query 1088 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
82790 db 56.16.3.26:38904 cbrc Query 943 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
83054 db 10.200.210.66:36161 cbrc Query 996 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
83058 db 10.200.210.66:36162 cbrc Query 981 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
83059 db 10.200.210.66:36163 cbrc Query 883 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
83060 db 10.200.210.66:36164 cbrc Query 830 Sending data select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
83404 db 56.16.3.26:39045 cbrc Query 391 Waiting for table metadata lock select c.*, i.*, (select max(dc.data_collect_mil) from data_collect_detl
83552 db 10.200.210.66:36670 cbrc Query 271 Waiting for table metadata lock insert into data_collect_detl (detl_id, data_indc_id, data_collect_id, data_result, data_coll
3. Handling the two wait states.
3.1 Sending data -> the SQL is actively executing and consuming CPU; the statement probably needs optimization, but as an immediate measure we can kill all sessions running it.
Note that if these SELECT sessions are not killed, even a TRUNCATE on the table will block,
because TRUNCATE must wait for all long-running queries on the table to finish before it can proceed. A DELETE statement, by contrast, does not wait for the long-running queries.
Checking the slow query log, the following SQL was found to be resource-intensive; it is a candidate for later optimization.
select c.*, i.*,
       (select max(dc.data_collect_mil)
          from data_collect_detl dc
         where dc.data_indc_id = i.data_indc_id and dc.rmk = 'success') as data_collect_mil,
       (select dc2.data_result
          from data_collect_detl dc2
         where dc2.data_indc_id = i.data_indc_id
           and dc2.data_collect_mil = (select max(data_collect_mil)
                                         from data_collect_detl d
                                        where d.rmk = 'success'
                                          and d.data_indc_id = i.data_indc_id)) as data_result
  from data_collect_conf c
  left join data_indc_conf i on c.data_collect_id = i.data_collect_id
 order by i.data_collect_id, i.data_indc_order
-- How to kill the sessions:
select concat('KILL ',id,';') from information_schema.processlist where db='cbrc' into outfile '/tmp/a.txt';
3.2 Waiting for table metadata lock -> a session is running DDL against the table, which blocks the others.
Inspection showed a user was running an ADD INDEX on the table data_collect_detl; after killing that ADD INDEX session, the metadata-lock waits disappeared.
| 84119 | db | 10.200.210.237:51793 | cbrc | Query | 2704 | Waiting for table metadata lock | alter table `cbrc`.`data_collect_detl` add index `index_data_indc_id` (`data_indc_id`)
Handling:
KILL 84119;
### Thanks to DBA学习记录 (reference 1)
MySQL batch kill session (DBA学习记录, 2018-01-14)
root@localhost > select concat('KILL ',id,';') from information_schema.processlist where user='sam' into outfile '/tmp/a.txt';
The generated file looks like this:
+------------------------+
| concat('KILL ',id,';') |
+------------------------+
| KILL 31964612; |
| KILL 31964609; |
| KILL 31964611; |
…...
| KILL 31966619; |
| KILL 31966620; |
+------------------------+
991 rows in set (0.02 sec)
root@localhost >
4. Execute the generated KILL script:
root@localhost > source /tmp/a.txt
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
……
https://blog.csdn.net/hellosunqi/article/details/79055383
##################
######## 222
The Slow Query Log
View mysql-slow.log directly:
# Query_time: 2.302229 Lock_time: 0.000346 Rows_sent: 1 Rows_examined: 1916238
SET timestamp=1588843823;
select count(1) as total from (select * from (select a.*,CONCAT(@rowno:=@rowno+1,'') as rownum_ from (
If the slow query log is enabled and FILE is selected as the output destination, each statement written to the log begins with a line that starts with a # character and has the following fields (all on a single line):
Query_time: duration
The statement execution time in seconds.
Lock_time: duration
The time to acquire locks in seconds.
Rows_sent: N
The number of rows sent to the client.
Rows_examined: N
The number of rows examined by the server layer (not counting any processing internal to storage engines).
Each statement written to the slow query log file is preceded by a SET statement that includes a timestamp. As of MySQL 8.0.14, the timestamp indicates when the slow statement began executing. Before 8.0.14, it indicates when the statement was logged (which happens after it finishes executing).
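As a quick illustration (my own sketch, not part of the original notes), such a header line can be parsed with a few lines of Python; the field names come from the excerpt above, the helper name is made up:

```python
import re

def parse_slow_log_header(line):
    """Parse a '# Query_time: ...' header line from the MySQL slow log
    into a dict of numeric fields."""
    fields = {}
    # Each field looks like 'Name: value', e.g. 'Query_time: 2.302229'
    for name, value in re.findall(r"(\w+):\s+([\d.]+)", line):
        fields[name] = float(value) if "." in value else int(value)
    return fields

header = "# Query_time: 2.302229  Lock_time: 0.000346 Rows_sent: 1  Rows_examined: 1916238"
print(parse_slow_log_header(header))
# → {'Query_time': 2.302229, 'Lock_time': 0.000346, 'Rows_sent': 1, 'Rows_examined': 1916238}
```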
## Thanks: https://www.percona.com/doc/percona-toolkit/LATEST/pt-query-digest.html
## Thanks: https://dev.mysql.com/doc/refman/8.0/en/slow-query-log.html
pt-query-digest inspects the columns in the table. The table must have at least the following columns:
CREATE TABLE query_review_history (
checksum CHAR(32) NOT NULL,
sample TEXT NOT NULL
);
Any columns not mentioned above are inspected to see if they follow a certain naming convention. The column is special if the name ends with an underscore followed by any of these values:
pct|avg|cnt|sum|min|max|pct_95|stddev|median|rank
If the column ends with one of those values, then the prefix is interpreted as the event attribute to store in that column, and the suffix is interpreted as the metric to be stored. For example, a column named Query_time_min will be used to store the minimum Query_time for the class of events.
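The suffix convention can be sketched in a few lines of Python (my own illustration; the suffix list is taken from the documentation quoted above):

```python
# Metric suffixes recognized by the review-history column convention.
# pct_95 must be checked before pct, since both are valid suffixes.
METRICS = ("pct_95", "pct", "avg", "cnt", "sum", "min", "max",
           "stddev", "median", "rank")

def split_column(name):
    """Split a column name like 'Query_time_min' into
    (attribute, metric), or return None if it is not special."""
    for metric in METRICS:
        suffix = "_" + metric
        if name.endswith(suffix):
            return name[:-len(suffix)], metric
    return None

print(split_column("Query_time_min"))     # → ('Query_time', 'min')
print(split_column("Query_time_pct_95"))  # → ('Query_time', 'pct_95')
print(split_column("sample"))             # → None
```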
Next is the table of metrics about this class of queries.
# pct total min max avg 95% stddev median
# Count 0 2
# Exec time 13 1105s 552s 554s 553s 554s 2s 553s
# Lock time 0 216us 99us 117us 108us 117us 12us 108us
# Rows sent 20 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
# Rows exam 0 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
The first line is column headers for the table. The percentage is the percent of the total for the whole analysis run, and the total is the actual value of the specified metric. For example, in this case we can see that the query executed 2 times, which is 13% of the total number of queries in the file. The min, max and avg columns are self-explanatory. The 95% column shows the 95th percentile; 95% of the values are less than or equal to this value. The standard deviation shows you how tightly grouped the values are. The standard deviation and median are both calculated from the 95th percentile, discarding the extremely large values.
The stddev, median and 95th percentile statistics are approximate. Exact statistics require keeping every value seen, sorting, and doing some calculations on them. This uses a lot of memory. To avoid this, we keep 1000 buckets, each of them 5% bigger than the one before, ranging from .000001 up to a very big number. When we see a value we increment the bucket into which it falls. Thus we have fixed memory per class of queries. The drawback is the imprecision, which typically falls in the 5 percent range.
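A toy version of that bucket scheme (my own sketch, not pt-query-digest's actual code): bucket i covers values up to .000001 * 1.05**i, and a percentile is read off from the cumulative bucket counts, which is why the result can be off by roughly 5%.

```python
import bisect
import math

# 1000 bucket upper bounds, each 5% bigger than the previous,
# starting from .000001 -- as described in the text above.
BOUNDS = [1e-6 * 1.05 ** i for i in range(1000)]

class BucketStats:
    def __init__(self):
        self.counts = [0] * len(BOUNDS)

    def add(self, value):
        # Increment the bucket this value falls into.
        i = bisect.bisect_left(BOUNDS, value)
        self.counts[min(i, len(BOUNDS) - 1)] += 1

    def percentile(self, p):
        """Approximate the p-th percentile from the bucket counts."""
        target = math.ceil(p / 100 * sum(self.counts))
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return BOUNDS[i]  # off by at most ~5%

stats = BucketStats()
for v in [0.1, 0.2, 0.3, 0.4, 10.0]:
    stats.add(v)
print(stats.percentile(50))  # close to 0.3
print(stats.percentile(95))  # close to 10, within ~5%
```

Memory use is fixed per class of queries regardless of how many values are seen, which is the whole point of the scheme.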
Finally, there is the byte offset at which you can look for the query sample in the log file. (Due to some anomalies in the slow-log format, this is not always accurate, but it is usually correct.) The position refers to the worst sample, which we'll see more about below.
##sample
6AE2E64878779CE70A1744847528E482 select distinct t.result_id,t.*,? a inspection db 2020-05-07 14:01:56 52 208.032 2.02074 7.04318 0.006151 0.000091 0.000176
query_time_sum = 208.032 seconds
query_count = 52
query_avg = 4 seconds (208.032 / 52 ≈ 4.0)
query_time_min = 2.02074 seconds
query_time_max = 7.04318 seconds
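The derived values follow directly from the sample row above; a quick sanity check in Python:

```python
# Metrics taken from the sample row above.
query_time_sum = 208.032  # total execution time in seconds
query_count = 52          # number of executions
query_time_min = 2.02074
query_time_max = 7.04318

# Average time per execution.
query_avg = query_time_sum / query_count
print(round(query_avg, 4))  # → 4.0006

# The average must lie between the min and the max.
assert query_time_min <= query_avg <= query_time_max
```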
##########
Source: oschina
Link: https://my.oschina.net/u/4286379/blog/4270257