Preface
We have a lot of methods to diagnose problems in our system such as strace,pstack,gstack,gdb,pt-pmp,etc.But sometimes there will be some fitful performance issues which are not so easy to trace.Thus,pt-stalk may help us in diagnosing these kind of problems.
Introduce
pt-stalk is a tool to collect detail diagnostic data base on triggers you specified such as gdb,oprofile,strace,tcpdump.The trigger is not the conception of trigger in database.They're different at all.pt-stalk provides various options to collect comprehensive data you need.It's really useful and helpful in performance diagnosing.Let's see some details of it.
Procedure
Usage
1 pt-stalk [OPTIONS]
Common Parameters
1 --collect //Collect diagostic data based on spedified triggers.
2 --collect-gdb //Collect gdb data.It prints stack traces from all threads.
3 --collect-oprofile //Collect oprofile data.
4 --collect-strace //Collect strace data.Do not specify it when "--collect-gdb" has been used.
5 --collect-tcpdump //Collect tcpdump data.
6 --cycles //Limite the times when triggering condition occurs(default "5").
7 --dest //Specify the position to store diagnostic data(default "/var/lib/pt-stalk").
8 --disk-bytes-free //Prevent the lack of least disk space.It does not collect data until it has the necessary disk space(default "100M",valid surffixes of unit is k,M,G and T).
9 --disk-pct-free //It's similar with "--disk-bytes-free" but specify the percentage of disk space.
10 --function //Specify the contents to watch for the triggers(default "status",other value is "processlist" or <yourfilename>).
11 --iterations //Limit the times to collect.It will run forever if not give a specific value.
12 --log //Specify the postion to record logs(default "/var/log/pt-stalk.log").It only generates when deamonized.
13 --match //The pattern to specify when using "--function processlist".
14 --mysql-only //Merely collect the MySQL relevant diagnostic data but disk space is the exception.
15 --retention-time //Specify the purge days of diagnostic data(default "30").
16 --run-time //Specify how many seconds to collect diagnostic data(default "30").It should not be longer than the value of "--sleep".
17 --sleep //Specify how many seconds to sleep after last collection.It used to prevent too many operations of collection(default "300").
18 --stalk //Watch and wait for trigger occur(default "yes").
19 --no_stalk //The option can be specify if you want to collect diagnostic data immediately without waiting for trigger to occur.
20 --threshold //Specify the maximum value for collecting(default "25").
21 --variable //Specify the variables to compared against(default "Threads_running").
Examples
Generate benchmark with sysbench.
1 [root@zlm2 08:07:54 ~/sysbench-1.0/src/lua]
2 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --tables=10 --table-size=100000 --mysql-storage-engine=innodb cleanup
3 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
4
5 Dropping table 'sbtest1'...
6 Dropping table 'sbtest2'...
7 Dropping table 'sbtest3'...
8 Dropping table 'sbtest4'...
9 Dropping table 'sbtest5'...
10 Dropping table 'sbtest6'...
11 Dropping table 'sbtest7'...
12 Dropping table 'sbtest8'...
13 Dropping table 'sbtest9'...
14 Dropping table 'sbtest10'...
15
16 [root@zlm2 08:08:08 ~/sysbench-1.0/src/lua]
17 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --tables=10 --table-size=100000 --mysql-storage-engine=innodb prepare
18 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
19
20 Creating table 'sbtest1'...
21 Inserting 100000 records into 'sbtest1'
22 Creating a secondary index on 'sbtest1'...
23 Creating table 'sbtest2'...
24 Inserting 100000 records into 'sbtest2'
25 Creating a secondary index on 'sbtest2'...
26 Creating table 'sbtest3'...
27 Inserting 100000 records into 'sbtest3'
28 Creating a secondary index on 'sbtest3'...
29 Creating table 'sbtest4'...
30 Inserting 100000 records into 'sbtest4'
31 Creating a secondary index on 'sbtest4'...
32 Creating table 'sbtest5'...
33 Inserting 100000 records into 'sbtest5'
34 Creating a secondary index on 'sbtest5'...
35 Creating table 'sbtest6'...
36 Inserting 100000 records into 'sbtest6'
37 Creating a secondary index on 'sbtest6'...
38 Creating table 'sbtest7'...
39 Inserting 100000 records into 'sbtest7'
40 Creating a secondary index on 'sbtest7'...
41 Creating table 'sbtest8'...
42 Inserting 100000 records into 'sbtest8'
43 Creating a secondary index on 'sbtest8'...
44 Creating table 'sbtest9'...
45 Inserting 100000 records into 'sbtest9'
46 Creating a secondary index on 'sbtest9'...
47 Creating table 'sbtest10'...
48 Inserting 100000 records into 'sbtest10'
49 Creating a secondary index on 'sbtest10'...
50
51 [root@zlm2 08:09:23 ~/sysbench-1.0/src/lua]
52 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --threads=3 --time=3600 --report-interval=10 --rand-type=uniform run
53 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
54
55 Running the test with following options:
56 Number of threads: 3
57 Report intermediate results every 10 second(s)
58 Initializing random number generator from current time
59
60
61 Initializing worker threads...
62
63 Threads started!
64
65 [ 10s ] thds: 3 tps: 258.68 qps: 5176.49 (r/w/o: 3624.11/1034.82/517.56) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
66 [ 20s ] thds: 3 tps: 286.25 qps: 5726.15 (r/w/o: 4008.67/1144.89/572.60) lat (ms,95%): 15.00 err/s: 0.00 reconn/s: 0.00
67 [ 30s ] thds: 3 tps: 270.82 qps: 5416.13 (r/w/o: 3790.80/1083.69/541.64) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
68 [ 40s ] thds: 3 tps: 280.98 qps: 5619.75 (r/w/o: 3934.26/1123.53/561.97) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
69 [ 50s ] thds: 3 tps: 298.42 qps: 5968.38 (r/w/o: 4177.83/1193.70/596.85) lat (ms,95%): 14.46 err/s: 0.00 reconn/s: 0.00
70 [ 60s ] thds: 3 tps: 278.88 qps: 5578.20 (r/w/o: 3904.92/1115.52/557.76) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
71 [ 70s ] thds: 3 tps: 280.91 qps: 5617.66 (r/w/o: 3932.21/1123.63/561.82) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
72 [ 80s ] thds: 3 tps: 281.68 qps: 5632.83 (r/w/o: 3942.77/1126.71/563.35) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
73 [ 90s ] thds: 3 tps: 281.60 qps: 5631.55 (r/w/o: 3942.07/1126.39/563.10) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
74 [ 100s ] thds: 3 tps: 287.62 qps: 5753.86 (r/w/o: 4028.02/1150.49/575.35) lat (ms,95%): 15.55 err/s: 0.00 reconn/s: 0.00
75 [ 110s ] thds: 3 tps: 308.99 qps: 6180.45 (r/w/o: 4326.12/1236.35/617.97) lat (ms,95%): 13.95 err/s: 0.00 reconn/s: 0.00
76 ... //Omitted.
Collect diagnostic data using pt-stalk.
1 [root@zlm2 08:14:36 /data/mysql/mysql3306]
2 #pt-stalk --host localhost --port 3306 --user root --password Passw0rd --collect-gdb --cycles 1 --variable Threads_connect --threshold 2
3 mysql: [Warning] Using a password on the command line interface can be insecure.
4 2018_07_09_08_15_46 Starting /usr/bin/pt-stalk --function=status --variable=Threads_connect --threshold=2 --match= --cycles=1 --interval=1 --iterations= --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
5 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
6 2018_07_09_08_15_46 Detected value is empty; something failed? Trigger exit status: 0
7 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
8 2018_07_09_08_15_47 Detected value is empty; something failed? Trigger exit status: 0
9 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
10 2018_07_09_08_15_48 Detected value is empty; something failed? Trigger exit status: 0
11 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
12 2018_07_09_08_15_50 Detected value is empty; something failed? Trigger exit status: 0
13 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
14 2018_07_09_08_15_51 Detected value is empty; something failed? Trigger exit status: 0
15 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
16 2018_07_09_08_15_52 Detected value is empty; something failed? Trigger exit status: 0
17 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
18 2018_07_09_08_15_53 Detected value is empty; something failed? Trigger exit status: 0
19 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
20 2018_07_09_08_15_54 Detected value is empty; something failed? Trigger exit status: 0
21 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
22 2018_07_09_08_15_55 Detected value is empty; something failed? Trigger exit status: 0
23 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
24 2018_07_09_08_15_56 Detected value is empty; something failed? Trigger exit status: 0
25 ^C2018_07_09_08_15_57 Caught signal, exiting
26 2018_07_09_08_15_57 All subprocesses have finished
27 2018_07_09_08_15_57 Exiting because OKTORUN is false
28 2018_07_09_08_15_57 /usr/bin/pt-stalk exit status 1
29
30 //The value of "--variable" should be "Threads_connected".
31 //Modify the correct value and run it again.
32
33 [root@zlm2 08:15:57 /data/mysql/mysql3306]
34 #pt-stalk --host localhost --port 3306 --user root --password Passw0rd --collect-gdb --cycles 1 --variable Threads_connected --threshold 2
35 mysql: [Warning] Using a password on the command line interface can be insecure.
36 2018_07_09_08_19_39 Starting /usr/bin/pt-stalk --function=status --variable=Threads_connected --threshold=2 --match= --cycles=1 --interval=1 --iterations= --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
37 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
38 2018_07_09_08_19_39 Check results: status(Threads_connected)=5, matched=yes, cycles_true=1
39 2018_07_09_08_19_39 Collect 1 triggered
40 2018_07_09_08_19_39 Collect 1 PID 5921
41 2018_07_09_08_19_39 Collect 1 done
42 2018_07_09_08_19_39 Sleeping 300 seconds after collect
43 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
44 2018_07_09_08_24_39 Check results: status(Threads_connected)=6, matched=yes, cycles_true=1
45 2018_07_09_08_24_39 Collect 2 triggered
46 2018_07_09_08_24_39 Collect 2 PID 7398
47 2018_07_09_08_24_39 Collect 2 done
48 2018_07_09_08_24_40 Sleeping 300 seconds after collect
49 ^C2018_07_09_08_25_56 Caught signal, exiting //Terminate to collect diagnostic data by "Ctrl+C"
50 2018_07_09_08_25_56 Waiting up to 90 seconds for subprocesses to finish...
51 2018_07_09_08_25_56 Exiting because OKTORUN is false
52 2018_07_09_08_25_56 /usr/bin/pt-stalk exit status 1
Check the value of "show global status ... ".
1 (root@localhost mysql3306.sock)[sysbench]>show global status like '%Threads%';
2 +------------------------+-------+
3 | Variable_name | Value |
4 +------------------------+-------+
5 | Delayed_insert_threads | 0 |
6 | Slow_launch_threads | 0 |
7 | Threads_cached | 1 |
8 | Threads_connected | 5 | //Threads_connected has been up to 5.
9 | Threads_created | 5 |
10 | Threads_running | 1 |
11 +------------------------+-------+
12 6 rows in set (0.04 sec)
13
14
15 (root@localhost mysql3306.sock)[sysbench]>show global status like '%Threads%';
16 +------------------------+-------+
17 | Variable_name | Value |
18 +------------------------+-------+
19 | Delayed_insert_threads | 0 |
20 | Slow_launch_threads | 0 |
21 | Threads_cached | 3 |
22 | Threads_connected | 6 | //Second time,Threads_connected has been up to 6.
23 | Threads_created | 9 |
24 | Threads_running | 1 |
25 +------------------------+-------+
26 6 rows in set (0.01 sec)
Check the output files in default directory("/var/lib/pt-stalk").
1 [root@zlm2 08:31:41 /var/lib/pt-stalk]
2 #ls -lrt
3 total 3928
4 -rw-r--r-- 1 root root 383 Jul 9 08:19 2018_07_09_08_19_39-trigger
5 -rw-r--r-- 1 root root 9845 Jul 9 08:19 2018_07_09_08_19_39-pmap
6 -rw-r--r-- 1 root root 16554 Jul 9 08:19 2018_07_09_08_19_39-variables
7 -rw-r--r-- 1 root root 4352 Jul 9 08:19 2018_07_09_08_19_39-innodbstatus1
8 -rw-r--r-- 1 root root 384 Jul 9 08:19 2018_07_09_08_19_39-mutex-status1
9 -rw-r--r-- 1 root root 8507 Jul 9 08:19 2018_07_09_08_19_39-ps
10 -rw-r--r-- 1 root root 10356 Jul 9 08:19 2018_07_09_08_19_39-lsof
11 -rw-r--r-- 1 root root 5280 Jul 9 08:19 2018_07_09_08_19_39-opentables1
12 -rw-r--r-- 1 root root 8204 Jul 9 08:19 2018_07_09_08_19_39-top
13 -rw-r--r-- 1 root root 28343 Jul 9 08:19 2018_07_09_08_19_39-sysctl
14 -rw-r--r-- 1 root root 139 Jul 9 08:20 2018_07_09_08_19_39-disk-space
15 -rw-r--r-- 1 root root 1084769 Jul 9 08:20 2018_07_09_08_19_39-mysqladmin
16 -rw-r--r-- 1 root root 2782 Jul 9 08:20 2018_07_09_08_19_39-vmstat
17 -rw-r--r-- 1 root root 24750 Jul 9 08:20 2018_07_09_08_19_39-procstat
18 -rw-r--r-- 1 root root 14403 Jul 9 08:20 2018_07_09_08_19_39-diskstats
19 -rw-r--r-- 1 root root 72469 Jul 9 08:20 2018_07_09_08_19_39-procvmstat
20 -rw-r--r-- 1 root root 51960 Jul 9 08:20 2018_07_09_08_19_39-netstat_s
21 -rw-r--r-- 1 root root 296850 Jul 9 08:20 2018_07_09_08_19_39-slabinfo
22 -rw-r--r-- 1 root root 33210 Jul 9 08:20 2018_07_09_08_19_39-interrupts
23 -rw-r--r-- 1 root root 48390 Jul 9 08:20 2018_07_09_08_19_39-netstat
24 -rw-r--r-- 1 root root 37260 Jul 9 08:20 2018_07_09_08_19_39-meminfo
25 -rw-r--r-- 1 root root 17100 Jul 9 08:20 2018_07_09_08_19_39-df
26 -rw-r--r-- 1 root root 40753 Jul 9 08:20 2018_07_09_08_19_39-processlist
27 -rw-r--r-- 1 root root 4555 Jul 9 08:20 2018_07_09_08_19_39-innodbstatus2
28 -rw-r--r-- 1 root root 74476 Jul 9 08:20 2018_07_09_08_19_39-transactions
29 -rw-r--r-- 1 root root 5 Jul 9 08:20 2018_07_09_08_19_39-hostname
30 -rw-r--r-- 1 root root 385 Jul 9 08:20 2018_07_09_08_19_39-mutex-status2
31 -rw-r--r-- 1 root root 5280 Jul 9 08:20 2018_07_09_08_19_39-opentables2
32 -rw-r--r-- 1 root root 325 Jul 9 08:20 2018_07_09_08_19_39-vmstat-overall
33 -rw-r--r-- 1 root root 24763 Jul 9 08:20 2018_07_09_08_19_39-output
34
35 //There're a series of files(starts with "tigger" and ends with "output") each time we collect the diagnostic data.
36
37 -rw-r--r-- 1 root root 383 Jul 9 08:24 2018_07_09_08_24_39-trigger
38 -rw-r--r-- 1 root root 10251 Jul 9 08:24 2018_07_09_08_24_39-pmap
39 -rw-r--r-- 1 root root 16554 Jul 9 08:24 2018_07_09_08_24_39-variables
40 -rw-r--r-- 1 root root 4353 Jul 9 08:24 2018_07_09_08_24_39-innodbstatus1
41 -rw-r--r-- 1 root root 8583 Jul 9 08:24 2018_07_09_08_24_39-ps
42 -rw-r--r-- 1 root root 386 Jul 9 08:24 2018_07_09_08_24_39-mutex-status1
43 -rw-r--r-- 1 root root 28343 Jul 9 08:24 2018_07_09_08_24_39-sysctl
44 -rw-r--r-- 1 root root 10534 Jul 9 08:24 2018_07_09_08_24_39-lsof
45 -rw-r--r-- 1 root root 5280 Jul 9 08:24 2018_07_09_08_24_39-opentables1
46 -rw-r--r-- 1 root root 8053 Jul 9 08:24 2018_07_09_08_24_39-top
47 -rw-r--r-- 1 root root 139 Jul 9 08:25 2018_07_09_08_24_39-disk-space
48 -rw-r--r-- 1 root root 2783 Jul 9 08:25 2018_07_09_08_24_39-vmstat
49 -rw-r--r-- 1 root root 1084769 Jul 9 08:25 2018_07_09_08_24_39-mysqladmin
50 -rw-r--r-- 1 root root 24899 Jul 9 08:25 2018_07_09_08_24_39-procstat
51 -rw-r--r-- 1 root root 48390 Jul 9 08:25 2018_07_09_08_24_39-netstat
52 -rw-r--r-- 1 root root 296850 Jul 9 08:25 2018_07_09_08_24_39-slabinfo
53 -rw-r--r-- 1 root root 52230 Jul 9 08:25 2018_07_09_08_24_39-netstat_s
54 -rw-r--r-- 1 root root 33210 Jul 9 08:25 2018_07_09_08_24_39-interrupts
55 -rw-r--r-- 1 root root 37260 Jul 9 08:25 2018_07_09_08_24_39-meminfo
56 -rw-r--r-- 1 root root 14460 Jul 9 08:25 2018_07_09_08_24_39-diskstats
57 -rw-r--r-- 1 root root 17100 Jul 9 08:25 2018_07_09_08_24_39-df
58 -rw-r--r-- 1 root root 72914 Jul 9 08:25 2018_07_09_08_24_39-procvmstat
59 -rw-r--r-- 1 root root 44370 Jul 9 08:25 2018_07_09_08_24_39-processlist
60 -rw-r--r-- 1 root root 4445 Jul 9 08:25 2018_07_09_08_24_39-innodbstatus2
61 -rw-r--r-- 1 root root 76282 Jul 9 08:25 2018_07_09_08_24_39-transactions
62 -rw-r--r-- 1 root root 5 Jul 9 08:25 2018_07_09_08_24_39-hostname
63 -rw-r--r-- 1 root root 387 Jul 9 08:25 2018_07_09_08_24_39-mutex-status2
64 -rw-r--r-- 1 root root 5280 Jul 9 08:25 2018_07_09_08_24_39-opentables2
65 -rw-r--r-- 1 root root 325 Jul 9 08:25 2018_07_09_08_24_39-vmstat-overall
66 -rw-r--r-- 1 root root 24763 Jul 9 08:25 2018_07_09_08_24_39-output
67
68 [root@zlm2 08:33:37 /var/lib/pt-stalk]
69 #cat 2018_07_09_08_24_39-trigger
70 2018_07_09_08_24_39 Check results: status(Threads_connected)=6, matched=yes, cycles_true=1
71 2018_07_09_08_24_39 pt-stalk ran with --function=status --variable=Threads_connected --threshold=2 --match= --cycles=1 --interval=1 --iterations= --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
72
73 //The trigger file records the options we have used.
74
75 [root@zlm2 08:34:17 /var/lib/pt-stalk]
76 #cat 2018_07_09_08_24_39-vmstat-overall
77 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
78 r b swpd free buff cache si so bi bo in cs us sy id wa st
79 18 2 9692 60716 0 583316 0 0 11 139 92 597 4 1 96 0 0
80 4 0 10440 78828 0 569672 0 25 89 1965 1455 12046 80 20 0 0 0
81
82 //The vmstat-overall records the vmstat output.
83 //each file contains the relevant diagnostic data about what their name called.I'm not going to demonstrate all of them.
Take care of you remain disk space,pt-stalk won't run anymore if you're out of space.
1 [root@zlm2 08:25:56 /data/mysql/mysql3306]
2 #pt-stalk --host localhost --port 3306 --user root --password Passw0rd --collect --cycles 1 --variable Threads_connected --threshold 2 --sleep 30 --demonize
3 Cannot open /tmp/pt-stalk.8940.FSboRq/po/daemonize: No space left on device at -e line 13, <$fh> chunk 274.
4 No long attribute in option spec /tmp/pt-stalk.8940.FSboRq/po/ask-pass
5
6 [root@zlm2 08:51:27 /data/mysql/mysql3306]
7 #df -h
8 ]Filesystem Size Used Avail Use% Mounted on
9 /dev/mapper/centos-root 8.4G 8.4G 20K 100% / //The root of linux has been out of space.
10 devtmpfs 488M 0 488M 0% /dev
11 tmpfs 497M 0 497M 0% /dev/shm
12 tmpfs 497M 6.6M 491M 2% /run
13 tmpfs 497M 0 497M 0% /sys/fs/cgroup
14 /dev/sda1 497M 118M 379M 24% /boot
15 none 87G 81G 5.9G 94% /vagrant
Summary
- pt-stalk is another tool in diagnosing system performance and similar with pt-pmp but not the same.
- pt-stalk will generate many statistic files which can really help you collecting information in almost every aspect.
- pt-stalk has four main triggers:gdb,oprofile,strace,tcpdump.It's flexible to collect data with diffrent dimensionality.
来源:oschina
链接:https://my.oschina.net/u/4361557/blog/3912596