I\'d like to statistically profile my C code at the instruction level. I need to know how many additions, multiplications, divisions, etc I\'m performing.
This is no
You can use pin-instat which is a PIN tool. It's a bit over kill as it records more information than the instruction count. It still should be more efficient than your gdb approach through.
Disclaimer: I'm the author of pin-instat.
The valgrind tool cachegrind can be used to get execution counts of each line in the compiled assembly (the Ir
value in the first column).
The Linux tool perf
will give you a good deal of profiling information; specifically, perf annotate will give you per-instruction relative counts.
It is possible to drill down to the instruction level withperf annotate
. For that, you need to invokeperf annotate
with the name of the command to annotate. All the functions with samples will be disassembled and each instruction will have its relative percentage of samples reported:perf record ./noploop 5 perf annotate -d ./noploop ------------------------------------------------ Percent | Source code & Disassembly of noploop.noggdb ------------------------------------------------ : : : : Disassembly of section .text: : : 08048484 <main>: 0.00 : 8048484: 55 push %ebp 0.00 : 8048485: 89 e5 mov %esp,%ebp [...] 0.00 : 8048530: eb 0b jmp 804853d <main+0xb9> 15.08 : 8048532: 8b 44 24 2c mov 0x2c(%esp),%eax 0.00 : 8048536: 83 c0 01 add $0x1,%eax 14.52 : 8048539: 89 44 24 2c mov %eax,0x2c(%esp) 14.27 : 804853d: 8b 44 24 2c mov 0x2c(%esp),%eax 56.13 : 8048541: 3d ff e0 f5 05 cmp $0x5f5e0ff,%eax 0.00 : 8048546: 76 ea jbe 8048532 <main+0xae> [...]
QEMU user mode -d in_asm
This is another simple thing you can do to get an instruction trace:
sudo apt-get install qemu-user
qemu-x86_64 -d in_asm main.out
Let's test it with an x86_64 triple hello world:
main.S
.text
.global _start
_start:
asm_main_after_prologue:
mov $3, %rbx
write:
mov $1, %rax /* syscall number */
mov $1, %rdi /* stdout */
mov $msg, %rsi /* buffer */
mov $len, %rdx /* len */
syscall
dec %rbx
jne write
exit:
mov $60, %rax /* syscall number */
mov $0, %rdi /* exit status */
syscall
msg:
.ascii "hello\n"
len = . - msg
adapted from GitHub upstream.
Assemble and run with:
as -o main.o main.S
ld -o main.out main.o
./main.out
Stdout output:
hello
hello
hello
Running it through QEMU outputs the instruction trace to stderr:
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
host mmap_min_addr=0x10000
Reserved 0x1000 bytes of guest address space
Relocating guest address space from 0x0000000000400000 to 0x400000
guest_base 0x0
start end size prot
0000000000400000-0000000000401000 0000000000001000 r-x
0000004000000000-0000004000001000 0000000000001000 ---
0000004000001000-0000004000801000 0000000000800000 rw-
start_brk 0x0000000000000000
end_code 0x00000000004000b8
start_code 0x0000000000400000
start_data 0x00000000004000b8
end_data 0x00000000004000b8
start_stack 0x00000040007fed70
brk 0x00000000004000b8
entry 0x0000000000400078
----------------
IN:
0x0000000000400078: mov $0x3,%rbx
0x000000000040007f: mov $0x1,%rax
0x0000000000400086: mov $0x1,%rdi
0x000000000040008d: mov $0x4000b2,%rsi
0x0000000000400094: mov $0x6,%rdx
0x000000000040009b: syscall
----------------
IN:
0x000000000040009d: dec %rbx
0x00000000004000a0: jne 0x40007f
----------------
IN:
0x000000000040007f: mov $0x1,%rax
0x0000000000400086: mov $0x1,%rdi
0x000000000040008d: mov $0x4000b2,%rsi
0x0000000000400094: mov $0x6,%rdx
0x000000000040009b: syscall
----------------
IN:
0x00000000004000a2: mov $0x3c,%rax
0x00000000004000a9: mov $0x0,%rdi
0x00000000004000b0: syscall
I expect this method to be relatively fast. It works by reading input instructions and producing output instructions that the host can run, much like cachegrind which was mentioned at: https://stackoverflow.com/a/2971979/895245
One cool thing about this is that you can also trivially trace executables of other architectures, see for example aarch64: How does native android code written for ARM run on x86?
This method also displays the current symbol of unstripped executables, e.g. the trace of:
main.c
#include <stdio.h>
int say_hello() {
puts("hello");
}
int main(void) {
say_hello();
}
compile and run:
gcc -ggdb3 -O0 -o main.out main.c
qemu-x86_64 -d in_asm ./main.out
contains:
----------------
IN: main
0x0000000000400537: push %rbp
0x0000000000400538: mov %rsp,%rbp
0x000000000040053b: mov $0x0,%eax
0x0000000000400540: callq 0x400526
----------------
IN: say_hello
0x0000000000400526: push %rbp
0x0000000000400527: mov %rsp,%rbp
0x000000000040052a: mov $0x4005d4,%edi
0x000000000040052f: callq 0x400400
----------------
IN:
0x0000000000400400: jmpq *0x200c12(%rip) # 0x601018
It does not however show symbols within shared libraries such as puts.
But you can see them if you compile with -static
:
----------------
IN: main
0x00000000004009bf: push %rbp
0x00000000004009c0: mov %rsp,%rbp
0x00000000004009c3: mov $0x0,%eax
0x00000000004009c8: callq 0x4009ae
----------------
IN: say_hello
0x00000000004009ae: push %rbp
0x00000000004009af: mov %rsp,%rbp
0x00000000004009b2: mov $0x4a1064,%edi
0x00000000004009b7: callq 0x40faa0
----------------
IN: puts
0x000000000040faa0: push %r12
0x000000000040faa2: push %rbp
0x000000000040faa3: mov %rdi,%r12
0x000000000040faa6: push %rbx
0x000000000040faa7: callq 0x423830
Related: https://unix.stackexchange.com/questions/147343/how-to-determine-what-instructions-a-process-is-executing
Tested in Ubuntu 16.04, QEMU 2.5.0.
I eventually used a trivial yet effective solution.
display/i $pc
Configured a simple gdb script that breaks in the function I need to analyze and proceeds to step instruction by instruction:
set $i=0
break main
run
while ($i<100000)
si
set $i = $i + 1
end
quit
Executed gdb with my script dumping output into a log file:
gdb -x script a.out > log.txt
Analyzed the log to count specific instruction calls.
Crude, but it works...