Tracing/profiling instructions

渐次进展 2020-12-28 20:46

I'd like to statistically profile my C code at the instruction level. I need to know how many additions, multiplications, divisions, etc. I'm performing.

This is no

5 answers
  • 2020-12-28 21:01

    You can use pin-instat, which is a PIN tool. It's a bit overkill, as it records more information than just the instruction count. It should still be more efficient than your gdb approach, though.

    Disclaimer: I'm the author of pin-instat.

  • 2020-12-28 21:07

    The Valgrind tool Cachegrind can be used to get execution counts for each line of the compiled assembly (the Ir value, i.e. instructions executed, in the first column).

  • 2020-12-28 21:14

    The Linux tool perf will give you a good deal of profiling information; specifically, perf annotate will give you per-instruction relative counts.

    It is possible to drill down to the instruction level with perf annotate. For that, you need to invoke perf annotate with the name of the command to annotate. All the functions with samples will be disassembled and each instruction will have its relative percentage of samples reported:
    perf record ./noploop 5
    perf annotate -d ./noploop
    
    ------------------------------------------------
     Percent |   Source code & Disassembly of noploop.noggdb
    ------------------------------------------------
             :
             :
             :
             :   Disassembly of section .text:
             :
             :   08048484 <main>:
        0.00 :    8048484:       55                      push   %ebp
        0.00 :    8048485:       89 e5                   mov    %esp,%ebp [...]
        0.00 :    8048530:       eb 0b                   jmp    804853d <main+0xb9>
       15.08 :    8048532:       8b 44 24 2c             mov    0x2c(%esp),%eax
        0.00 :    8048536:       83 c0 01                add    $0x1,%eax
       14.52 :    8048539:       89 44 24 2c             mov    %eax,0x2c(%esp)
       14.27 :    804853d:       8b 44 24 2c             mov    0x2c(%esp),%eax
       56.13 :    8048541:       3d ff e0 f5 05          cmp    $0x5f5e0ff,%eax
        0.00 :    8048546:       76 ea                   jbe    8048532 <main+0xae> [...]
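
    The per-instruction percentages above can be rolled up by mnemonic to see which kinds of instruction attract the samples. This is a minimal Python sketch, assuming the text layout shown above (percent, address, raw bytes, mnemonic), which other perf versions may not reproduce exactly. The names percent_by_mnemonic, LINE and SAMPLE are made up for illustration. The result is a share of samples per mnemonic, not an execution count.

```python
import re
from collections import defaultdict

# Assumed layout, copied from the listing above:
#   <percent> : <address>: <raw bytes>   <mnemonic> <operands>
LINE = re.compile(
    r"^\s*(\d+\.\d+)\s*:\s*[0-9a-f]+:\s+(?:[0-9a-f]{2} )+\s*(\S+)"
)

def percent_by_mnemonic(lines):
    """Sum the reported sample percentage for each instruction mnemonic."""
    totals = defaultdict(float)
    for line in lines:
        m = LINE.match(line)
        if m:  # header and separator lines simply do not match
            totals[m.group(2)] += float(m.group(1))
    return dict(totals)

# Lines taken from the perf annotate listing above.
SAMPLE = """\
    0.00 :    8048530:       eb 0b                   jmp    804853d <main+0xb9>
   15.08 :    8048532:       8b 44 24 2c             mov    0x2c(%esp),%eax
    0.00 :    8048536:       83 c0 01                add    $0x1,%eax
   14.52 :    8048539:       89 44 24 2c             mov    %eax,0x2c(%esp)
   14.27 :    804853d:       8b 44 24 2c             mov    0x2c(%esp),%eax
   56.13 :    8048541:       3d ff e0 f5 05          cmp    $0x5f5e0ff,%eax
    0.00 :    8048546:       76 ea                   jbe    8048532 <main+0xae>
"""

totals = percent_by_mnemonic(SAMPLE.splitlines())
for mnemonic, percent in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{mnemonic:8s} {percent:6.2f}")
```

    For the loop above, this attributes most samples to cmp, with the three mov instructions sharing the rest.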
    
  • 2020-12-28 21:15

    QEMU user mode -d in_asm

    This is another simple thing you can do to get an instruction trace:

    sudo apt-get install qemu-user
    qemu-x86_64 -d in_asm main.out
    

    Let's test it with an x86_64 triple hello world:

    main.S

    .text
    .global _start
    _start:
    asm_main_after_prologue:
        mov $3, %rbx
    write:
        mov $1, %rax    /* syscall number */
        mov $1, %rdi    /* stdout */
        mov $msg, %rsi  /* buffer */
        mov $len, %rdx  /* len */
        syscall
        dec %rbx
        jne write
    exit:
        mov $60, %rax   /* syscall number */
        mov $0, %rdi    /* exit status */
        syscall
    msg:
        .ascii "hello\n"
    len = . - msg
    

    adapted from GitHub upstream.

    Assemble and run with:

    as -o main.o main.S 
    ld -o main.out main.o
    ./main.out
    

    Stdout output:

    hello
    hello
    hello
    

    Running it through QEMU outputs the instruction trace to stderr:

    warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
    host mmap_min_addr=0x10000
    Reserved 0x1000 bytes of guest address space
    Relocating guest address space from 0x0000000000400000 to 0x400000
    guest_base  0x0
    start            end              size             prot
    0000000000400000-0000000000401000 0000000000001000 r-x
    0000004000000000-0000004000001000 0000000000001000 ---
    0000004000001000-0000004000801000 0000000000800000 rw-
    start_brk   0x0000000000000000
    end_code    0x00000000004000b8
    start_code  0x0000000000400000
    start_data  0x00000000004000b8
    end_data    0x00000000004000b8
    start_stack 0x00000040007fed70
    brk         0x00000000004000b8
    entry       0x0000000000400078
    ----------------
    IN: 
    0x0000000000400078:  mov    $0x3,%rbx
    0x000000000040007f:  mov    $0x1,%rax
    0x0000000000400086:  mov    $0x1,%rdi
    0x000000000040008d:  mov    $0x4000b2,%rsi
    0x0000000000400094:  mov    $0x6,%rdx
    0x000000000040009b:  syscall 
    
    ----------------
    IN: 
    0x000000000040009d:  dec    %rbx
    0x00000000004000a0:  jne    0x40007f
    
    ----------------
    IN: 
    0x000000000040007f:  mov    $0x1,%rax
    0x0000000000400086:  mov    $0x1,%rdi
    0x000000000040008d:  mov    $0x4000b2,%rsi
    0x0000000000400094:  mov    $0x6,%rdx
    0x000000000040009b:  syscall 
    
    ----------------
    IN: 
    0x00000000004000a2:  mov    $0x3c,%rax
    0x00000000004000a9:  mov    $0x0,%rdi
    0x00000000004000b0:  syscall 
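
    To turn a log like this into per-mnemonic totals, the disassembly lines can be tallied with a few lines of Python. This is a minimal sketch, and count_translated, INSN and SAMPLE are names made up for illustration. One caveat, visible in the trace above: a block seems to be logged when QEMU translates it, not each time it runs. The dec/jne block appears once although the loop runs three times, so these are counts of translated instructions, not dynamic execution counts.

```python
import re
from collections import Counter

# A disassembly line in the log looks like: "0x<address>:  <mnemonic> <operands>"
INSN = re.compile(r"^\s*0x[0-9a-f]+:\s+(\S+)")

def count_translated(lines):
    """Count mnemonics over all disassembly lines found in an in_asm log."""
    counts = Counter()
    for line in lines:
        m = INSN.match(line)
        if m:  # "IN:" headers, separators and the memory map do not match
            counts[m.group(1)] += 1
    return counts

# The four blocks from the trace above.
SAMPLE = """\
----------------
IN: 
0x0000000000400078:  mov    $0x3,%rbx
0x000000000040007f:  mov    $0x1,%rax
0x0000000000400086:  mov    $0x1,%rdi
0x000000000040008d:  mov    $0x4000b2,%rsi
0x0000000000400094:  mov    $0x6,%rdx
0x000000000040009b:  syscall 

----------------
IN: 
0x000000000040009d:  dec    %rbx
0x00000000004000a0:  jne    0x40007f

----------------
IN: 
0x000000000040007f:  mov    $0x1,%rax
0x0000000000400086:  mov    $0x1,%rdi
0x000000000040008d:  mov    $0x4000b2,%rsi
0x0000000000400094:  mov    $0x6,%rdx
0x000000000040009b:  syscall 

----------------
IN: 
0x00000000004000a2:  mov    $0x3c,%rax
0x00000000004000a9:  mov    $0x0,%rdi
0x00000000004000b0:  syscall 
"""

for mnemonic, n in count_translated(SAMPLE.splitlines()).most_common():
    print(f"{mnemonic:8s} {n}")
```

    Since the trace goes to stderr, it can be captured with a shell redirection (2> trace.txt) and fed to the same function via open("trace.txt").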
    

    I expect this method to be relatively fast. It works by reading input instructions and producing output instructions that the host can run, much like cachegrind which was mentioned at: https://stackoverflow.com/a/2971979/895245

    One nice property of this method is that you can just as easily trace executables built for other architectures, for example aarch64; see "How does native Android code written for ARM run on x86?".

    This method also displays the current symbol of unstripped executables, e.g. the trace of:

    main.c

    #include <stdio.h>
    
    int say_hello() {
        puts("hello");
    }
    
    int main(void) {
        say_hello();
    }
    

    compile and run:

    gcc -ggdb3 -O0 -o main.out main.c
    qemu-x86_64 -d in_asm ./main.out
    

    contains:

    ----------------
    IN: main
    0x0000000000400537:  push   %rbp
    0x0000000000400538:  mov    %rsp,%rbp
    0x000000000040053b:  mov    $0x0,%eax
    0x0000000000400540:  callq  0x400526
    
    ----------------
    IN: say_hello
    0x0000000000400526:  push   %rbp
    0x0000000000400527:  mov    %rsp,%rbp
    0x000000000040052a:  mov    $0x4005d4,%edi
    0x000000000040052f:  callq  0x400400
    
    ----------------
    IN: 
    0x0000000000400400:  jmpq   *0x200c12(%rip)        # 0x601018
    

    It does not however show symbols within shared libraries such as puts.

    But you can see them if you compile with -static:

    ----------------
    IN: main
    0x00000000004009bf:  push   %rbp
    0x00000000004009c0:  mov    %rsp,%rbp
    0x00000000004009c3:  mov    $0x0,%eax
    0x00000000004009c8:  callq  0x4009ae
    
    ----------------
    IN: say_hello
    0x00000000004009ae:  push   %rbp
    0x00000000004009af:  mov    %rsp,%rbp
    0x00000000004009b2:  mov    $0x4a1064,%edi
    0x00000000004009b7:  callq  0x40faa0
    
    ----------------
    IN: puts
    0x000000000040faa0:  push   %r12
    0x000000000040faa2:  push   %rbp
    0x000000000040faa3:  mov    %rdi,%r12
    0x000000000040faa6:  push   %rbx
    0x000000000040faa7:  callq  0x423830
    

    Related: https://unix.stackexchange.com/questions/147343/how-to-determine-what-instructions-a-process-is-executing

    Tested in Ubuntu 16.04, QEMU 2.5.0.

  • 2020-12-28 21:21

    I eventually used a trivial yet effective solution.

    1. Configured GDB to display the disassembly of the next instruction (every time it stops) by invoking:

    display/i $pc

    2. Configured a simple gdb script that breaks in the function I need to analyze and proceeds to step instruction by instruction:

      set $i=0
      break main
      run
      while ($i<100000)
      si
      set $i = $i + 1
      end
      quit
      

    3. Executed gdb with my script dumping output into a log file:

      gdb -x script a.out > log.txt

    4. Analyzed the log to count specific instruction calls.

    Crude, but it works...
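
    The last step, counting instructions in the log, could look roughly like the Python sketch below. It assumes that each display/i $pc line in log.txt starts with "=>" followed by the address, an optional <symbol+offset> and the instruction, which is how gdb usually prints it. count_mnemonics and STEP are names made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of a "display/i $pc" line:
#   => 0x40052a <say_hello+4>:  mov    $0x4005d4,%edi
# The <symbol+offset> part is optional (stripped binaries have none).
STEP = re.compile(r"^=>\s+0x[0-9a-f]+(?:\s+<[^>]+>)?:\s+(\S+)")

def count_mnemonics(lines):
    """Count one mnemonic per single-stepped instruction in a gdb log."""
    counts = Counter()
    for line in lines:
        m = STEP.match(line)
        if m:
            # Prefixed instructions such as "rep stos" or "lock add" are
            # counted under the prefix, because only the first word is kept.
            counts[m.group(1)] += 1
    return counts
```

    With the log from the gdb run above, count_mnemonics(open("log.txt")).most_common() lists the mnemonics by frequency. Because gdb prints one such line per si, these are dynamic counts over the instructions that were actually stepped.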
