问题
I wrote the following piece of code that steps through /bin/ls and counts its instructions:
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <sys/syscall.h>
int main()
{
pid_t child;
child = fork(); //create child
if(child == 0) {
ptrace(PTRACE_TRACEME, 0, NULL, NULL);
char* child_argv[] = {"/bin/ls", NULL};
execv("/bin/ls", child_argv);
}
else {
int status;
long long ins_count = 0;
while(1)
{
//stop tracing if child terminated successfully
wait(&status);
if(WIFEXITED(status))
break;
ins_count++;
ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
}
printf("\n%lld Instructions executed.\n", ins_count);
}
return 0;
}
Running this code gives me about 500.000 instructions executed. As far as I know, most of these instructions should be from the dynamic linker. When I trace /bin/ls with QEMU with qemu-x86_64 -singlestep -D log -d in_asm /bin/ls , I get about 17.000 instructions executed. What do I have to adjust to start and stop counting at the same points QEMU does? (aka. counting the same instructions).
I traced a "return null" program with QEMU and it resulted in 7840 instructions while my code gave me 109025, therefore QEMU seems to trace more than the main but less than my code.
My goal is to later compare these instructions, that is why I want to iterate through the same ones like QEMU.
回答1:
QEMU's "in_asm" logging is not a log of executed instructions. It logs every time an instruction is translated (ie when QEMU generates a bit of host code corresponding to it). That translation is then cached and if the guest loops around and executes the same instruction again QEMU will simply re-use the same translation, and so it won't be logged by in_asm. "in_asm reports many fewer instructions" is therefore expected.
Logging every executed instruction via the -d options is a bit tricky -- you need to look at the 'cpu' and 'exec' traces, to use the 'nochain' suboption of -d to disable a QEMU optimisation that would otherwise result in some blocks not being logged, to use '-singlestep' to force one instruction per block, and also to account for a few corner cases where we print an execution trace and then don't actually execute the instruction. This is because the -d option is not intended as a way for users to introspect the behaviour of their program -- it is a debug option intended to allow debugging of what QEMU and the guest program are doing together, and so it prints information that requires a little understanding of QEMU internals to interpret correctly.
You might find it simpler to write a QEMU "plugin" instead: https://qemu.readthedocs.io/en/latest/devel/tcg-plugins.html -- this is an API designed to be fairly straightforward to write instrumentation like "count instructions executed". If you're lucky then one of the sample plugins might even be sufficient for your purposes.
回答2:
I modified your program to make it run on a dedicated CPU core (e.g. number 7) adding the following code before the fork():
#define _GNU_SOURCE
#include <sched.h>
[...]
cpu_set_t set;
int rc;
CPU_ZERO(&set);
CPU_SET(7, &set);
// Migrate the calling process on the target cpu
rc = sched_setaffinity(0, sizeof(cpu_set_t), &set);
if (0 != rc) {
fprintf(stderr, "sched_setaffinity(): '%m' (%d)\n", errno);
return -1;
}
// Dummy system call to trigger the migration. Actually, the on line
// manual says that the previous call will make the current process
// migrate but I saw in cpuid's source code that the guy calls sleep(0)
// to make sure that the migration will be done. In my opinion, it may
// be safer to call sched_yield()
rc = sched_yield();
if (0 != rc) {
fprintf(stderr, "sched_yield(): '%m' (%d)\n", errno);
return -1;
}
// Create child
child = fork();
[...]
My PC is running Ubuntu/Linux 5.4.0 on:
# Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
# Code name : Ivy Bridge
# cpu family : 6
# model : 58
# microcode : 0x21
# Number of physical cores: 4
# Number of harware threads: 8
# Base frequency: 3,50 GHz
# Turbo frequency: 3,90 GHz
# cpu MHz: 1604.615
# cache size : 8192 KB
# cache_alignment: 64
# Address sizes: 36 bits physical, 48 bits virtual
#
# PMU version: 3
# Maximum number of fixed counters: 3
# Fixed counter bit width: 48
# Maximum number of programmable counters: 4
# Programmable counter bit width: 48
If I launch the modified program with ptrace() activated, I get almost the same numbers as yours:
$ test/progexec
[...]
548765 Instructions executed.
I designed a tool which reads the Intel PMU counters. The fixed counter#0 is:
# INST_RETIRED.ANY
#
# Number of instructions that retire execution. For instructions that consist of multiple
# uops, this event counts the retirement of the last uop of the instruction. The counter
# continues counting during hardware interrupts, traps, and in-side interrupt handlers.
#
The reading of the above counter on the CPU core#7 where the program runs gives the following results:
- 1871879 Instructions for User + Kernel space execution (rings 0-3)
- 546874 Instructions for User space execution (ring 3)
- 1324451 Instructions for Kernel space execution (ring 0)
So, according to the above numbers, the program with ptrace(PTRACE_SINGLESTEP) computes the number of instructions when the program is running in user space (Intel protection ring#3).
N.B.: Linux uses ring 0 for kernel space and ring 3 for user space.
来源:https://stackoverflow.com/questions/64847254/what-instructions-does-qemu-trace