It's the same as this one except that I'm running execl("/bin/ls", "ls", NULL);.
The result is obviously wrong as every syscall returns with
At a punt I'd say you're examining eax, or its 64-bit equivalent (presumably rax), for the return code of a system call. There's an additional slot for saving this register, named orig_eax, which is used for restarting system calls.
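To make the two slots concrete: with glibc on x86, struct user_regs_struct from <sys/user.h> has both an rax (eax) field and an orig_rax (orig_eax) field, and their byte offsets are the values PTRACE_PEEKUSER expects. The following is a minimal sketch; the helper names sc_number_offset and sc_retcode_offset are made up for illustration, and both return -1 on architectures the sketch does not cover.

```c
#include <stddef.h>
#include <sys/user.h>

/* Hypothetical helper: byte offset, within the tracee's USER area, of the
 * slot that keeps the syscall number (orig_rax on x86_64, orig_eax on i386).
 * Returns -1 on architectures this sketch does not know about. */
long sc_number_offset(void)
{
#if defined(__x86_64__)
    return (long)offsetof(struct user_regs_struct, orig_rax);
#elif defined(__i386__)
    return (long)offsetof(struct user_regs_struct, orig_eax);
#else
    return -1;
#endif
}

/* Hypothetical helper: byte offset of the register that carries the
 * syscall return code (rax on x86_64, eax on i386), or -1 elsewhere. */
long sc_retcode_offset(void)
{
#if defined(__x86_64__)
    return (long)offsetof(struct user_regs_struct, rax);
#elif defined(__i386__)
    return (long)offsetof(struct user_regs_struct, eax);
#else
    return -1;
#endif
}
```

The syscall number is read from the orig_ slot because the kernel overwrites rax/eax with the return value by the time the syscall exits.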
I poked around into this stuff quite a lot but can't for the life of me locate my findings. Here are some related questions:
Poking around again it seems my memory serves correct. You'll find everything you need right here in the kernel source (the main site is down, fortunately torvalds now mirrors linux at github).
The code doesn't account for the notification of the exec from the child, and so ends up handling syscall entry as syscall exit, and syscall exit as syscall entry. That's why you see "syscall 12 returned" before "syscall 12 called", etc. (-38 is ENOSYS, which is put into RAX as a default return value by the kernel's syscall entry code.)
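As a rough illustration of how a raw value such as -38 maps to an errno name: the kernel reports failure as a negative errno in the return register, using the range -4095..-1. The helpers below (rc_to_errno and rc_describe, both hypothetical names) are a sketch of that decoding, not part of the original program.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: if rc looks like a raw kernel error return
 * (a value in -4095..-1), return the positive errno; otherwise return 0. */
int rc_to_errno(long rc)
{
    if (rc < 0 && rc >= -4095)
        return (int)-rc;
    return 0;
}

/* Hypothetical helper: human-readable text for a raw return code. */
const char *rc_describe(long rc)
{
    int err = rc_to_errno(rc);
    return err ? strerror(err) : "not an error code";
}
```

Feeding it the -38 seen at syscall entry yields the errno value ENOSYS, while a large positive value such as the brk() result is left alone.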
As the ptrace(2) man page states:
PTRACE_TRACEME
Indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to this process will cause it to stop and its parent to be notified via wait(). Also, all subsequent calls to exec() by this process will cause a SIGTRAP to be sent to it, giving the parent a chance to gain control before the new program begins execution. [...]
You said that the original code you were running was "the same as this one except that I'm running execl("/bin/ls", "ls", NULL);". Well, it clearly isn't, because you're working with x86_64 rather than 32-bit and have changed the messages at least.
But, assuming you didn't change too much else, the first time the wait() wakes up the parent, it's not for syscall entry or exit - the parent hasn't executed ptrace(PTRACE_SYSCALL,...) yet. Instead, you're seeing this notification that the child has performed an exec (on x86_64, syscall 59 is execve).
The code incorrectly interprets that as syscall entry. Then it calls ptrace(PTRACE_SYSCALL,...), and the next time the parent is woken it is for a syscall entry (syscall 12), but the code reports it as syscall exit.
Note that in this original case, you never see the execve syscall entry/exit - only the additional notification - because the parent does not execute ptrace(PTRACE_SYSCALL,...) until after it happens.
If you do arrange the code so that the execve syscall entry/exit are caught, you will see the new behaviour that you observe. The parent will be woken three times: once for execve syscall entry (due to use of ptrace(PTRACE_SYSCALL,...)), once for execve syscall exit (also due to use of ptrace(PTRACE_SYSCALL,...)), and a third time for the exec notification (which happens anyway).
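One way to tell these three stops apart, rather than counting them, is to set ptrace options after the first stop, for example ptrace(PTRACE_SETOPTIONS, child_pid, NULL, PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC). Per ptrace(2), syscall stops are then reported as SIGTRAP|0x80 and the exec stop as a PTRACE_EVENT_EXEC event. The classifier below is a sketch under that assumption; the enum and the function name classify_stop are invented for illustration.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical classification of a wait() status from a traced child. */
enum stop_kind {
    NOT_STOPPED,      /* exited, killed, or anything else */
    SYSCALL_STOP,     /* syscall entry or exit (needs PTRACE_O_TRACESYSGOOD) */
    EXEC_EVENT_STOP,  /* exec notification (needs PTRACE_O_TRACEEXEC) */
    PLAIN_SIGTRAP,    /* SIGTRAP without any of the markers above */
    OTHER_SIGNAL      /* stopped by some other signal, e.g. SIGSTOP */
};

enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return NOT_STOPPED;
    /* Event stops carry the event code above the signal number. */
    if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)))
        return EXEC_EVENT_STOP;
    /* With PTRACE_O_TRACESYSGOOD, bit 7 of the signal marks syscall stops. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return SYSCALL_STOP;
    if (WSTOPSIG(status) == SIGTRAP)
        return PLAIN_SIGTRAP;
    return OTHER_SIGNAL;
}
```

Without those options set, all three stops show up as PLAIN_SIGTRAP, which is exactly the ambiguity described above. Entry and exit still have to be told apart by the tracer, for instance by toggling a flag on each SYSCALL_STOP.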
Here is a complete example (for x86 or x86_64) which takes care to show the behaviour of the exec itself by stopping the child first:
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/reg.h>
#ifdef __x86_64__
#define SC_NUMBER (8 * ORIG_RAX)
#define SC_RETCODE (8 * RAX)
#else
#define SC_NUMBER (4 * ORIG_EAX)
#define SC_RETCODE (4 * EAX)
#endif

static void child(void)
{
    /* Request tracing by parent: */
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);

    /* Stop before doing anything, giving parent a chance to catch the exec: */
    kill(getpid(), SIGSTOP);

    /* Now exec: */
    execl("/bin/ls", "ls", NULL);
}

static void parent(pid_t child_pid)
{
    int status;
    long sc_number, sc_retcode;

    while (1)
    {
        /* Wait for child status to change: */
        wait(&status);

        if (WIFEXITED(status)) {
            printf("Child exit with status %d\n", WEXITSTATUS(status));
            exit(0);
        }
        if (WIFSIGNALED(status)) {
            printf("Child exit due to signal %d\n", WTERMSIG(status));
            exit(0);
        }
        if (!WIFSTOPPED(status)) {
            printf("wait() returned unhandled status 0x%x\n", status);
            exit(0);
        }
        if (WSTOPSIG(status) == SIGTRAP) {
            /* Note that there are *three* reasons why the child might stop
             * with SIGTRAP:
             *  1) syscall entry
             *  2) syscall exit
             *  3) child calls exec
             */
            sc_number = ptrace(PTRACE_PEEKUSER, child_pid, SC_NUMBER, NULL);
            sc_retcode = ptrace(PTRACE_PEEKUSER, child_pid, SC_RETCODE, NULL);
            printf("SIGTRAP: syscall %ld, rc = %ld\n", sc_number, sc_retcode);
        } else {
            printf("Child stopped due to signal %d\n", WSTOPSIG(status));
        }
        fflush(stdout);

        /* Resume child, requesting that it stops again on syscall enter/exit
         * (in addition to any other reason why it might stop):
         */
        ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
    }
}

int main(void)
{
    pid_t pid = fork();

    if (pid == 0)
        child();
    else
        parent(pid);

    return 0;
}

which gives something like this (this is for 64-bit - system call numbers are different for 32-bit; in particular execve is 11, rather than 59):
Child stopped due to signal 19
SIGTRAP: syscall 59, rc = -38
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 63, rc = -38
SIGTRAP: syscall 63, rc = 0
SIGTRAP: syscall 12, rc = -38
SIGTRAP: syscall 12, rc = 5324800
...
Signal 19 is the explicit SIGSTOP; the child stops three times for the execve as just described above; then twice (entry and exit) for other system calls.
If you're really interested in all the gory details of ptrace(), the best documentation I'm aware of is the README-linux-ptrace file in the strace source. As it says, the "API is complex and has subtle quirks"....
You can print a human-readable description of the last system error with perror or strerror. This error description will help you substantially more.
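For ptrace() in particular there is a wrinkle: PTRACE_PEEKUSER returns the word it read, so -1 can be a legitimate value, and ptrace(2) says to clear errno before the call and check it afterwards. A minimal sketch of that pattern follows; the wrapper name peek_user is made up for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical wrapper: read one word from the tracee's USER area.
 * Returns 0 and stores the word in *out on success; returns -1 and
 * prints the reason to stderr on failure. */
int peek_user(pid_t pid, long offset, long *out)
{
    errno = 0;  /* -1 is a valid word, so errno is the only error signal */
    long value = ptrace(PTRACE_PEEKUSER, pid, (void *)offset, NULL);
    if (value == -1 && errno != 0) {
        int saved = errno;
        fprintf(stderr, "PTRACE_PEEKUSER(pid %d, offset %ld): %s\n",
                (int)pid, offset, strerror(saved));
        return -1;
    }
    *out = value;
    return 0;
}
```

In the example program, the two PTRACE_PEEKUSER calls in parent() could go through such a wrapper, so that a failure is reported instead of being printed as a syscall number of -1.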