I'd like to implement a sandbox by ptrace()ing a process I start and all the children it would create (including grandchildren etc.). The ptrace()
pa
The major problem is that many syscall arguments, like filenames, are passed to the kernel as userspace pointers. Any task that is allowed to run simultaneously and has write access to the memory that the pointer points to can effectively modify these arguments after they are inspected by your supervisor and before the kernel acts on them. By the time the kernel follows the pointer, the pointed-to contents may have been deliberately changed by another schedulable task (process or thread) with access to that memory. For example:
Thread 1                             Supervisor             Thread 2
-----------------------------------------------------------------------------------------------------
strcpy(filename, "/dev/null");
open(filename, O_RDONLY);
                                     Check filename - OK
                                                            strcpy(filename, "/home/user/.ssh/id_rsa");
(in kernel) opens "/home/user/.ssh/id_rsa"
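To make the "Check filename" step concrete, here is a rough sketch of how a supervisor might copy the string out of the tracee one word at a time with PTRACE_PEEKDATA. The names `word_reader`, `peek_tracee_word`, `peek_local_word` and `snapshot_string` are made up for this illustration, and `peek_local_word` only exists so the copying logic can be tried without a tracee. It reads whole words, so it may read a few bytes past the terminating NUL.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical callback: read one word at `addr`, return 0 on success. */
typedef int (*word_reader)(void *ctx, unsigned long addr, long *out);

/* Reader backed by PTRACE_PEEKDATA; `ctx` points at the tracee's pid. */
int peek_tracee_word(void *ctx, unsigned long addr, long *out)
{
    pid_t pid = *(pid_t *)ctx;

    errno = 0; /* -1 is also a valid word, so errno decides */
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1;
    *out = word;
    return 0;
}

/* Reader backed by our own memory, for trying the logic without a tracee. */
int peek_local_word(void *ctx, unsigned long addr, long *out)
{
    (void)ctx;
    memcpy(out, (const void *)addr, sizeof *out);
    return 0;
}

/*
 * Copy the NUL-terminated string at `addr` into `buf` (capacity `cap`).
 * Returns the string length, or -1 on a read error or if it does not fit.
 */
long snapshot_string(word_reader read_word, void *ctx,
                     unsigned long addr, char *buf, size_t cap)
{
    size_t len = 0;

    for (;;) {
        long word;
        if (read_word(ctx, addr + len, &word) != 0)
            return -1;

        const char *bytes = (const char *)&word;
        for (size_t i = 0; i < sizeof word; i++) {
            if (len >= cap)
                return -1; /* string longer than the buffer */
            buf[len] = bytes[i];
            if (bytes[i] == '\0')
                return (long)len;
            len++;
        }
    }
}
```

Whatever the supervisor validates is this private snapshot. The kernel does not see the snapshot; it reads the original memory again later, which is exactly the window Thread 2 uses above.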
One way to stop this is to disallow calling clone() with the CLONE_VM flag, and in addition prevent any creation of writeable MAP_SHARED memory mappings (or at least keep track of them such that you deny any syscall that tries to directly reference data from such a mapping). You could also copy any such argument into a non-shared bounce-buffer before allowing the syscall to proceed. Note that disallowing CLONE_VM will effectively prevent any threaded application from running in the sandbox.
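As a minimal sketch of that policy, the two checks could look like the following, assuming the supervisor has already pulled the flag values out of the tracee's registers at syscall entry. `clone_flags_allowed` and `mmap_allowed` are hypothetical names, and a real policy would presumably also need to think about mprotect(), shmat() and clone3(), which this does not cover.

```c
#define _GNU_SOURCE
#include <sched.h>     /* CLONE_VM */
#include <stdbool.h>
#include <sys/mman.h>  /* MAP_SHARED, PROT_WRITE */

#ifndef CLONE_VM
#define CLONE_VM 0x00000100 /* Linux value, in case <sched.h> did not expose it */
#endif

/* Hypothetical policy check: refuse any clone() that shares the address space. */
bool clone_flags_allowed(unsigned long flags)
{
    return (flags & CLONE_VM) == 0;
}

/*
 * Hypothetical policy check: refuse mappings that are both shared and
 * writable. On Linux the MAP_SHARED bit is also set in MAP_SHARED_VALIDATE,
 * so that variant is caught by the same test.
 */
bool mmap_allowed(int prot, int flags)
{
    bool shared = (flags & MAP_SHARED) != 0;
    bool writable = (prot & PROT_WRITE) != 0;

    return !(shared && writable);
}
```

With these, `clone_flags_allowed(CLONE_VM | CLONE_FS)` is false, while a private writable mapping or a read-only shared mapping passes `mmap_allowed`.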
The alternative is to SIGSTOP every other process in the traced group around every potentially dangerous syscall, wait for them to actually stop, then allow the syscall to proceed. After it returns, you then SIGCONT them (unless they were already stopped). Needless to say, this may have a significant performance impact.
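The stop/wait/continue sequence can be sketched roughly as below. This is only an illustration with ordinary, untraced children: `stop_all`, `resume_all` and `demo_stop_resume` are invented names, the "unless they were already stopped" bookkeeping is left out, and under ptrace the stops are reported to the tracer as ptrace-stops, so a real supervisor would likely resume its tracees with PTRACE_CONT or PTRACE_SYSCALL rather than rely on SIGCONT alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGSTOP to every pid, then wait until each one has actually stopped. */
int stop_all(const pid_t *pids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (kill(pids[i], SIGSTOP) != 0)
            return -1;

    for (size_t i = 0; i < n; i++) {
        int status;
        pid_t r;

        do {
            r = waitpid(pids[i], &status, WUNTRACED);
        } while (r == -1 && errno == EINTR);

        if (r != pids[i] || !WIFSTOPPED(status))
            return -1; /* exited or failed instead of stopping */
    }
    return 0;
}

/* Send SIGCONT to every pid; returns -1 if any signal could not be sent. */
int resume_all(const pid_t *pids, size_t n)
{
    int rc = 0;

    for (size_t i = 0; i < n; i++)
        if (kill(pids[i], SIGCONT) != 0)
            rc = -1;
    return rc;
}

/* Demo: fork one idle child, freeze it around a critical window, clean up. */
int demo_stop_resume(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause(); /* stand-in for "another task in the sandbox" */
    }

    pid_t others[1] = { child };
    int rc = stop_all(others, 1);

    /* ... the inspected syscall would be allowed to run here ... */

    if (resume_all(others, 1) != 0)
        rc = -1;

    kill(child, SIGKILL);
    waitpid(child, NULL, 0); /* reap the demo child */
    return rc;
}
```

Note the two separate loops in `stop_all`: all the signals go out first, and only then does it block waiting, so the tasks stop in parallel rather than one after another.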
(There are also analogous problems with syscall arguments that are passed on the stack, and with shared open file tables.)
Doesn't ptrace only get notifications after-the-fact? I don't think you have a chance to actually stop the syscall from happening, only to kill it as fast as you can once you see something "evil".
It seems like you're looking more for something like SELinux or AppArmor, where you can guarantee that not even one illegal call gets through.