Disclaimer
I am well aware that PHP might not have been the best choice in this case for a socket server. Please refrain from suggesting differen
I promise there is a solution at the end :P
Alright... so here we are, 10 days later and I believe that I have solved this issue. I didn't want to add onto an already longish post so I'll include in this answer some of the things that I tried.
Taking @sym's advice, and reading more into the documentation and the comments on the documentation, the pcntl_waitpid() description states :
If a child as requested by pid has already exited by the time of the call (a so-called
"zombie" process), the function returns immediately. Any system resources used by the child
are freed...
So I set up my pcntl_signal() handler like this -
function sig_handler($signo){
    global $childProcesses;
    $pid = pcntl_waitpid(-1, $status, WNOHANG);
    echo "Sound the alarm! ";
    if ($pid != 0){
        if (posix_kill($pid, 9)){
            echo "Child {$pid} has tragically died!".PHP_EOL;
            unset($childProcesses[$pid]);
        }
    }
}
// These define the signal handling
// pcntl_signal(SIGTERM, "sig_handler");
// pcntl_signal(SIGHUP, "sig_handler");
// pcntl_signal(SIGINT, "sig_handler");
pcntl_signal(SIGCHLD, "sig_handler");
For completeness, I'll include the actual code I'm using for forking a child process -
function broadcastData($socketArray, $data){
    global $db,$childProcesses;
    $pid = pcntl_fork();
    if($pid == -1) {
        // Something went wrong (handle errors here)
        // Log error, email the admin, pull emergency stop, etc...
        echo "Could not fork()!!";
    } elseif($pid == 0) {
        // This part is only executed in the child
        foreach($socketArray AS $socket) {
            // There's more happening here but the essence is this
            socket_write($socket,$msg,strlen($msg));
            // TODO : Consider additional forking here for each client.
        }
        // This is where the signal is fired
        exit(0);
    }
    // If the child process did not exit above, then this code would be
    // executed by both parent and child. In my case, the child will
    // never reach these commands.
    // Key the list by PID so that unset($childProcesses[$pid]) in the
    // signal handler removes the matching entry.
    $childProcesses[$pid] = $pid;
    // The child process is now occupying the same database
    // connection as its parent (in my case mysql). We have to
    // reinitialize the parent's DB connection in order to continue using it.
    $db = dbEngine::factory(_dbEngine);
}
Yea... That's a ratio of 1:1 comments to code :P
So this was looking great and I saw the echo of :
Sound the alarm! Child 12345 has tragically died!
However, when the socket server loop did its next iteration, the socket_select() function failed, throwing this error :
PHP Warning: socket_select(): unable to select [4]: Interrupted system call...
The server would now go into a vegetative state totally oblivious to the world around him, not responding to any requests other than manual kill commands from a root terminal.
I'm not going to get into why this was happening or what I did after that to debug it... let's just say it was a frustrating week...
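For anyone who lands on the same warning: "Interrupted system call" (EINTR) is what a blocking call such as socket_select() reports when a signal handler runs while it is waiting, and a SIGCHLD handler runs every time a child exits. Below is a minimal sketch of retrying the call instead of treating that as fatal. The helper name selectIgnoringInterrupts() is made up for illustration, and it assumes the sockets extension on a Unix-like system where SOCKET_EINTR is defined. It is only a sketch, not what the final server does.

```php
<?php
// Hypothetical helper: wait for readable sockets, retrying when a signal
// interrupts the wait. Returns the readable sockets (an empty array on
// timeout) or false on any other error.
function selectIgnoringInterrupts(array $watched, $timeoutSec = null)
{
    while (true) {
        // socket_select() overwrites its arrays, so start from a fresh copy.
        $read = $watched;
        $write = null;
        $except = null;
        $ready = @socket_select($read, $write, $except, $timeoutSec);
        if ($ready !== false) {
            return $read;
        }
        if (socket_last_error() === SOCKET_EINTR) {
            // A signal (for example SIGCHLD) arrived mid-wait: just try again.
            // Note that the timeout restarts from scratch on each retry.
            socket_clear_error();
            continue;
        }
        return false; // a real error, let the caller decide what to do
    }
}

// Tiny demonstration with a local socket pair standing in for a client.
if (!socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair)) {
    exit("Could not create socket pair" . PHP_EOL);
}
socket_write($pair[0], "ping", 4);
$ready = selectIgnoringInterrupts(array($pair[1]), 1);
echo "Readable sockets: " . count($ready) . PHP_EOL;
```

In a real server loop the same check would wrap the existing socket_select() call, so a dying child no longer knocks the loop over.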
much coffee, sore eyes and 10 days later...
Drum roll please
Mentioned here in a comment from 2007 in the php sockets documentation and in this tutorial on stuporglue (search for "good parenting"), one can simply "ignore" signals coming in from the child processes (SIGCHLD) by passing SIG_IGN to the pcntl_signal() function -
pcntl_signal(SIGCHLD, SIG_IGN);
Quoting from that linked blog post :
If we are ignoring SIGCHLD, the child processes will be reaped automatically upon completion.
Believe it or not - I included that pcntl_signal() line, deleted all the other handlers and things dealing with the children and it worked! There were no more <defunct> processes left hanging around!
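If you want to check this behaviour in isolation, here is a minimal sketch, assuming Linux with the pcntl and posix extensions loaded. It forks one child that exits straight away, waits a moment, and then uses posix_kill($pid, 0), which sends no signal and only tests whether the process entry still exists. A zombie would still count as existing, so a false result suggests the child was reaped without anyone calling wait. The one second sleep is an arbitrary grace period, not a guarantee.

```php
<?php
// Ask the kernel to reap children automatically instead of keeping zombies.
pcntl_signal(SIGCHLD, SIG_IGN);

$pid = pcntl_fork();
if ($pid === -1) {
    exit("Could not fork()" . PHP_EOL);
}
if ($pid === 0) {
    // Child: nothing to do, exit immediately.
    exit(0);
}

// Parent: give the child time to exit (arbitrary delay for the demo).
sleep(1);

// Signal 0 performs only the existence check; a <defunct> entry would
// still make this return true.
$stillThere = posix_kill($pid, 0);
echo "Child {$pid} still in process table: " . var_export($stillThere, true) . PHP_EOL;
```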
In my case, it really did not interest me to know exactly when a child process died, or who it was, I wasn't interested in them at all - just that they didn't hang around and crash my entire server :P
I know only too well how hard you have to search for a solution to the problem of zombie processes. My concern with potentially having hundreds or thousands of them was (rightly or wrongly as I don't know if this would actually be a problem) running out of inodes, as all hell can break loose when that happens.
If only the pcntl_fork() manual page linked to posix_setsid(), many of us would have discovered years ago that the solution was so simple.
http://www.linuxsa.org.au/tips/zombies.html
Zombies are dead processes. You cannot kill the dead. All processes eventually die, and when they do they become zombies. They consume almost no resources, which is to be expected because they are dead! The reason for zombies is so the zombie's parent (process) can retrieve the zombie's exit status and resource usage statistics. The parent signals the operating system that it no longer needs the zombie by using one of the wait() system calls.
When a process dies, its child processes all become children of process number 1, which is the init process. Init is "always" waiting for children to die, so that they don't remain as zombies.
If you have zombie processes it means those zombies have not been waited for by their parent (look at PPID displayed by ps -l). You have three choices: Fix the parent process (make it wait); kill the parent; or live with it. Remember that living with it is not so hard because zombies take up little more than one extra line in the output of ps.
Regarding your disclaimer - PHP is no better / worse than many other languages for writing a server in. There are some things which are not possible to do (lightweight processes, asynchronous I/O) but these do not really apply to a forking server. If you're using OO code, then do ensure that you've got the circular reference checking garbage collector enabled.
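As a rough illustration of the garbage collector point, here is a minimal sketch for a long-running process. gc_enable(), gc_enabled() and gc_collect_cycles() are built-in functions; the Node class is hypothetical and exists only to create a reference cycle.

```php
<?php
// Hypothetical class used only to build a reference cycle.
class Node
{
    public $peer = null;
}

// Make sure the cycle collector is switched on.
gc_enable();

$a = new Node();
$b = new Node();
$a->peer = $b;
$b->peer = $a; // $a and $b now reference each other

// Dropping the variables does not free the objects by reference counting
// alone, because each object is still referenced by the other.
unset($a, $b);

// The cycle collector has to find them. A daemon can trigger a run
// explicitly at a quiet moment rather than waiting for an automatic one.
$collected = gc_collect_cycles();
echo "GC enabled: " . var_export(gc_enabled(), true)
    . ", cycles collected: {$collected}" . PHP_EOL;
```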
Once a child process exits, it becomes a zombie until the parent process cleans it up. Your code seems to send a KILL signal to every child on receipt of any signal. It won't clean up the process entries. It will terminate processes which have not called exit. To get the child process reaped correctly you should call waitpid (see also this example on the pcntl_wait manual page).
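A minimal sketch of that reaping step, assuming the parent keeps its children in an array keyed by PID. The function name reapFinishedChildren() is made up for illustration. It calls pcntl_waitpid() with WNOHANG in a loop, because several children may have exited since the last check, and it never sends a KILL.

```php
<?php
// Hypothetical helper: collect every child that has already exited,
// without blocking, and drop it from the tracking array.
function reapFinishedChildren(array &$childProcesses)
{
    $reaped = 0;
    // pcntl_waitpid() returns the PID of a reaped child, 0 if children
    // exist but none has exited yet, and -1 on error (e.g. no children).
    while (($pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
        unset($childProcesses[$pid]);
        $reaped++;
    }
    return $reaped;
}

// Demonstration: fork three children that exit immediately.
$childProcesses = array();
for ($i = 0; $i < 3; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        exit("Could not fork()" . PHP_EOL);
    }
    if ($pid === 0) {
        exit(0); // child
    }
    $childProcesses[$pid] = true; // parent: remember the child by PID
}

// Stand-in for the server loop: reap on every iteration.
while (count($childProcesses) > 0) {
    reapFinishedChildren($childProcesses);
    usleep(10000);
}
echo "All children reaped" . PHP_EOL;
```

Calling the helper once per iteration of the main loop, rather than from a signal handler, also avoids interrupting blocking calls in the parent.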