Question
We're working on an SEO-related script in PHP, and we need to run different modules (each one is a .php file) at the same time once the crawling process has finished. In other words, we need to execute more than 10 .php files in parallel.
The application used to run sequentially: when one script ended, the user's browser was forwarded to the next one. Each script establishes a connection to the database and sends different HTTP requests to the crawled web application.
I understand that this could be approached using popen? Is there any way for the main script that triggers these modules to receive information from each of them? Could anyone provide a very short snippet to show how this would work?
Answer 1:
Try this technique for running multiple parallel jobs in PHP. In this example, we have two job files that we want to run: j1.php and j2.php. The sample jobs don't do anything fancy. The file j1.php looks like this:
<?php
$jobname = 'j1';
set_time_limit(0); // no execution time limit for this long-running job
$secs = 60;
while ($secs) {
    echo $jobname, '::', $secs, "\n";
    @ob_flush(); flush(); // empty PHP's output buffer first, then the server's, so output is sent in real time
    $secs -= 1;
    sleep(1); // pause
}
We call flush() and ob_flush() because strings passed to echo or print are sometimes buffered by PHP and not sent until later. These two calls push the pending output out right away instead of waiting for the buffer to fill or the script to end.
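The flushing described above can be wrapped in a small helper. This is only a sketch: flush_output() is a hypothetical name that does not appear in the original jobs, and it checks ob_get_level() instead of silencing the notice with @. Note that a web server or proxy in front of PHP may still buffer the response on its own.

```php
<?php
// Hypothetical helper (not part of the original jobs): send pending output now.
function flush_output(): void
{
    // ob_flush() raises a notice when no output buffer is active, so check first.
    if (ob_get_level() > 0) {
        ob_flush(); // hand the contents of the active output buffer to the next layer
    }
    flush();        // ask PHP to flush its system write buffers
}

// Each line should appear as it is produced, not all at once at the end.
for ($i = 3; $i > 0; $i--) {
    echo "tick $i\n";
    flush_output();
    usleep(250000); // pause for a quarter of a second
}
```

A job such as j1.php could call flush_output() after each echo.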
We then have a third file, control.php, which coordinates jobs j1 and j2. This script calls j1.php and j2.php asynchronously using fsockopen in JobStartAsync(), so the two jobs run in parallel. The output from j1.php and j2.php is returned to control.php through JobPollAsync().
<?php
#
# control.php
#
function JobStartAsync($server, $url, $port = 80, $conn_timeout = 30, $rw_timeout = 86400)
{
    $errno = 0;
    $errstr = '';
    set_time_limit(0);
    $fp = fsockopen($server, $port, $errno, $errstr, $conn_timeout);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
        return false;
    }
    // Minimal HTTP request; "Connection: Close" makes the server close the socket when the job ends
    $out = "GET $url HTTP/1.1\r\n";
    $out .= "Host: $server\r\n";
    $out .= "Connection: Close\r\n\r\n";
    stream_set_blocking($fp, false); // reads return immediately, even when no data is waiting
    stream_set_timeout($fp, $rw_timeout);
    fwrite($fp, $out);
    return $fp;
}
// Returns false once the HTTP connection has closed (EOF); otherwise returns a string
// (possibly empty) while still connected. The data is the raw HTTP response, headers included.
function JobPollAsync(&$fp)
{
    if ($fp === false) return false;
    if (feof($fp)) {
        fclose($fp);
        $fp = false;
        return false;
    }
    return fread($fp, 10000);
}
###########################################################################################
if (1) { /* SAMPLE USAGE BELOW */
    $fp1 = JobStartAsync('localhost', '/jobs/j1.php');
    $fp2 = JobStartAsync('localhost', '/jobs/j2.php');
    while (true) {
        sleep(1);
        $r1 = JobPollAsync($fp1);
        $r2 = JobPollAsync($fp2);
        if ($r1 === false && $r2 === false) break; // both jobs have finished
        echo "<b>r1 = </b>$r1<br>";
        echo "<b>r2 = </b>$r2<hr>";
        @ob_flush(); flush();
    }
    echo "<h3>Jobs Complete</h3>";
}
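Since the question involves more than 10 modules, the two-job loop above could be generalised to any number of sockets. The following is a rough sketch rather than part of the original technique: pollJobs() and its $onData callback are hypothetical names, and it swaps the fixed sleep(1) for stream_select(), which waits until at least one socket has data. It assumes every entry in $jobs is a valid stream, so any false returned by JobStartAsync() should be filtered out first.

```php
<?php
// Hypothetical helper: read from many job streams until all of them reach EOF.
// $jobs maps a label to an open stream; $onData receives (label, chunk).
function pollJobs(array $jobs, callable $onData, int $timeoutSec = 5): void
{
    while ($jobs) {
        $read = array_values($jobs);
        $write = null;
        $except = null;
        // Wait until at least one stream is readable or the timeout expires.
        $ready = stream_select($read, $write, $except, $timeoutSec);
        if ($ready === false) {
            break; // select failed, give up
        }
        foreach ($read as $fp) {
            $name = array_search($fp, $jobs, true); // recover the label for this stream
            $chunk = fread($fp, 8192);
            if ($chunk !== false && $chunk !== '') {
                $onData($name, $chunk);
            }
            if (feof($fp)) {
                // The job closed its connection: drop it from the watch list.
                fclose($fp);
                unset($jobs[$name]);
            }
        }
    }
}
```

With the handles from the sample usage, a call could look like pollJobs(array('j1' => $fp1, 'j2' => $fp2), function ($name, $chunk) { echo "$name: $chunk"; }); as with JobPollAsync(), the chunks are raw HTTP response data.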
Good read: Divide-and-conquer and parallel processing in PHP (from the source)
Answer 2:
If the various PHP files have no dependencies on one another, I think you can use a multi-curl approach, which can be implemented as follows:
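// Assumption: the modules are reachable over HTTP; replace the base URL with your own
$linkArray = array(
    'http://localhost/file1.php',
    'http://localhost/file2.php',
    'http://localhost/file3.php',
    'http://localhost/file4.php',
    'http://localhost/file5.php',
);
$curl_arr = array();
$results = array();
$master = curl_multi_init();
foreach ($linkArray as $i => $url) {
    $curl_arr[$i] = curl_init($url);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}
// Drive all transfers until none are still running
do {
    curl_multi_exec($master, $running);
    if ($running > 0) {
        curl_multi_select($master, 1.0); // wait for activity instead of spinning the CPU
    }
} while ($running > 0);
foreach ($curl_arr as $i => $ch) {
    $results[$i] = curl_multi_getcontent($ch); // contains the output of the individual file
    curl_multi_remove_handle($master, $ch);
}
curl_multi_close($master);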
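The question also asks about popen. As a rough sketch of that route, the code below starts each module with the PHP command-line binary and collects whatever it prints. This is an assumption-heavy illustration, not something taken from either answer: runModulesInParallel() is a hypothetical name, the modules are assumed to be runnable from the command line rather than through the web server, and a php binary is assumed to be on the PATH unless another one is passed in.

```php
<?php
// Hypothetical helper: run several PHP scripts at once and return their output.
// $php is the command-line binary to use (assumed to be "php" on the PATH).
function runModulesInParallel(array $scripts, string $php = 'php'): array
{
    $pipes = array();
    foreach ($scripts as $script) {
        // popen() returns immediately; the child process runs in the background.
        $cmd = escapeshellarg($php) . ' ' . escapeshellarg($script);
        $handle = popen($cmd, 'r');
        if ($handle !== false) {
            $pipes[$script] = $handle;
        }
    }

    $output = array();
    foreach ($pipes as $script => $handle) {
        // Reading waits for this child to finish, while the others keep running.
        $output[$script] = stream_get_contents($handle);
        pclose($handle);
    }
    return $output;
}
```

Because all children are started before any output is read, the total time is roughly that of the slowest module rather than the sum of all of them. A module that prints a lot may pause until the parent gets around to reading its pipe.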
Source: https://stackoverflow.com/questions/13069900/running-several-php-processes-in-parallel