I took the most basic demo of the pthreads PHP 7 extension that uses the Pool class (this demo: https://github.com/krakjoe/pthreads#polyfill) and extended it a little so I ca
As you have quite correctly noted, the code you have copied targets pthreads v2 (for PHP 5.x).
The problem boils down to the fact that the garbage collector in pthreads is not deterministic. This means it will not behave predictably, and so it cannot be reliably used in order to fetch data from the tasks that have been executed by the pool.
One way to fetch this data is to pass Threaded objects into the tasks being submitted to the pool:
<?php

$pool = new Pool(4);
$data = [];

foreach (range(1, 8) as $i) {
    $dataN = new Threaded();
    $dataN->i = $i;

    $data[] = $dataN;

    $pool->submit(new class($dataN) extends Threaded {
        public $data;

        public function __construct($data)
        {
            $this->data = $data;
        }

        public function run()
        {
            echo "Hello World\n";
            $this->data->i *= 2;
        }
    });
}

while ($pool->collect());

$pool->shutdown();

foreach ($data as $dataN) {
    var_dump($dataN->i);
}
There are a few things to note about the above code:
- Collectable (which is now an interface in pthreads v3) is already implemented by the Threaded class, so there's no need to implement it yourself.
- The collect method is still invoked (in a loop that blocks the main thread until all tasks have finished executing) so that the tasks can be garbage collected (using pthreads' default collector) to free up memory whilst the pool is executing tasks.

I had a similar problem, where the collect loop would finish instantly. It turns out that collect returned once all work was in process, not once all work was completed. For some tasks the collector was never even invoked.
So with a pool size of 4 and just 3 submitted tasks, the collector would never run and execution would continue immediately. Example:
define("CRLF", "\r\n");

class AsyncWork extends Thread {
    private $done = false;
    private $id;

    public function __construct($id) {
        $this->id = $id;
    }

    public function id() {
        return $this->id;
    }

    public function isCompleted() {
        return $this->done;
    }

    public function run() {
        echo '[AsyncWork] ' . $this->id . CRLF;
        sleep(rand(1, 5));
        echo '[AsyncWork] sleep done ' . $this->id . CRLF;
        $this->done = true;
    }
}

$pool = new Pool(4);

for ($i = 1; $i <= 3; $i++) {
    $pool->submit(new AsyncWork($i));
}

while ($pool->collect(function (AsyncWork $work) {
    echo 'Collecting [' . $work->id() . ']: ' . ($work->isCompleted() ? 1 : 0) . CRLF;
    return $work->isGarbage();
})) continue;

echo 'ALL DONE' . CRLF;

$pool->shutdown();
This would output:
[AsyncWork] 1
[AsyncWork] 2
ALL DONE
[AsyncWork] 3
[AsyncWork] sleep done 2
[AsyncWork] sleep done 3
[AsyncWork] sleep done 1
If I changed the above code to submit more work than the pool size, it would collect only until all work was in process. For example:
for ($i = 1; $i <= 10; $i++) {
    $pool->submit(new AsyncWork($i));
}
//results:
[AsyncWork] 1
[AsyncWork] 2
[AsyncWork] 3
[AsyncWork] 4
[AsyncWork] sleep done 4
[AsyncWork] 8
Collecting [4]: 1
[AsyncWork] sleep done 1
Collecting [1]: 1
[AsyncWork] 5
[AsyncWork] sleep done 3
Collecting [3]: 1
[AsyncWork] 7
[AsyncWork] sleep done 2
Collecting [2]: 1
[AsyncWork] 6
[AsyncWork] sleep done 6
Collecting [6]: 1
[AsyncWork] 10
[AsyncWork] sleep done 7
Collecting [7]: 1
[AsyncWork] sleep done 8
Collecting [8]: 1
[AsyncWork] sleep done 5
Collecting [5]: 1
ALL DONE
[AsyncWork] 9
[AsyncWork] sleep done 9
[AsyncWork] sleep done 10
As you can see, it never collects the last tasks and it returns before the work is done.
The only way I could solve this was to track completion myself, by keeping a list of the submitted tasks:
$pool = new Pool(4);
$worklist = [];

for ($i = 1; $i <= 10; $i++) {
    $work = new AsyncWork($i);
    $worklist[] = $work;
    $pool->submit($work);
}

do {
    $alldone = true;
    foreach ($worklist as $i => $work) {
        if (!$work->isCompleted()) {
            $alldone = false;
        } else {
            echo 'Completed: ' . $work->id() . CRLF;
            unset($worklist[$i]);
        }
    }
    if ($alldone) {
        break;
    }
} while (true);

while ($pool->collect(function (AsyncWork $work) {
    echo 'Collecting [' . $work->id() . ']: ' . ($work->isCompleted() ? 1 : 0) . CRLF;
    return $work->isGarbage();
})) continue;

echo 'ALL DONE' . CRLF;

$pool->shutdown();
This was the only way I could make sure ALL DONE was printed only when everything was, in fact, all done.
[AsyncWork] 1
[AsyncWork] 2
[AsyncWork] 3
[AsyncWork] 4
[AsyncWork] sleep done 1
[AsyncWork] 5
Completed: 1
[AsyncWork] sleep done 2
Completed: 2
[AsyncWork] 6
[AsyncWork] sleep done 4
[AsyncWork] 8
Completed: 4
[AsyncWork] sleep done 6
[AsyncWork] sleep done 3
[AsyncWork] 7
Completed: 6
Completed: 3
[AsyncWork] sleep done 5
Completed: 5
[AsyncWork] 10
[AsyncWork] 9
[AsyncWork] sleep done 9
Completed: 9
[AsyncWork] sleep done 8
Completed: 8
[AsyncWork] sleep done 7
Completed: 7
[AsyncWork] sleep done 10
Completed: 10
Collecting [1]: 1
Collecting [5]: 1
Collecting [9]: 1
Collecting [2]: 1
Collecting [6]: 1
Collecting [10]: 1
Collecting [3]: 1
Collecting [7]: 1
Collecting [4]: 1
Collecting [8]: 1
ALL DONE
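The tracking loop above polls the task list as fast as it can, which keeps the main thread busy while the workers sleep. A minimal sketch of the same idea with a short pause between passes might look like the following. It assumes each task exposes isCompleted() and id(), as AsyncWork does; the helper name waitForCompletion and the $pollMicros parameter are hypothetical and not part of pthreads.

```php
<?php
/**
 * Poll a list of tasks until every one reports completion.
 *
 * Hypothetical helper for illustration only: it relies on the
 * isCompleted() and id() methods defined on AsyncWork above.
 *
 * @param array $tasks      tasks that were already submitted to the pool
 * @param int   $pollMicros pause between polling passes, in microseconds
 * @return array task ids in the order their completion was observed
 */
function waitForCompletion(array $tasks, int $pollMicros = 10000): array
{
    $finished = [];

    while ($tasks) {
        foreach ($tasks as $key => $task) {
            if ($task->isCompleted()) {
                $finished[] = $task->id();
                unset($tasks[$key]); // stop polling this task
            }
        }

        if ($tasks) {
            usleep($pollMicros); // pause instead of spinning the CPU
        }
    }

    return $finished;
}
```

With the code above, this would be called as waitForCompletion($worklist) right after the submit loop, followed by the same collect loop and $pool->shutdown(). The poll interval is a trade-off: a longer pause uses less CPU but delays noticing that the last task has finished.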