I am currently trying to implement a job queue in PHP. The queue will then be processed as a batch job and should be able to process some jobs in parallel.
I already
I use exec(). It's easy and clean. You basically need to build a thread manager and thread scripts that do what you need.
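The exec() "thread manager" idea can be sketched roughly as follows. This is a minimal sketch that assumes a Linux host (it reads /proc) and a POSIX shell; startBackground(), isRunning() and runQueue() are hypothetical names, and each queued command stands in for one of your thread scripts (for example a hypothetical php worker.php 42).

```php
<?php
// Minimal sketch of an exec()-based "thread manager" (Linux + POSIX shell assumed).
// Each command is a hypothetical "thread script", e.g. "php worker.php 42".

// Start a command in the background and return its PID.
function startBackground(string $command): int
{
    $output = [];
    // Redirecting output lets exec() return at once; "echo $!" prints the PID.
    exec($command . ' > /dev/null 2>&1 & echo $!', $output);
    return (int) ($output[0] ?? 0);
}

// Linux-specific liveness check: reads /proc and treats zombies as finished.
function isRunning(int $pid): bool
{
    $stat = $pid > 0 ? @file_get_contents("/proc/$pid/stat") : false;
    if ($stat === false) {
        return false;
    }
    // The state letter follows the ")" that closes the command name.
    return substr($stat, strrpos($stat, ')') + 2, 1) !== 'Z';
}

// Run every command, keeping at most $maxParallel of them alive at once.
function runQueue(array $commands, int $maxParallel): void
{
    $running = [];
    while ($commands || $running) {
        $running = array_filter($running, 'isRunning'); // forget finished workers
        while ($commands && count($running) < $maxParallel) {
            $running[] = startBackground(array_shift($commands));
        }
        usleep(100000); // poll every 100 ms
    }
}
```

A call such as runQueue(['php worker.php 1', 'php worker.php 2', 'php worker.php 3'], 2) would keep at most two workers alive at a time. Polling is crude, but it keeps the manager free of any extension beyond exec().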
I don't like fsockopen() because each call opens a server connection; these connections build up and may hit Apache's connection limit.
I don't like the curl functions for the same reason.
I don't like pcntl because it needs the pcntl extension to be available, and you have to keep track of parent/child relations.
Never played with Gearman...
The method described in 'Easy parallel processing in PHP' is downright scary: the principle is OK, but the implementation??? As you've already pointed out, the curl_multi_* functions provide a much better way of implementing this approach.
"But I think those 2 ways will add pretty much overhead"
Yes, you probably don't need a client and server HTTP stack for handing off the job. But unless you're working for Google, your development time is much more expensive than your hardware, and there are plenty of tools for managing HTTP and analysing its performance. There is also a defined standard covering things such as status notifications and authentication.
A lot of how you implement the solution depends on the level of transactional integrity you require and whether you require in-order processing.
Out of the approaches you mention, I'd recommend focussing on the HTTP request method using curl_multi_*. But if you need good transactional control or in-order delivery, then you should definitely run a broker daemon between the source of the messages and the processing agents (there is a well-written single-threaded server suitable for use as a framework for the broker here). Note that the processing agents should process a single message at a time.
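The HTTP hand-off with curl_multi_* might look roughly like the sketch below. It assumes the curl extension is loaded and that each URL points at a hypothetical worker script that processes exactly one job per request; dispatchAll() and its $timeout parameter are illustrative names, not part of any library.

```php
<?php
// Sketch: hand jobs to worker endpoints in parallel with the curl_multi_* functions.
// The URLs are hypothetical; each one is assumed to trigger exactly one job.

function dispatchAll(array $urls, int $timeout = 30): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still active.
    do {
        $status = curl_multi_exec($multi, $active);
        if ($active) {
            curl_multi_select($multi); // wait for socket activity (up to 1 s by default)
        }
    } while ($active && $status === CURLM_OK);

    // Collect each worker's status code and response body.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = [
            'code' => curl_getinfo($ch, CURLINFO_RESPONSE_CODE),
            'body' => curl_multi_getcontent($ch),
        ];
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $results;
}
```

The worker's response body is a natural place for a status notification, which is one of the benefits of reusing HTTP mentioned above.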
If you need a highly scalable solution, then take a look at a proper message queuing system such as RabbitMQ.
HTH
C.
If your application is going to run in a Unix/Linux environment, I would suggest you go with the forking option. It's basically child's play to get it working. I have used it for a cron manager, and had code for it to fall back to a Windows-friendly code path if forking was not an option.
The options of running the entire script several times do, as you state, add quite a bit of overhead. If your script is small, it might not be a problem. But you will probably get used to whichever way of doing parallel processing in PHP you choose now, and next time, when you have a job that uses 200 MB of data, that overhead might very well be a problem. So you'd be better off learning a way that you can stick with.
I have also tested Gearman and I like it a lot. There are a few things to think about, but as a whole it offers a very good way to distribute work to different servers running different applications written in different languages. Apart from setting it up, actually using it from within PHP, or any other language for that matter, is, once again, child's play.
It could very well be overkill for what you need to do. But it will open your eyes to new possibilities when it comes to handling data and jobs, so I would recommend trying Gearman for that reason alone.
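A rough sketch of the forking approach, including a Windows-friendly fallback, follows. It assumes the pcntl extension on Unix/Linux; processJobs() and its parameters are hypothetical names, and each forked child handles exactly one job and then exits.

```php
<?php
// Sketch: fork one child per job with pcntl, falling back to sequential
// processing when pcntl is unavailable (e.g. on Windows).

function processJobs(array $jobs, callable $handler, int $maxChildren = 4): void
{
    if (!function_exists('pcntl_fork')) {
        // Windows-friendly fallback: run each job in this process.
        foreach ($jobs as $job) {
            $handler($job);
        }
        return;
    }

    $children = 0;
    foreach ($jobs as $job) {
        // Wait for a free slot before forking another child.
        if ($children >= $maxChildren) {
            pcntl_wait($status);
            $children--;
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Could not fork');
        }
        if ($pid === 0) {
            // Child: handle exactly one job, then exit.
            $handler($job);
            exit(0);
        }
        $children++; // Parent: track how many children are alive.
    }
    // Reap the remaining children so none are left as zombies.
    while ($children > 0) {
        pcntl_wait($status);
        $children--;
    }
}
```

Because a forked child does not share memory with the parent, results have to be written somewhere (files, a database, a queue) rather than returned.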
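For a feel of what using Gearman from PHP looks like, here is a sketch based on the PECL gearman extension's GearmanClient and GearmanWorker classes. It assumes a gearmand job server on the local machine at its default port; the Gearman function name process_job and the PHP functions enqueueJob() and runWorker() are hypothetical.

```php
<?php
// Sketch using the PECL gearman extension (assumed installed, gearmand on localhost).
// "process_job", enqueueJob() and runWorker() are hypothetical names.

// Producer side: queue a job and return immediately with its job handle.
function enqueueJob(array $payload): string
{
    $client = new GearmanClient();
    $client->addServer(); // no arguments: local job server, default port
    return $client->doBackground('process_job', json_encode($payload));
}

// Worker side: start this several times to get several parallel workers.
function runWorker(callable $handler): void
{
    $worker = new GearmanWorker();
    $worker->addServer();
    $worker->addFunction('process_job', function (GearmanJob $job) use ($handler) {
        $handler(json_decode($job->workload(), true));
    });
    // Each call to work() blocks until one job has been handled.
    while ($worker->work()) {
    }
}
```

The job server hands each queued job to whichever registered worker is free, and the workers can just as well run on other servers or be written in other languages.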
I use PHP's pcntl, and it is good as long as you know what you are doing. I understand your situation, but I don't think the code is difficult to understand; we just have to be a little more careful than usual when implementing a job queue or parallel processing.
I feel that as long as you code it carefully and make sure the flow is correct, it works well; of course, you should keep parallel processing in mind while implementing it.
Where you could make mistakes:
Take a look at this example: https://github.com/rakesh-sankar/Tools/blob/master/PHP/fork-parallel-process.php
Hope it helps.
Here's a summary of a few options for parallel processing in PHP.
Check out Amp ("Asynchronous concurrency made simple"); it looks to be the most mature PHP library I've seen for parallel processing.
This class was posted in the user comments on the PHP manual page for exec() and provides a really simple starting point for forking new processes and keeping track of them.
Example:
// You may use status(), start() and stop(). Note that start() gets called automatically once.
$process = new Process('ls -al');
// Or, if you already have the PID ($myPid), only the status() method will work.
$process = new Process();
$process->setPid($myPid);
// Then you can start/stop/check the status of the job.
$process->stop();
$process->start();
if ($process->status()) {
    echo "The process is currently running";
} else {
    echo "The process is not running.";
}
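The Process class itself is not reproduced here. As a hypothetical stand-in with the same start()/status()/stop() shape, the sketch below uses proc_open() instead of exec(), which lets PHP ask for the child's state through proc_get_status(). It assumes PHP 7.4+ (an array command bypasses the shell) and a Unix-like system for /dev/null; BackgroundProcess is not the class from the php.net comments.

```php
<?php
// Hypothetical stand-in for the Process class, built on proc_open().
// Assumes PHP 7.4+ and a Unix-like OS (/dev/null).

class BackgroundProcess
{
    /** @var resource|false|null */
    private $handle = null;
    private array $command;

    public function __construct(array $command)
    {
        $this->command = $command; // e.g. ['php', 'worker.php', '42']
    }

    public function start(): void
    {
        // Discard the child's output so it can never block on a full pipe.
        $spec = [1 => ['file', '/dev/null', 'w'], 2 => ['file', '/dev/null', 'w']];
        $this->handle = proc_open($this->command, $spec, $pipes);
    }

    public function status(): bool
    {
        return is_resource($this->handle) && proc_get_status($this->handle)['running'];
    }

    public function stop(): void
    {
        if ($this->status()) {
            proc_terminate($this->handle); // sends SIGTERM by default
        }
        if (is_resource($this->handle)) {
            proc_close($this->handle); // waits for the child to exit
        }
        $this->handle = null;
    }
}
```

Passing the command as an array means the PID PHP tracks is the worker itself rather than an intermediate /bin/sh, so stop() signals the right process.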
There's also a great article, "Async processing or multitasking in PHP", that explains the pros and cons of various approaches:
There's also this simple tutorial, which was wrapped up into a little library called Doorman.
Hope these links provide a useful starting point for more research.
Well, I guess we have three options here:
A. Multi-Thread:
PHP does not support multithreading natively. But there is an experimental PHP extension called pthreads (https://github.com/krakjoe/pthreads) that allows you to do just that.
B. Multi-Process:
This can be done in 3 ways:
C. Distributed Parallel Processing:
How it works:
1. The client app sends data (AKA a message, which can be JSON formatted) to the engine (the MQ Engine), which can be local or an external web service.
2. The MQ Engine stores the data (mostly in memory and optionally in a database) inside queues; you can define the queue name.
3. The client app asks the MQ Engine for data (messages) to process, in order (FIFO or based on priority); you can also request data from a specific queue.
Some MQ Engines:
More of them can be found here: http://queues.io
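To make the three steps of the distributed option concrete, here is a toy, single-process illustration of what an MQ engine does with messages: named queues, JSON payloads, priority ordering and FIFO among equal priorities. InMemoryBroker is a hypothetical name; a real MQ engine runs as a separate service shared by many processes and can persist messages, which this sketch does not.

```php
<?php
// Toy, in-memory illustration of an MQ engine (NOT a real broker):
// clients push JSON messages into named queues, workers pull them
// by priority, FIFO among equal priorities.

final class InMemoryBroker
{
    /** @var array<string, SplPriorityQueue> */
    private array $queues = [];
    private int $serial = 0;

    // Step 1 and 2: a client sends a message; the engine stores it in a named queue.
    public function send(string $queue, array $message, int $priority = 0): void
    {
        if (!isset($this->queues[$queue])) {
            $this->queues[$queue] = new SplPriorityQueue();
        }
        // [priority, -serial]: higher priority first, then the older message.
        $this->queues[$queue]->insert(json_encode($message), [$priority, -$this->serial++]);
    }

    // Step 3: a worker asks for the next message from a specific queue.
    public function receive(string $queue): ?array
    {
        $q = $this->queues[$queue] ?? null;
        if ($q === null || $q->isEmpty()) {
            return null;
        }
        return json_decode($q->extract(), true);
    }
}
```

The negative serial number is needed because SplPriorityQueue does not preserve insertion order for elements with equal priority.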