Node.js job server (multiple purposes)

Submitted by 流过昼夜 on 2019-12-07 14:33:46

Question


I'm fairly new to Node.js and still getting to know it (my background is in PHP development). I've looked at some Node.js examples and watched the video on the Node.js website.

I currently run a video site, and a lot of tasks have to be executed in the background. Right now this is done by cron jobs that call PHP scripts. The downside of this approach is that when another process starts while the previous one is still running, the servers end up under high load.

The jobs that need to be done on the server are the following:

  • Scrape feeds from websites and insert them into a MySQL database
  • Fetch data from websites (scraping) on request
  • Generate data for reporting. These are mostly MySQL queries that need to be executed.

Tasks that will need to be done in the future:

  • Log video views when a user visits a video page (this will also be logged to MySQL)
  • Log visitors in general
  • Show ads based on the searched video

I want to be able to call a URL to queue a job, and also to schedule jobs by time or have them run continuously.

I don't know whether Node.js is the right path to follow, which is why I'm asking here. What are the advantages of doing this in Node? What are the downsides?

What are the pros of Node.js here?

Thanks for the response!


Answer 1:


While traditionally used for web/network tasks (web servers, IRC chat servers, etc.), Node.js shines when you give it any kind of IO bound (as opposed to CPU bound) task, since it uses completely asynchronous IO (that is, all IO happens outside of the main event loop). For example, Node can easily hold open many sockets, waiting for data on each, or stream data to and from files very efficiently.
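To illustrate the I/O point, here is a minimal sketch that uses timers as a stand-in for network requests. `fakeRequest` and `fetchAll` are hypothetical names, not a library API. Five simulated 100 ms waits are started together, so the total time is close to 100 ms rather than 500 ms, because the event loop is free while each one waits.

```javascript
// Hypothetical stand-in for a network request: resolves with `name` after `ms` ms.
function fakeRequest(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

// Start every request at once; while they wait, the event loop stays free.
// Promise.all keeps the results in the same order as `names`.
async function fetchAll(names, ms) {
  const started = Date.now();
  const results = await Promise.all(names.map((n) => fakeRequest(n, ms)));
  return { results, elapsedMs: Date.now() - started };
}

fetchAll(["feed-a", "feed-b", "feed-c", "feed-d", "feed-e"], 100).then(
  ({ results, elapsedMs }) => {
    // The five 100 ms waits overlap, so elapsedMs is roughly 100, not 500.
    console.log(results, `${elapsedMs} ms`);
  }
);
```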

It really sounds like you're just looking for a job queue. A popular one is Resque, and though it's written for Ruby, there are versions for PHP, Node.js, and more. There are also job queues built specifically for PHP; if you want to stick with PHP, a Google search for "PHP job queue" may take you far.

Now, one advantage to using Node.js is, again, its ability to handle a lot of IO very easily. Of course I'm just guessing, but based on your requirements, it could be a good tool for the job:

  • Scrape data/feeds from websites - mostly waiting on network IO
  • Insert data into MySQL - mostly waiting on network IO
  • Reporting - again, Node is good at MySQL queries, but probably not so good at analyzing data
  • Call a URL to schedule a job - Node's built-in HTTP handling and excellent web libraries make this a cinch
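As a rough idea of the last bullet, Node's built-in `http` module is enough to accept a "queue this job" request. This is a minimal sketch that assumes a hypothetical `/enqueue?type=...` endpoint and uses an in-memory `queued` array in place of a real job queue. Jobs stored there are lost on restart.

```javascript
const http = require("node:http");

// Hypothetical in-memory job list; a real setup would hand jobs to a
// persistent queue instead.
const queued = [];

// GET /enqueue?type=scrape-feeds queues a job and replies with it as JSON.
function createJobServer() {
  let nextId = 1;
  return http.createServer((req, res) => {
    const url = new URL(req.url, "http://localhost");
    if (url.pathname !== "/enqueue") {
      res.writeHead(404);
      res.end();
      return;
    }
    const type = url.searchParams.get("type");
    if (!type) {
      res.writeHead(400, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "missing type" }));
      return;
    }
    const job = { id: nextId++, type };
    queued.push(job);
    // 202 Accepted: the job is queued, not yet done.
    res.writeHead(202, { "Content-Type": "application/json" });
    res.end(JSON.stringify(job));
  });
}
```

For example, after `createJobServer().listen(8080)`, a cron entry or another service could request `http://localhost:8080/enqueue?type=scrape-feeds`. A separate worker would then take jobs off the queue.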

So it's entirely possible that you'll want to experiment with Node for these tasks. If you do, take a look at Resque for Node or another job system like Kue. It's also not very hard to build your own if you don't need something complicated; Redis is a good tool for this.
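For a sense of how small a home-grown queue can be, here is a minimal in-memory sketch with a concurrency limit. It is an illustrative stand-in, not Kue's or Resque's API. Unlike a Redis-backed queue, it lives in a single Node process and loses pending jobs on restart.

```javascript
// Minimal in-memory job queue: at most `concurrency` jobs run at once,
// and the rest wait their turn in FIFO order.
class JobQueue {
  constructor(concurrency) {
    this.concurrency = concurrency;
    this.running = 0;
    this.pending = [];
  }

  // `task` is an async function; returns a promise for its result.
  push(task) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this.drain();
    });
  }

  // Start waiting jobs until the concurrency limit is reached.
  drain() {
    while (this.running < this.concurrency && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.running++;
      Promise.resolve()
        .then(task) // also catches a task that throws synchronously
        .then(resolve, reject)
        .finally(() => {
          this.running--;
          this.drain(); // a slot is free: start the next job, if any
        });
    }
  }
}
```

With `new JobQueue(1)`, a slow scrape simply waits for the previous one to finish instead of piling up, which is the overlapping-cron problem described in the question.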

There are a few reasons you might not want to use Node. If you're not familiar with JavaScript and evented and continuation-passing style programming, Node.js may have a bit of a learning curve, as you have to force yourself to stop thinking synchronously. Furthermore, if you do have a lot of heavy non-IO tasks in your program, such as analyzing data, Node will not excel as those calculations will block the main event loop and keep Node from handling callbacks, etc. for your asynchronous IO. Finally, if you have a lot of logic already in PHP or another language, it may be easier and/or quicker to find a solution in your language of choice.
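The blocking problem is easy to observe directly. In this sketch, `cpuBoundWork` is a hypothetical busy loop standing in for heavy data analysis. A timer scheduled for 0 ms cannot fire until the synchronous work returns, so every pending callback is delayed by the full duration of that work. A common remedy is to move such work into a `worker_threads` Worker or a separate process.

```javascript
// Busy-wait for `ms` milliseconds: a stand-in for CPU-heavy work such as
// crunching report data in plain JavaScript.
function cpuBoundWork(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) iterations++;
  return iterations;
}

// Schedule a 0 ms timer, then block the main thread. The timer callback
// can only run after cpuBoundWork returns.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    cpuBoundWork(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  // delay is at least 200, even though the timer asked for 0 ms
  console.log(`0 ms timer fired after ${delay} ms`);
});
```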




Answer 2:


I second the above answers. You don't necessarily need a full-service job queue, however: you can use flow-control modules like async to run tasks in parallel or in series, either as fast as they'll go or with controlled concurrency. Node.js has many powerful scraping and parsing tools. This post mentions a few, I only recently heard about Trumpet, and there are probably dozens of other options. Node.js has a Stream module in core, and Request makes HTTP interactions extremely easy. For timed tasks, the simplest approach is a basic setTimeout/setInterval. Alternatively, you could write the scraper as a script that cron calls, or have it triggered by some event using the EventEmitter module in core, etc.
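One way to use plain timers without the overlapping runs described in the question is to chain `setTimeout` calls. The next run is scheduled only after the current one finishes, whereas `setInterval` or cron start on a fixed clock regardless. A minimal sketch follows. `runEvery` is a hypothetical helper, not a core API.

```javascript
// Run `task` repeatedly, waiting `intervalMs` after each run *finishes*
// before starting the next, so a slow run never overlaps the next one.
// Returns a function that stops the loop.
function runEvery(task, intervalMs) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      await task();
    } catch (err) {
      console.error("job failed:", err); // log and keep scheduling
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

For example, `const stop = runEvery(scrapeFeeds, 60 * 1000);` would run a hypothetical async `scrapeFeeds` function about once a minute, never twice at the same time.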




Answer 3:


An uncontrolled number of parallel Node.js jobs may bring down your server. You will need to control the processes or, better, put them in a queue for each task.

For these needs, and since you know PHP, I suggest using Gearman and adding jobs as needed, or through small PHP scripts.
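The "queue for each task" idea can also be sketched in plain Node by giving every task type its own serial queue. This is an in-process illustration only, with hypothetical names. It is not Gearman's API, and Gearman additionally distributes jobs to separate worker processes, which this sketch does not attempt.

```javascript
// One serial queue per task type: jobs of the same type run one at a time,
// while jobs of different types can run in parallel.
const tails = new Map(); // task type -> promise that settles when its last job does

function enqueue(type, job) {
  const previous = tails.get(type) || Promise.resolve();
  // Start `job` (an async function) only after the previous job of this type.
  const result = previous.then(() => job());
  // Keep a version that never rejects, so one failed job doesn't block the queue.
  tails.set(type, result.catch(() => {}));
  return result;
}
```

For example, `enqueue("scrape", scrapeFeeds)` and `enqueue("report", buildReport)` (both hypothetical) would not block each other, but a second `"scrape"` job would wait for the first.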



Source: https://stackoverflow.com/questions/10766382/nodejs-job-server-multiple-purpose
