Run StormCrawler in local mode or install Apache Storm?

Posted by 最后都变了 on 2019-12-08 09:13:06

Question


So I'm trying to figure out how to install and set up Storm/StormCrawler with ES and Kibana, as described here.

I never installed Storm on my local machine, because I've worked with Nutch before and never had to install Hadoop locally... I thought it might be the same with Storm (maybe not?).

I'd like to start crawling with StormCrawler instead of Nutch now.

It seems that if I just download a release and add its /bin directory to my PATH, I can only talk to a remote cluster.

It seems like I need to set up a development environment according to this, which would let me develop different topologies over time and then talk to the remote cluster from my local machine when I'm ready to deploy them. Is that right?

So it seems like all I need to do is add Storm as a dependency to my StormCrawler project when I build it with Maven?
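For reference, a project generated from the StormCrawler archetype already declares both dependencies in its pom.xml, roughly like the sketch below. The version numbers here are placeholders, not authoritative; match them to the Storm and StormCrawler releases you actually use:

```xml
<!-- Sketch of the relevant pom.xml section; versions are placeholders -->
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-client</artifactId>
    <version>${storm.version}</version>
    <!-- "provided" because a real cluster supplies Storm at runtime;
         local-mode runs pull it onto the classpath via Maven anyway -->
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.digitalpebble.stormcrawler</groupId>
    <artifactId>storm-crawler-core</artifactId>
    <version>${stormcrawler.version}</version>
</dependency>
```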


Answer 1:


See Getting Started page and the tutorials on Youtube.

You don't need to install Storm, as you can run the topology in local mode, just as you'd do with Nutch and Hadoop. Just generate a topology from the archetype, modify it to your needs (e.g. add the ES components) and run it with -local. See the README generated by the archetype.
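As a rough sketch of that workflow (the archetype coordinates and the exact run command are assumptions from memory; the README that the archetype generates is the authoritative source, and the run step only needs the Storm release's CLI on your PATH, not a running cluster):

```shell
# Generate a project skeleton from the StormCrawler archetype
mvn archetype:generate -DarchetypeGroupId=com.digitalpebble.stormcrawler \
                       -DarchetypeArtifactId=storm-crawler-archetype

# Build it (use whatever artifactId you chose during generation)
cd mycrawler
mvn clean package

# Run the topology in local mode -- no cluster required; check the
# generated README for the exact jar name, Flux file, and flags
storm jar target/mycrawler-1.0-SNAPSHOT.jar \
      org.apache.storm.flux.Flux --local crawler.flux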

Later on, you'd install Storm to benefit from the UI and possibly run it on multiple nodes, but as a starting point, running locally is a good way of exploring the capabilities of StormCrawler.



Source: https://stackoverflow.com/questions/51994601/run-stormcrawler-in-local-mode-or-install-apache-storm
