Crawl URLs based on their priorities in StormCrawler
Question: I am working on a crawler based on the StormCrawler project, and I need to crawl URLs according to their priority. For example, with two priority levels, HIGH and LOW, I want HIGH-priority URLs to be crawled as soon as possible, before any LOW-priority URLs. I need a way to handle this in the crawler. How can I implement this requirement in Apache Storm and StormCrawler?

Answer 1: With Elasticsearch as a backend, you can configure the spouts to sort the URLs within a bucket by whichever