
Why do I have different document counts in status and index?

荒凉一梦 submitted on 2019-12-04 05:55:48
Question: I'm following the Storm-Crawler-ElasticSearch tutorial and playing around with it. When I search in Kibana, I've noticed that the number of hits for the index named 'status' is far greater than for 'index'. For example, at the top left you can see that there are 846 hits for the 'status' index; I assume that means it has crawled through 846 pages. With the 'index' index, however, there are only 31 hits. I understand that functionally, index and status are different, as status is just responsible…

Why do I have different document counts in status and index?

那年仲夏 submitted on 2019-12-02 08:50:51
I'm following the Storm-Crawler-ElasticSearch tutorial and playing around with it. When I search in Kibana, I've noticed that the number of hits for the index named 'status' is far greater than for 'index'. For example, at the top left you can see that there are 846 hits for the 'status' index; I assume that means it has crawled through 846 pages. With the 'index' index, however, there are only 31 hits. I understand that functionally, index and status are different, as status is just responsible for the link metadata. The problem is that it seems StormCrawler is parsing through many pages and…
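A quick way to see where the extra documents in the status index come from is to break that index down by URL state instead of looking only at the total hit count. The sketch below is a minimal, standard-library-only Python example, not part of StormCrawler itself. It assumes Elasticsearch listens on `http://localhost:9200`, that the index names are the `status` and `index` mentioned in the question, and that each status document keeps its state in a keyword field called `status` (values such as `DISCOVERED`, `FETCHED`, `FETCH_ERROR`, `REDIRECTION`, `ERROR`). The helper names are hypothetical; adjust the URL, index names and field name to your own configuration.

```python
import json
import urllib.request

# Assumed cluster address; your tutorial setup may use a different host or port.
ES_URL = "http://localhost:9200"


def status_breakdown_query():
    """Search body that counts status-index documents per URL state.

    Assumes the state is stored in a keyword field named `status`.
    """
    return {
        "size": 0,  # skip the hits themselves, only the aggregation is needed
        "aggs": {"by_status": {"terms": {"field": "status", "size": 10}}},
    }


def buckets_to_counts(response):
    """Convert a terms-aggregation response into a {state: doc_count} dict."""
    buckets = response["aggregations"]["by_status"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def _request_json(url, body=None):
    """GET the URL (no body) or POST a JSON body, and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if body is None else "POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def report(es_url=ES_URL):
    """Print the per-state breakdown of `status` next to the size of `index`."""
    response = _request_json(f"{es_url}/status/_search", status_breakdown_query())
    for state, count in sorted(buckets_to_counts(response).items()):
        print(f"status / {state}: {count}")
    # The _count API returns {"count": N, ...} for the given index.
    indexed = _request_json(f"{es_url}/index/_count")["count"]
    print(f"index: {indexed} documents")
```

Calling `report()` against a running cluster prints one line per URL state followed by the document count of `index`. If most of the status documents turn out to be `DISCOVERED`, the status index is counting URLs that were found as outlinks but have not been fetched yet, so they cannot have reached `index`.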