So I'm following the Storm-Crawler-ElasticSearch tutorial and playing around with it.
When Kibana is used to search, I've noticed that the number of hits for index name 's
Redirections and fetch errors are another possible reason for the difference: they exist in the status index but not in the content index.
The 'status' index contains information about all the URLs the crawler has either fetched or discovered. This is roughly the equivalent of the crawldb in Nutch. The 'index' index contains the pages that have been fetched, parsed and, well, indexed.
Now if you look at the 'status' field within the status index, you'll find different values indicating whether a URL has been DISCOVERED, FETCHED, etc. See the wiki page about the status stream. URLs marked as DISCOVERED haven't been fetched yet and therefore can't be in the 'index' index. If you filter the status index by status:FETCHED, you should see a number comparable to the number of documents in the 'index' index.
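To check this outside Kibana, here is a minimal Python sketch that compares the two counts through Elasticsearch's `_count` endpoint. It assumes Elasticsearch is reachable at `http://localhost:9200` and that the indices are named 'status' and 'index' as in the tutorial; the helper names `count_url` and `count_docs` are made up for illustration, so adjust everything to your setup.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Elasticsearch node; change this for your cluster.
ES_URL = "http://localhost:9200"


def count_url(index, query=None):
    """Build the _count URL for an index, optionally with a Lucene query string."""
    url = f"{ES_URL}/{index}/_count"
    if query:
        # Same query syntax you would type in the Kibana search bar.
        url += "?" + urllib.parse.urlencode({"q": query})
    return url


def count_docs(index, query=None):
    """Return the number of documents in the index that match the query."""
    with urllib.request.urlopen(count_url(index, query)) as resp:
        return json.load(resp)["count"]
```

Calling `count_docs("status", "status:FETCHED")` and `count_docs("index")` should then give numbers that are close to each other, while `count_docs("status")` alone will be larger because it also includes DISCOVERED URLs, redirections and fetch errors.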
The Elasticsearch module in SC contains templates for Kibana that let you see the breakdown of URLs per status. If you haven't done so already, I'd recommend that you look at the video tutorials on YouTube.
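If you want the same per-status breakdown without Kibana, a terms aggregation on the status index can produce it. The sketch below only builds the request body and parses a response; it assumes the 'status' field is mapped as a keyword (a terms aggregation needs that), and the function names and the `by_status` aggregation name are hypothetical. You would POST the body to `/status/_search` on your own cluster.

```python
def status_breakdown_body(field="status", max_buckets=10):
    """Request body for a terms aggregation that counts URLs per status value."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {
            "by_status": {"terms": {"field": field, "size": max_buckets}}
        },
    }


def parse_breakdown(response):
    """Turn the aggregation part of a search response into {status: count}."""
    buckets = response["aggregations"]["by_status"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The FETCHED entry of the resulting dictionary is the one to compare against the size of the 'index' index.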
So what I would like is to have the same number of hits on 'index' too, with the content displayed, instead of just 31.
It will eventually get there; you just need to give the crawler time to do its job (and do so politely). Bear in mind that a crawler discovers URLs more quickly than it fetches them. Before you ask about speed, please read the FAQ.