Question
StormCrawler is indexing documents into Elasticsearch, but not their content.
Stormcrawler is up-to-date with 'origin/master' https://github.com/DigitalPebble/storm-crawler.git
Using elasticsearch-5.6.4
crawler-conf.yaml contains:
indexer.url.fieldname: "url"
indexer.text.fieldname: "content"
indexer.canonical.name: "canonical"
The url and title fields are indexed, but not content.
I have been trying to get this working by following Julien's tutorial at: https://www.youtube.com/watch?v=xMCuWpPh-4A
Everything is working, except that the content is not being indexed into Elasticsearch. I feel like this is some small config error, but I have tried many variations with no luck, so now I am asking for help.
Thanks.
Answer 1:
Are you sure that the content is not indexed? The content field is not stored (see ES_IndexInit.sh), but it should be indexed. To store it, modify the init script and re-run the crawl; you would then get it back in the results, the same as the other fields. To test that it is indexed, try querying on it and see how that affects the results.
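To make the "try querying on it" check concrete, here is a minimal sketch in Python using only the standard library. It assumes Elasticsearch is reachable at http://localhost:9200 and that the documents live in an index called "index"; both are assumptions, so adjust them to whatever your ES_IndexInit.sh actually creates. The helper names are made up for illustration.

```python
import json
import urllib.request


def build_content_query(term, field="content"):
    """Build a simple match query against one field (default: "content")."""
    return {"query": {"match": {field: term}}}


def search(base_url, index, body):
    """POST a search body to <base_url>/<index>/_search and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def count_content_hits(term, base_url="http://localhost:9200", index="index"):
    """Return hits.total for a match on the content field.

    base_url and index are assumed defaults, not values taken from the question.
    """
    result = search(base_url, index, build_content_query(term))
    return result["hits"]["total"]
```

Calling count_content_hits with a word that you know appears only in the body text of a crawled page (not in its title or URL) should give a non-zero count if the content field is indexed, even though the field itself will not come back in the hits while it is not stored.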
Source: https://stackoverflow.com/questions/47214019/stormcrawler-not-indexing-content-with-elasticsearch