Stormcrawler not indexing content with Elasticsearch

Submitted by 泄露秘密 on 2019-12-08 12:35:30

Question


Stormcrawler is indexing documents into Elasticsearch, but the content field is missing.

Stormcrawler is up-to-date with 'origin/master' https://github.com/DigitalPebble/storm-crawler.git

Using elasticsearch-5.6.4

crawler-conf.yaml has

indexer.url.fieldname: "url"
indexer.text.fieldname: "content"
indexer.canonical.name: "canonical"

The url and title fields are indexed, but not content.

I have been trying to get this working by following Julien's tutorial at: https://www.youtube.com/watch?v=xMCuWpPh-4A

Everything works, except that the content is not being indexed into Elasticsearch. I suspect a small config error, but I have tried many variations with no luck, so now I am asking for help.

Thanks.


Answer 1:


Are you sure that the content is not indexed? The content field is not stored (see ES_IndexInit.sh), but it should still be indexed, so it is searchable even though it does not come back with the results. To store it, modify the init script and re-run the crawl; you would then get it back the same way as the other fields. To test that it is indexed, try querying on it and see how that affects the results.
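
One way to run that test is sketched below. It is only an illustration under stated assumptions: Elasticsearch is assumed to listen on localhost:9200 and the index is assumed to be named "index" (check ES_IndexInit.sh for the name actually used), and the helper names build_content_query and search are made up for this example. The query matches against the content field only, so a non-zero hit count for a word that appears only in page bodies would suggest the field is indexed, even if the field itself is absent from the returned hits.

```python
import json
import urllib.request

# Assumptions (not from the original post): Elasticsearch listens on
# localhost:9200 and the crawled documents live in an index named "index".
ES_URL = "http://localhost:9200"
INDEX_NAME = "index"


def build_content_query(term, field="content"):
    """Build a search body that matches `term` against `field` only."""
    return {
        "query": {"match": {field: term}},
        # Ask for stored fields explicitly. A field that is indexed but not
        # stored can match the query yet will be missing from each hit.
        "stored_fields": ["url", "title", field],
        "size": 5,
    }


def search(term):
    """Send the query to Elasticsearch and return the parsed JSON response."""
    body = json.dumps(build_content_query(term)).encode("utf-8")
    request = urllib.request.Request(
        f"{ES_URL}/{INDEX_NAME}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Pick a word you expect in page bodies but not in titles or URLs.
    result = search("crawler")
    print(json.dumps(result["hits"], indent=2))
```

If the hits come back with url and title but no content, that is consistent with the field being indexed but not stored, which is what the answer describes.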



Source: https://stackoverflow.com/questions/47214019/stormcrawler-not-indexing-content-with-elasticsearch
