Nutch 1.13 index-links configuration

借酒劲吻你 2021-01-28 07:14

I am currently trying to extract the webgraph structure during my crawling run with Apache Nutch 1.13 and Solr 4.10.4.

According to the documentation, the index-links pl

1 answer
  • 2021-01-28 07:28

    You have to map the fields in solrindex-mapping.xml, like this:

    <field dest="inlinks" source="inlinks"/>
    <field dest="outlinks" source="outlinks"/>
    

    Afterwards, make sure to unload and reload the collection, and do a complete restart of Solr.

    You did not specify how exactly you implemented the fields in schema.xml, but for me the following worked:

    <!-- fields for index-links plugin -->
    <field name="inlinks" type="url" stored="true" indexed="false" multiValued="true"/>
    <field name="outlinks" type="url" stored="true" indexed="false" multiValued="true"/>
    
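    The mapping and schema entries above only receive values if the index-links plugin is actually enabled in Nutch. Here is a minimal sketch of what that might look like in conf/nutch-site.xml. The other plugin names in the value are an assumption based on a typical Nutch 1.x setup, not taken from the question; keep whatever your existing plugin.includes already lists and only add "links" to the index-(...) group:

    ```xml
    <!-- conf/nutch-site.xml (sketch; the plugin list is an assumed example) -->
    <property>
      <name>plugin.includes</name>
      <!-- "links" added to the index-(...) group enables the index-links plugin -->
      <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|links)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    </property>
    ```

    Since inlinks come from the LinkDb, the indexing step most likely also needs to be run with the LinkDb available for the inlinks field to be filled.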

    Best regards and good luck!
