Using Nutch to crawl a specified URL list

星月不相逢 2021-01-16 06:32

I have a list of one million URLs to fetch. I use this list as the Nutch seed list and run Nutch's basic crawl command to fetch them. However, I find that Nutch auto

2 Answers

礼貌的吻别 (OP)

2021-01-16 06:52

Set this property in nutch-site.xml. (By default it is true, so updatedb adds the discovered outlinks to the CrawlDb.)


<property>
  <name>db.update.additions.allowed</name>
  <value>false</value>
  <description>If true, updatedb will add newly discovered URLs, if false
  only already existing URLs in the CrawlDb will be updated and no new
  URLs will be added.</description>
</property>
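
To double-check that the override is actually in place before starting a long crawl, something like the following Python sketch could be used. It is only an illustration: the helper name `additions_allowed` and the inline `SAMPLE` text are made up for this example, and it assumes the file uses the usual `<configuration>`/`<property>` layout. In practice you would pass in the contents of your own nutch-site.xml.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "db.update.additions.allowed"


def additions_allowed(config_xml: str) -> bool:
    """Return the value of db.update.additions.allowed found in the XML text.

    Hypothetical helper, not part of Nutch. Falls back to True when the
    property is absent, matching the default described in the answer.
    """
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROPERTY_NAME:
            return prop.findtext("value", "").strip().lower() == "true"
    return True


# Stand-in for the contents of nutch-site.xml (assumed layout).
SAMPLE = """<configuration>
  <property>
    <name>db.update.additions.allowed</name>
    <value>false</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(additions_allowed(SAMPLE))  # False: no new URLs will be added
```

If this reports True for your real file, the property is either missing or still set to true, and updatedb will keep adding outlinks.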
