I have a list of one million URLs to fetch. I use this list as Nutch seeds and run Nutch's basic crawl command to fetch them. However, I find that Nutch auto
Set the following property in nutch-site.xml. It defaults to true, which is why updatedb adds the discovered outlinks to the CrawlDb:
    <property>
      <name>db.update.additions.allowed</name>
      <value>false</value>
      <description>If true, updatedb will add newly discovered URLs, if false
      only already existing URLs in the CrawlDb will be updated and no new
      URLs will be added.</description>
    </property>
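If you prefer to apply the setting from a script rather than editing nutch-site.xml by hand, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the file uses the usual Hadoop layout (a <configuration> root holding <property> elements with <name> and <value> children). The helper name set_nutch_property is hypothetical, not part of Nutch. It updates the property if it already exists and appends it otherwise. Note that ElementTree does not preserve XML comments or the XML declaration, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_nutch_property(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with the <property> called `name` set to `value`.

    Hypothetical helper; assumes the Hadoop-style nutch-site.xml layout:
    <configuration><property><name/><value/></property>...</configuration>
    """
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")

    # Update an existing <property> with this name, if one is present.
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return ET.tostring(root, encoding="unicode")

    # Otherwise append a new <property> block at the end.
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    site = "<configuration></configuration>"
    # Stop updatedb from adding newly discovered outlinks to the CrawlDb.
    print(set_nutch_property(site, "db.update.additions.allowed", "false"))
```

Updating in place instead of always appending avoids leaving two entries for the same property in the file.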