Which web crawler for extracting and parsing data from about a thousand web sites

庸人自扰 2021-02-06 15:45

I'm trying to crawl about a thousand web sites, from which I'm interested in the HTML content only.

Then I transform the HTML into XML to be parsed with XPath to ex

3 answers
  •  迷失自我
    2021-02-06 16:20

    I would suggest writing your own in Python, using Scrapy together with either lxml or BeautifulSoup. You should find a few good tutorials for those on Google. I use Scrapy + lxml at work to spider ~600 websites, checking for broken links.
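
To give a rough idea of the extraction step such a spider performs on each page, here is a minimal sketch. It deliberately uses only Python's standard library (`html.parser` and `urllib.parse`) as a stand-in, not Scrapy, lxml, or BeautifulSoup, so it runs without installing anything. The names `LinkExtractor` and `extract_links` and the example.com URLs are made up for illustration and are not from the answer above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags in one HTML page.

    Hypothetical helper for illustration; a Scrapy + lxml spider would
    do this with its own selectors instead.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in `html`, resolved against `base_url`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = (
        '<html><body><a href="/about">About</a> '
        '<a href="https://example.org/x">X</a></body></html>'
    )
    for link in extract_links(sample, "https://example.com/index.html"):
        print(link)
```

A broken-link checker would then request each returned URL and record the ones that fail. With a thousand sites, the scheduling, retries, and politeness delays are the part a framework such as Scrapy is meant to take off your hands.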
