
Which web crawler for extracting and parsing data from about a thousand websites?

Front end · Unresolved


庸人自扰 2021-02-06 15:45

I'm trying to crawl about a thousand websites, and I'm only interested in their HTML content.

Then I transform the HTML into XML so that it can be parsed with XPath to ex…
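The question is cut off, but the pipeline it describes (HTML in, XML tree out, XPath queries on that tree) can be sketched with the Python standard library alone. This is a minimal sketch, not the poster's code: the class `HTMLToElementTree`, the helper `html_to_xml`, and the sample page are hypothetical. Note that `xml.etree.ElementTree` supports only a limited subset of XPath syntax; `lxml.html` parses HTML directly and offers full XPath 1.0, which makes a separate HTML-to-XML step unnecessary.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements never get a closing tag, so they are not pushed on the stack
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class HTMLToElementTree(HTMLParser):
    """Hypothetical helper: builds an ElementTree from (possibly sloppy) HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # synthetic root element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray closing tags
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs in that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html: str) -> ET.Element:
    builder = HTMLToElementTree()
    builder.feed(html)
    builder.close()
    return builder.root


page = """<html><head><title>Shop</title></head>
<body><h1>Products</h1><ul>
<li class="item"><a href="/p/1">Widget</a><br></li>
<li class="item"><a href="/p/2">Gadget</a></li>
</ul></body></html>"""

root = html_to_xml(page)
print(root.findtext(".//title"))  # Shop
print([a.get("href") for a in root.findall(".//li[@class='item']/a")])  # ['/p/1', '/p/2']
```

Once the tree exists, it can also be serialized with `ET.tostring(root, encoding="unicode")` if a real XML document is needed for another tool.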

3 answers

迷失自我 (original poster)

2021-02-06 16:20

I would suggest writing your own crawler in Python, using the Scrapy package together with either lxml or BeautifulSoup. You should be able to find good tutorials for these on Google. At work I use Scrapy + lxml to spider about 600 websites and check them for broken links.
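The answer recommends Scrapy plus lxml, where Scrapy takes care of scheduling, concurrent requests, and retries across many sites. As a dependency-free illustration of only the broken-link check itself, here is a minimal standard-library sketch. It is not the answerer's code: the names `LinkExtractor`, `extract_links`, and `is_broken` are hypothetical, and it assumes servers answer `HEAD` requests (some reply 405, in which case a fallback `GET` would be needed).

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects absolute http(s) URLs from <a href> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop #fragments
        url, _ = urldefrag(urljoin(self.base_url, href))
        # Skip mailto:, javascript:, etc.; keep each URL once, in page order
        if urlparse(url).scheme in ("http", "https") and url not in self.links:
            self.links.append(url)


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def is_broken(url, timeout=10.0):
    """True if the URL answers with an HTTP error status or cannot be reached."""
    req = Request(url, method="HEAD", headers={"User-Agent": "link-checker-sketch"})
    try:
        with urlopen(req, timeout=timeout):  # redirects are followed automatically
            return False
    except HTTPError as err:  # must come first: HTTPError is a subclass of OSError
        return err.code >= 400
    except OSError:  # DNS failures, refused connections, timeouts
        return True


page = ('<a href="/about">About</a> <a href="https://example.org/x#sec">X</a> '
        '<a href="mailto:me@example.com">Mail</a>')
print(extract_links("https://example.com/index.html", page))
# ['https://example.com/about', 'https://example.org/x']
```

In a Scrapy project the same extraction would typically sit in a spider's `parse` callback, using `response.xpath('//a/@href').getall()` and `response.urljoin(...)`, with Scrapy's settings controlling how many requests run in parallel.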
