Scrapy Tutorial Example

深忆病人 2021-01-24 07:33

I'm hoping someone can point me in the right direction with using Scrapy in Python.

I've been trying to follow the example for several days and still ca

2 answers
  •  不思量自难忘°
    2021-01-24 07:48

    It seems the spider in the tutorial is outdated. The website has changed a bit, so none of the XPaths match anything anymore. This is easy to fix:

    def parse(self, response):
        # Each listing's link sits inside a div with class "title-and-desc".
        sites = response.xpath('//div[@class="title-and-desc"]/a')
        for site in sites:
            item = {}
            item['name'] = site.xpath("text()").extract_first()
            item['url'] = site.xpath("@href").extract_first()
            # Default to '' so .strip() is never called on None.
            item['description'] = site.xpath("following-sibling::div/text()").extract_first('').strip()
            yield item
    

    For future reference, you can always test whether a specific XPath works with the scrapy shell command.
    For example, this is what I did to test it:

    $ scrapy shell "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"
    # test the sites XPath
    response.xpath('//ul[@class="directory-url"]/li')
    []
    # OK, it doesn't work; check the page in a web browser
    view(response)
    # find correct xpath and test that:
    response.xpath('//div[@class="title-and-desc"]/a')
    # 21 result nodes printed
    # it works!
    
