Scrapy Not Returning Additional Info from Scraped Link in Item via Request Callback

别跟我提以往 2021-01-24 03:09

Basically the code below scrapes the first 5 items of a table. One of the fields is another href and clicking on that href provides more info which I want to collect and add to

3 answers
  •  情话喂你
    2021-01-24 03:30

    Install pyOpenSSL. Fiddler sometimes also causes problems for "https://*" requests, so close Fiddler if it is running and run the spider again. Another problem in your code is that the parse method builds the request but never uses 'yield' to return it to the Scrapy scheduler, so the callback is never scheduled. You should do it like this:

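    from scrapy.http import Request
    from scrapy.selector import HtmlXPathSelector

    def parse(self, response):
        hxs = HtmlXPathSelector(response)

        for x in range(1, 6):
            item = ScrapyItem()
            # Select the table row named "row1" ... "row5"
            str_selector = '//tr[@name="row{0}"]'.format(x)
            item['thing1'] = hxs.select(str_selector + '/a/text()').extract()
            item['thing2'] = hxs.select(str_selector + '/a/@href').extract()
            # The URL needs a scheme, and meta must be passed as a keyword argument
            request = Request("http://www.nextpage.com",
                              callback=self.parse_next_page,
                              meta={'item': item})
            # Yield the request so the scheduler picks it up;
            # parse_next_page reads response.meta['item'] and yields the item
            yield request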
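
To make the role of 'yield' and 'meta' concrete, here is a rough, Scrapy-free sketch of the callback chain. FakeRequest, FakeResponse and crawl are hypothetical stand-ins written for this illustration, not Scrapy classes, and the 'next_page_info' field and the per-row URLs are assumptions. It only mimics the idea that the scheduler sees whatever parse yields and passes the item along through meta.

```python
# Simplified stand-ins for illustration only; these are NOT Scrapy's classes.
class FakeRequest:
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}


class FakeResponse:
    def __init__(self, url, meta):
        self.url = url
        self.meta = meta


def parse(response):
    # One partially filled item per table row, handed on through meta.
    for x in range(1, 6):
        item = {"thing1": "row{0}".format(x)}
        yield FakeRequest(
            "http://www.nextpage.com/{0}".format(x),  # hypothetical URL
            callback=parse_next_page,
            meta={"item": item},
        )


def parse_next_page(response):
    # Recover the item started in parse and add the second page's data.
    item = response.meta["item"]
    item["next_page_info"] = "details from " + response.url  # hypothetical field
    yield item


def crawl(start_callback):
    # Toy scheduler: follows yielded requests, collects yielded items.
    items = []
    pending = list(start_callback(FakeResponse("http://start", {})))
    while pending:
        result = pending.pop(0)
        if isinstance(result, FakeRequest):
            response = FakeResponse(result.url, result.meta)
            pending.extend(result.callback(response))
        else:
            items.append(result)
    return items


items = crawl(parse)
```

If parse created the request without yielding it, the toy scheduler would never see it and parse_next_page would never run, which is the same symptom as in the question.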
    
