What is the correct way to work with cookies in Scrapy

Question

I'm a complete beginner, and I'm using Scrapy on a website that uses cookies. This is a problem for me: I can scrape data from a site that doesn't need cookies, but getting data from a site that does is difficult for me. I have this code structure:

class mySpider(BaseSpider):
    name='data'
    allowed_domains =[]
    start_urls =["http://...."]

    def parse(self, response):
        sel = HtmlXPathSelector(response)
        items = sel.xpath('//*[@id=..............')

        vlrs =[]

        for item in items:
            myItem['img'] = item.xpath('....').extract()
            yield myItem

This works: I can get the data without cookies using this code structure. I found out how to work with cookies at this URL, but I don't understand where I should put that code so that I can then get the data using XPath.

I'm testing this code:

request_with_cookies = Request(url="http://...",cookies={'country': 'UY'})

but I don't know how to use it or where to put it. I put it inside the parse function to get the data:

def parse(self, response):
    request_with_cookies = Request(url="http://.....",cookies={'country':'UY'})

    sel = HtmlXPathSelector(request_with_cookies)
    print request_with_cookies

I tried to use XPath on this new URL with cookies and then print the scraped data. I thought it would work the same as a URL without cookies, but when I run this I get an error: 'Request' object has no attribute 'body_as_unicode'. What would be the proper way to work with these cookies? I'm a little lost. Thank you very much.

Answer 1:

You are very close! The contract for the parse() method is that it yields (or returns an iterable of) Items, Requests, or a mix of both. In your case, all you should have to do is

yield request_with_cookies

and your parse() method will be run again with a Response object produced from requesting that URL with those cookies.

http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=parse#scrapy.spider.Spider.parse
http://doc.scrapy.org/en/latest/topics/request-response.html

Source: https://stackoverflow.com/questions/23279256/what-is-the-correct-form-of-work-with-cookies-in-scrapy

Tags

python

xpath

scrapy

scrapy-spider