Question
I'm new to this. I am working with Scrapy on a website that uses cookies, and this is a problem for me: I can get data from a site that doesn't need cookies, but getting data from a site that does is difficult for me. I have this code structure:
    class mySpider(BaseSpider):
        name = 'data'
        allowed_domains = []
        start_urls = ["http://...."]

        def parse(self, response):
            sel = HtmlXPathSelector(response)
            items = sel.xpath('//*[@id=..............')
            vlrs = []
            for item in items:
                myItem['img'] = item.xpath('....').extract()
                yield myItem
This works fine: I can get the data from a site without cookies using this code structure. I found out how to work with cookies at this URL, but I do not understand where I should put that code so that I can then get the data using XPath.
I'm testing this code:
    request_with_cookies = Request(url="http://...", cookies={'country': 'UY'})
but I don't know how to use it or where to put it. I put it inside the parse function to obtain the data:
    def parse(self, response):
        request_with_cookies = Request(url="http://.....", cookies={'country': 'UY'})
        sel = HtmlXPathSelector(request_with_cookies)
        print request_with_cookies
I am trying to use XPath on this new URL with cookies, and then print the scraped data. I thought it would be like working with a URL without cookies, but when I run this I get an error: 'Request' object has no attribute 'body_as_unicode'. What is the proper way to work with these cookies? I'm a little lost. Thank you very much.
Answer 1:
You are very close!
The contract for the parse() method is that it yields (or returns an iterable of) Items, Requests, or a mix of both. In your case, all you should have to do is
    yield request_with_cookies
and your parse() method will be run again with a Response object produced from requesting that URL with those cookies.
http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=parse#scrapy.spider.Spider.parse
http://doc.scrapy.org/en/latest/topics/request-response.html
Source: https://stackoverflow.com/questions/23279256/what-is-the-correct-form-of-work-with-cookies-in-scrapy