I would like to ask a big favor, as I have been struggling with this problem for several days. I have tried every approach I know of and still have no result. I am doing something wrong.
Here's a working example of using FormRequest.from_response for delta.com:
from scrapy.item import Item, Field
from scrapy.http import FormRequest
from scrapy.spider import BaseSpider


class DeltaItem(Item):
    title = Field()
    link = Field()
    desc = Field()


class DmozSpider(BaseSpider):
    name = "delta"
    allowed_domains = ["delta.com"]
    start_urls = ["http://www.delta.com"]

    def parse(self, response):
        yield FormRequest.from_response(
            response,
            formname='flightSearchForm',
            formdata={'departureCity[0]': 'JFK',
                      'destinationCity[0]': 'SFO',
                      'departureDate[0]': '07.20.2013',
                      'departureDate[1]': '07.28.2013'},
            callback=self.parse1)

    def parse1(self, response):
        print response.status
You had used the wrong spider methods, and allowed_domains was set incorrectly.
But anyway, delta.com relies heavily on dynamic AJAX calls to load its content, and this is where your problems start. For example, the response in the parse1 method doesn't contain any search results; instead it contains the HTML for the "AWAY WE GO. ARRIVING AT YOUR FLIGHTS SOON" page, where the results are loaded dynamically.
Basically, you should work with your browser's developer tools, watch which AJAX calls the page makes, and try to simulate them inside your spider, or use a tool like Selenium, which drives a real browser (and you can combine it with Scrapy).
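If you go the Selenium route, a rough sketch of combining it with a Scrapy spider looks like the following. This is an outline under assumptions, not a drop-in solution: it needs a local browser/driver installed, and the spider name and the steps marked with comments are illustrative, not Delta's actual page structure.

```python
from selenium import webdriver
from scrapy.spider import BaseSpider


class DeltaSeleniumSpider(BaseSpider):
    name = "delta-selenium"  # hypothetical spider name
    start_urls = ["http://www.delta.com"]

    def __init__(self, *args, **kwargs):
        super(DeltaSeleniumSpider, self).__init__(*args, **kwargs)
        # A real browser that will execute the page's AJAX for you
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Load the same URL in the browser so the dynamic content renders
        self.driver.get(response.url)
        # ... fill the search form and wait for the results to appear ...
        html = self.driver.page_source  # fully rendered HTML
        # hand `html` to your normal Scrapy selectors / item extraction
        self.driver.quit()
```

The trade-off is speed: Selenium is much slower than plain Scrapy requests, so it is usually worth first trying to replicate the AJAX calls you see in the developer tools directly.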
Hope that helps.