问题
I'm trying to scrape content from this page with the following form data:
I need the County:
set to Prince George's and DateOfFilingFrom
set to 01-01-2000
so I do the following:
% scrapy shell
In [1]: from scrapy.http import FormRequest
In [2]: request = FormRequest(url='https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx', formdata={'DateOfFilingFrom': '01-01-2000', 'County:': "Prince George's"})
In [3]: response
In [4]:
But it's not working(response is None) plus, the next page looks like the following which is loaded dynamically, I need to know how to be able to access each of the links shown below with the following inspection(as far as I know this might be done using Splash
however, I'm not sure how to combine a SplashRequest
within a FormRequest
and do it all from within scrapy shell for testing purposes. I need to know what I'm doing wrong and how to render the next page(the one that results from the FormRequest
shown below)
回答1:
The request you're sending is missing a couple of fields, which is probably why you don't get a response back. The fields you fill in also don't correspond to the fields they are expecting in the request. A good way to deal with this is using scrapy's from_response (doc), which can populate some fields for you already based on the information in the form.
For this website the following worked for me (using scrapy shell):
>>> url = "https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx"
>>> fetch(url)
>>> from scrapy import FormRequest
>>> req = FormRequest.from_response(
... response,
... formxpath="//form[@id='form1']", # specify the form on the current page
... formdata={
... 'cboCountyId': '16', # the county you select is converted to a number
... 'DateOfFilingFrom': '01-01-2001',
... 'cboPartyType': 'Decedent',
... 'cmdSearch': 'Search'
... },
... clickdata={'type': 'submit'},
... )
>>> fetch(req)
来源:https://stackoverflow.com/questions/63556512/formrequest-that-renders-js-content-in-scrapy-shell