Question
I need to crawl this website using Python, but I run into a problem when submitting the form: the response I get is the same page with the form, not the results page that should follow the submission. I tried requests, mechanize, and urllib.
The code with requests:
import requests

url = "http://www.justiceservices.gov.mt/courtservices/Judgements/search.aspx?func=selected"
# Form fields copied from the search page; empty strings are filters left blank.
payload = {
    'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_date_from': '',
    'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_date_to': '',
    'ctl00$ContentPlaceHolderMain$search_selected_panel$dd_court': 108,
    'ctl00$ContentPlaceHolderMain$search_selected_panel$dd_judiciary': '',
    'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_litigant1': '',
    'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_litigant2': '',
    'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_keywords': '',
    'ctl00$ContentPlaceHolderMain$search_selected_panel$keywords': 'rb_keywords_matching_all',
    'ctl00$ContentPlaceHolderMain$search_selected_panel$bt_search': 'Search',
    'ctl00$ContentPlaceHolderMain$search_selected_panel$result_count_panel$dd_result_count': 10,
}
headers = {'content-type': 'application/x-www-form-urlencoded'}
r = requests.post(url, payload, allow_redirects=True)
print(r.headers)
print(r.text)
Do I need to post additional data, or is my approach wrong for this type of form? The website uses Web Forms.
Answer 1:
If you look at the requests source, specifically https://github.com/kennethreitz/requests/blob/master/requests/api.py#L80, you'll see that post takes the form body through its data parameter. Without having time to test, I would pass the payload by keyword so that it is explicit:
r = requests.post(url, data=payload, allow_redirects=True)
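If passing data explicitly makes no difference, another thing worth checking is the hidden inputs on the form. ASP.NET Web Forms pages normally carry fields such as __VIEWSTATE and __EVENTVALIDATION that the server expects to be posted back with the visible ones. Whether this particular site rejects posts without them is an assumption, and the sketch below is untested against it. The names HiddenInputParser, extract_hidden_fields, and search are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <input ... /> tags.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of the hidden form fields found in html."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def search(url, form_fields):
    """GET the form page, then POST it back with its hidden fields included.

    Hypothetical helper: assumes the server needs the hidden fields it
    sent with the form to be returned unchanged.
    """
    import requests  # third-party; imported here so the parser above runs without it

    with requests.Session() as session:  # keeps cookies between the two requests
        form_page = session.get(url)
        data = extract_hidden_fields(form_page.text)
        data.update(form_fields)  # add the visible fields on top of the hidden ones
        return session.post(url, data=data)
```

With the url and payload from the question, it would be called as r = search(url, payload). Fetching the form first through the same Session means any cookies the server sets are sent back with the POST as well.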
Source: https://stackoverflow.com/questions/22407580/python-submitting-webform-using-requests