Question
I have been struggling with this all day. I need to scrape data from a website that has a button you must click in order to see the data. The button calls the well-known __doPostBack() JavaScript function used by ASP.NET websites:
<a id="ContentPlaceHolder1_lbCoach" class="btn btn-dark-blue" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$lbCoach','')"><i class="fa fa-eye"></i> Display HS Coach Info</a>
As this answer suggests, I should mimic the POST request the button sends in order to get the data back, which I did with the following:
import requests
from bs4 import BeautifulSoup

# Assumed context (not shown in the original post): `s` is a requests.Session()
# that already fetched contact_url, and `soup` is that page parsed with BeautifulSoup.

# Hidden ASP.NET state fields that must be echoed back in the POST
VIEWSTATE = soup.find('input', {'id': '__VIEWSTATE'}).get('value')
EVENTVALIDATION = soup.find('input', {'id': '__EVENTVALIDATION'}).get('value')

# Headers copied from the browser's partial (AJAX) postback
headers = {
    'Cache-Control': 'no-cache',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Referer': contact_url,
    'X-MicrosoftAjax': 'Delta=true',
}

# Form fields sent with the postback
payload = {
    "ctl00$ToolkitScriptManager2": "ctl00$ContentPlaceHolder1$updCoach|ctl00$ContentPlaceHolder1$lbCoach",
    "ToolkitScriptManager2_HiddenField": "",
    "ctl00$Header1$Menu1$txtSearchBox": "",
    "ctl00$Header1$Menu1$txtSearchBox2": "",
    "__EVENTTARGET": "ctl00$ContentPlaceHolder1$lbDisplayContact",
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": VIEWSTATE,
    "__SCROLLPOSITIONX": "0",
    "__SCROLLPOSITIONY": "0",
    "__EVENTVALIDATION": EVENTVALIDATION,
    "__ASYNCPOST": "true",
}

r = s.post(contact_url, headers=headers, data=payload)
page_content = r.content.decode()
soup = BeautifulSoup(page_content, "html.parser")
The response seems to be fine, but the body I get back contains nothing useful:
b'1|#||4|40|updatePanel|ContentPlaceHolder1_Bio1_udpAdminMenu|\r\n \r\n |0|hiddenField|__EVENTTARGET||0|hiddenField|__EVENTARGUMENT||16992|hiddenField|__VIEWSTATE||1|hiddenField|__SCROLLPOSITIONX|0|1|hiddenField|__SCROLLPOSITIONY|0|292|hiddenField|__EVENTVALIDATION|/wEdAAxsD18kXuyPL5ofgcnYES9y+7zziCikaDB50o6O1pxxXbDWcw39S27yDoDwzfIvSl/82S52cVbB2NeFUXKE4Mx+O+TegoiNwQAdWnT22jPmzI4v73G0IN877PxHm4GlN3cV9hFWoAb20O4Q+9Ls96AskeglIWLjtf4N+HDDRWBUXzFl5Dm8D+CLbHmC0vzJAV2dMNOfX5+XKgQp7nrLXr1R1UFtN09quhqZEMqLAngnkseO4VALrQwmvGPQfIrd43K9AvIrswshyn58y8V7WKC8hka6Yg==|0|asyncPostBackControlIDs|||0|postBackControlIDs|||285|updatePanelIDs||tctl00$ContentPlaceHolder1$Bio1$udpAdminMenu,ContentPlaceHolder1_Bio1_udpAdminMenu,tctl00$ContentPlaceHolder1$udpAddress,ContentPlaceHolder1_udpAddress,tctl00$ContentPlaceHolder1$updCoach,ContentPlaceHolder1_updCoach,tctl00$ContentPlaceHolder1$updDetails,ContentPlaceHolder1_updDetails|0|childUpdatePanelIDs|||81|panelsToRefreshIDs||ctl00$ContentPlaceHolder1$Bio1$udpAdminMenu,ContentPlaceHolder1_Bio1_udpAdminMenu|2|asyncPostBackTimeout||90|48|formAction||./PlayerProfile_ContactInfo.aspx?ID=J34665D097ED|'
When I compare them in Fiddler, the request and response produced by clicking the actual button and the ones produced by my code seem to be the same.
[Screenshot: request data]
[Screenshot: response data]
And the most interesting part: the same request, viewed in Chrome DevTools, renders normally, and in place of the \r\n \r\n from the previous response you can see the whole HTML with all the additional data.
Is it possible that I am actually getting the data but don't know how to render it?
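On the rendering question: the body shown above appears to follow the pipe-delimited format that ASP.NET AJAX uses for partial postbacks, where each record is length|type|id|content| and the length counts the characters in content. Below is a minimal sketch of splitting such a body into records, assuming that layout holds for this site. parse_delta is a hypothetical helper name, not part of requests or BeautifulSoup.

```python
def parse_delta(text):
    """Split an ASP.NET AJAX partial-postback body into (type, id, content) tuples.

    Assumes every record has the form  length|type|id|content|  and that
    `length` is the number of characters in `content`.
    """
    records = []
    pos = 0
    while pos < len(text):
        # Locate the three separators that precede the content
        len_end = text.index("|", pos)
        length = int(text[pos:len_end])
        type_end = text.index("|", len_end + 1)
        id_end = text.index("|", type_end + 1)

        # Read the content by length, because it may itself contain '|'
        content_start = id_end + 1
        content = text[content_start:content_start + length]

        records.append((text[len_end + 1:type_end],
                        text[type_end + 1:id_end],
                        content))

        # Skip past the content and its trailing '|'
        pos = content_start + length + 1
    return records
```

Calling parse_delta(page_content) and keeping the records whose type is updatePanel gives the HTML fragments a browser would inject into the page. In the body posted above, the only updatePanel record is ContentPlaceHolder1_Bio1_udpAdminMenu, and panelsToRefreshIDs does not list updCoach, so the coach panel apparently was not refreshed by this request. The payload sends ctl00$ContentPlaceHolder1$lbDisplayContact as __EVENTTARGET while the button calls __doPostBack with ctl00$ContentPlaceHolder1$lbCoach. Whether that mismatch is the cause is not established here.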
Source: https://stackoverflow.com/questions/42032932/python-requests-and-dopostback-function