How to “webscrape” a site containing a popup window, using python?

逝去的感伤 2021-01-16 10:14

I am trying to web scrape a certain part of the etherscan site with Python, since there is no API for this functionality. Basically going to this link and one would need to

2 Answers
  • 2021-01-16 10:31

    I disagree with @InfinityTM. The usual workflow for this kind of problem is to send a POST request to the website yourself.

    If you click Verify, a POST request is sent to the website, as shown in this image:

    This POST request was made with these headers:

    and with these params:

    You need to figure out how to send this POST request with the correct URL, headers, params, and cookies. Once you manage to make the request, you will receive the response:

    which contains the information you want to scrape, under the div with class "alert alert-success":

    Summary

    So the steps you need to follow are:

    1. Navigate to the website and gather all the information (request URL, cookies, headers, and params) that you will need for your POST request.
    2. Make the request with the requests library.
    3. Once you get a 200 response, scrape the data you are interested in with BeautifulSoup.
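
    For step 1, part of the params on form-driven pages is often a set of hidden form fields that the server expects to get back unchanged. As a rough sketch only, the helper below collects every hidden input from a page's HTML using the standard-library `html.parser` (a stand-in for BeautifulSoup so that the example runs without extra packages). The names `HiddenInputCollector` and `hidden_fields` and the sample HTML are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of hidden form fields found in the given HTML."""
    parser = HiddenInputCollector()
    parser.feed(html)
    return parser.fields


# Hypothetical page fragment, used only to exercise the helper.
sample_html = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="address" value="" />
</form>
"""

payload = hidden_fields(sample_html)
print(payload)
```

    The resulting dict can serve as the starting point for the POST body: add the visible form fields you want to submit to it before sending the request.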

    Please let me know if this points you in the right direction! :D

  • 2021-01-16 10:36
    import requests
    from bs4 import BeautifulSoup

    ADDRESS = '0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'
    HEADERS = {'User-Agent': 'Ahmed American :)'}


    def main(url):
        with requests.Session() as session:
            # Load the form page first to pick up cookies and the hidden
            # ASP.NET fields that must be sent back with the POST.
            r = session.get(url, headers=HEADERS)
            soup = BeautifulSoup(r.content, 'html.parser')
            data = {
                name: soup.find("input", id=name).get("value")
                for name in ('__VIEWSTATE', '__VIEWSTATEGENERATOR', '__EVENTVALIDATION')
            }
            data['ctl00$ContentPlaceHolder1$txtContractAddress'] = ADDRESS
            data['ctl00$ContentPlaceHolder1$btnSubmit'] = "Verify"

            # Submit the form the same way the Verify button does.
            r = session.post(url, params={'a': ADDRESS}, data=data, headers=HEADERS)
            soup = BeautifulSoup(r.content, 'html.parser')
            # The last word of the success alert is the address we want.
            token = soup.find(
                "div", class_="alert alert-success").text.split(" ")[-1]
            print(token)


    main("https://etherscan.io/proxyContractChecker")
    

    Output:

    0x0882477e7895bdc5cea7cb1552ed914ab157fe56
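
    The script above takes the last space-separated word of the alert text, which only works while the address is the final word and has no trailing punctuation. A slightly more defensive sketch, assuming the result is always a `0x`-prefixed 40-digit hex address, is to pull it out with a regular expression. The helper name `last_address` and the sample message are hypothetical and do not reproduce the real wording of the page.

```python
import re

# "0x" followed by exactly 40 hexadecimal digits, bounded on both sides.
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def last_address(text):
    """Return the last address-like token in text, or None if there is none."""
    matches = ADDRESS_RE.findall(text)
    return matches[-1] if matches else None


# Made-up message; note the trailing period that a plain split would keep.
sample_message = "Example message mentioning 0x0882477e7895bdc5cea7cb1552ed914ab157fe56."

print(last_address(sample_message))
print(last_address("no address here"))
```

    Returning `None` when nothing matches also gives the caller a clear signal that the request did not produce the expected success message.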
    