Extract the information in a div class to a JSON object (or data frame)

一整个雨季 2021-01-24 03:54

For each row in the table on this page, I would like to click on the ID (e.g. the ID of row 1 is 270516746) and extract/download the information (which does NOT have the same he

2 answers
  • 2021-01-24 04:33

    I don't know whether you have already found an answer, but here is the approach I mentioned that does not require Selenium. You can request the XHR endpoint for each peptide directly to get the details shown in the modal box. Note that this is only a rough outline: you still need to collect the items and serialize them, for example with json.dumps, or store them in whatever form you prefer. Here is my approach.

    import pandas as pd
    import requests
    from xml.etree import ElementTree as et
    
    
    headers = {
        "Connection": "keep-alive",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
    }
    
    # Read the listing page; the last table on the page holds the peptide rows.
    url = "http://mahmi.org/explore.php?filterType=&filter=&page=1"
    html = requests.get(url, headers=headers).content
    df_list = pd.read_html(html)
    df = df_list[-1]
    
    pep_ids = df['ID'].tolist()
    # pep_ids = ['270516746', '268297434']  # use this first to check the output
    
    # The modal box is filled from this endpoint, so request it directly for each ID.
    base_url = 'http://mahmi.org/api/peptides/sourceProteins/'
    for pep_id in pep_ids:
        final_url = base_url + str(pep_id)
        page = requests.get(final_url, headers=headers)
        tree = et.fromstring(page.content)
        # Print the tag and text of every element in the XML response.
        for child in tree.iter('*'):
            print(child.tag, child.text)
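
The loop above only prints tags and text. A minimal sketch of turning one parsed response into a JSON string could look like the following. The element names in `sample_xml` (`sourceProteins`, `protein`, `id`, `organism`) are made up for illustration, since the real response layout is not shown here, and `xml_to_dict` is a hypothetical helper, not part of any library.

```python
import json
from xml.etree import ElementTree as ET


def xml_to_dict(element):
    """Recursively convert an ElementTree element into plain Python data."""
    children = list(element)
    if not children:
        # Leaf element: keep only its stripped text.
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # A repeated tag is collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical response body; the real API output may be structured differently.
sample_xml = """
<sourceProteins>
  <protein>
    <id>P1</id>
    <organism>Example organism</organism>
  </protein>
  <protein>
    <id>P2</id>
    <organism>Another organism</organism>
  </protein>
</sourceProteins>
"""

root = ET.fromstring(sample_xml)
# Keep the peptide ID next to the details so rows can be matched up later.
record = {"peptide_id": "270516746", root.tag: xml_to_dict(root)}
json_text = json.dumps(record, indent=2)
print(json_text)
```

In the real loop, `ET.fromstring(page.content)` would take the place of the sample string, and each `record` could be appended to a list that is dumped once at the end or passed to `pd.json_normalize` if a data frame is preferred.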
    
  • 2021-01-24 04:40

    You don't have to locate each button by its visible text. You can use a generic XPath such as:

    "(//table//td[1])//button[@data-target]"
    

    This matches every button in the first column of the table, so you can loop over them by index.

    from selenium.webdriver.common.by import By
    
    # `driver` is assumed to be an existing WebDriver with the page already loaded.
    buttons_xpath = "(//table//td[1])//button[@data-target]"
    count = len(driver.find_elements(By.XPATH, buttons_xpath))
    for i in range(count):
        # XPath positions are 1-based, hence i + 1
        driver.find_element(By.XPATH, "(" + buttons_xpath + ")[" + str(i + 1) + "]").click()
        # get the text content of the pop-up window
        text = driver.find_element(By.XPATH, "//div[@class='modal-content']").text
        # then close the pop-up
        driver.find_element(By.XPATH, "//button[text()='Close']").click()
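
The loop above overwrites `text` on every pass. As a rough sketch, the same steps can be wrapped in a function that keeps one record per row and serializes the result. `indexed_xpath`, `scrape_modals` and `to_json` are hypothetical helper names, and reading the row ID from the button's text is an assumption about the page. The locator strategy is passed as the plain string `"xpath"`, which is the value of Selenium's `By.XPATH`, so the block itself does not need Selenium to be importable.

```python
import json

BUTTON_XPATH = "(//table//td[1])//button[@data-target]"


def indexed_xpath(index):
    """Return the XPath of the n-th ID button (XPath positions are 1-based)."""
    return f"({BUTTON_XPATH})[{index}]"


def scrape_modals(driver):
    """Click each ID button in turn and return a list of {"id", "details"} dicts."""
    records = []
    count = len(driver.find_elements("xpath", BUTTON_XPATH))
    for i in range(1, count + 1):
        button = driver.find_element("xpath", indexed_xpath(i))
        # Assumption: the button label is the row ID.
        row_id = button.text.strip()
        button.click()
        # Read the whole pop-up as text, then close it before the next click.
        details = driver.find_element("xpath", "//div[@class='modal-content']").text
        driver.find_element("xpath", "//button[text()='Close']").click()
        records.append({"id": row_id, "details": details})
    return records


def to_json(records):
    """Serialize the collected records as an indented JSON string."""
    return json.dumps(records, indent=2)
```

With a live session this would be called as `to_json(scrape_modals(driver))`. On a real page the modal may need a moment to open and close, so an explicit wait around the two modal lookups is probably needed.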
    