Extract the information in a div class to a JSON object (or data frame)

For each row in the table on this page, I would like to click on the ID (e.g. the ID of row 1 is 270516746) and extract/download the information (which does NOT have the same he

2 Answers

萌比男神i

2021-01-24 04:33

I do not know whether you have found an answer yet, but this is the approach I mentioned that does not require Selenium. You can request the XHR endpoint for each peptide directly to get the details shown in the modal box. Be aware that this is only a rough outline: you still need to put the items into json.dumps or whatever format you prefer. Here is my approach.

from io import StringIO
from xml.etree import ElementTree as et

import pandas as pd
import requests

headers = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
}

# Read the listing page and take the last table parsed from it
url = "http://mahmi.org/explore.php?filterType=&filter=&page=1"
html = requests.get(url, headers=headers).text
df_list = pd.read_html(StringIO(html))
df = df_list[-1]

pep_ids = df['ID'].tolist()
#pep_ids = ['270516746','268297434'] ## You can use this first to check output

# Request the details behind each modal box directly, one peptide ID at a time
base_url = 'http://mahmi.org/api/peptides/sourceProteins/'
for pep_id in pep_ids:
    final_url = base_url + str(pep_id)
    page = requests.get(final_url, headers=headers)
    tree = et.fromstring(page.content)
    # Print the tag and text of every element in the XML response
    for child in tree.iter('*'):
        print(child.tag, child.text)
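
To get from printed tags to a JSON object, one option is to walk the parsed XML tree and build nested dictionaries. The following is only a minimal sketch: the helper name element_to_dict and the sample XML are made up for illustration, and the real layout of the API response may differ. XML attributes are ignored here.

```python
import json
from xml.etree import ElementTree as ET


def element_to_dict(element):
    """Recursively convert an XML element into dicts, lists and strings."""
    children = list(element)
    if not children:
        # Leaf element: keep only its text (empty string if there is none)
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag is gathered into a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical sample; replace it with page.content from the request loop.
sample_xml = (
    b"<proteins>"
    b"<protein><id>1</id><name>A</name></protein>"
    b"<protein><id>2</id><name>B</name></protein>"
    b"</proteins>"
)
root = ET.fromstring(sample_xml)
record = {root.tag: element_to_dict(root)}
print(json.dumps(record, indent=2))
```

Inside the loop you could collect one such record per peptide ID in a list and write the whole list out once at the end.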

臣服心动

2021-01-24 04:40

You don't have to click on the visible text. You can write a generic XPath such as:

"(//table//td[1])//button[@data-target]"

This matches all the buttons in the first column of the table, so you can loop over them.

from selenium.webdriver.common.by import By

# Assumes `driver` is an existing WebDriver with the table page already loaded
button_xpath = "(//table//td[1])//button[@data-target]"
count = len(driver.find_elements(By.XPATH, button_xpath))
for i in range(count):
    # XPath positions are 1-based, hence i + 1
    driver.find_element(By.XPATH, "(" + button_xpath + ")[" + str(i + 1) + "]").click()
    # get the text content of the pop-up window
    text = driver.find_element(By.XPATH, "//div[@class='modal-content']").text
    # then click Close
    driver.find_element(By.XPATH, "//button[text()='Close']").click()
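
Note that the loop overwrites text on every pass, so the values need to be stored somewhere to end up with a JSON object. A minimal sketch, assuming you append each text to a list inside the loop and have the row IDs in the same order; the helper name modal_texts_to_records and the sample strings are placeholders, not real page content:

```python
import json


def modal_texts_to_records(ids, texts):
    """Pair each row ID with the raw text read from its pop-up window."""
    return [
        {"id": str(row_id), "modal_text": text.strip()}
        for row_id, text in zip(ids, texts)
    ]


# Placeholder values; in practice `texts` is filled inside the Selenium loop.
ids = ["270516746", "268297434"]
texts = ["  first pop-up text ", "second pop-up text"]
records = modal_texts_to_records(ids, texts)
print(json.dumps(records, indent=2))
```

The same list of dictionaries can also be passed to pandas.DataFrame if a data frame is preferred.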
