For each row in the table on this page, I would like to click on the ID (e.g. the ID of row 1 is 270516746) and extract/download the information (which does NOT have the same he
I don't know whether you've found an answer yet, but I was referring to the approach that doesn't require Selenium. You can request the XHR endpoint for each peptide directly to get the details shown in the modal box. Be careful, though: this is only a rough outline, and you still need to collect the items with json.dumps or in whatever format you prefer. Here is my approach:
import io

import pandas as pd
import requests
from xml.etree import ElementTree as et

headers = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
}

url = "http://mahmi.org/explore.php?filterType=&filter=&page=1"
html = requests.get(url, headers=headers).text
# take the last table parsed from the page (the one with the ID column)
df = pd.read_html(io.StringIO(html))[-1]

pep_ids = df['ID'].tolist()
#pep_ids = ['270516746', '268297434']  # use this first to check the output

base_url = 'http://mahmi.org/api/peptides/sourceProteins/'
for pep_id in pep_ids:
    final_url = base_url + str(pep_id)
    page = requests.get(final_url, headers=headers)
    page.raise_for_status()  # stop early on HTTP errors instead of parsing an error page
    tree = et.fromstring(page.content)
    # print every element of the XML response
    for child in tree.iter('*'):
        print(child.tag, child.text)
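Since the loop above only prints each tag, here is a minimal sketch of the "json dumps" step mentioned earlier. It assumes each XHR response is plain XML like the one parsed with `et.fromstring`. The helper name `element_to_dict` and the sample XML are made up for illustration and do not reflect the real structure of the mahmi.org response; attributes are ignored.

```python
import json
from xml.etree import ElementTree as et


def element_to_dict(elem):
    """Hypothetical helper: recursively convert an Element into dicts/lists/strings.

    Repeated child tags become lists; leaf elements become their stripped text.
    Attributes are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        # Leaf element: keep its stripped text, or None if it is empty
        return elem.text.strip() if elem.text and elem.text.strip() else None
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tag: gather all values for that tag in a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Made-up sample response, used only to show the conversion
sample = b"<proteins><protein><name>A</name></protein><protein><name>B</name></protein></proteins>"
tree = et.fromstring(sample)
data = {tree.tag: element_to_dict(tree)}
print(json.dumps(data, indent=2))
```

Inside the real loop you could store `results[pep_id] = element_to_dict(tree)` and call `json.dumps(results)` once at the end, so one bad response doesn't leave you with half-written output.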
You don't have to click on the visible text. You can write a generic XPath such as:
"(//table//td[1])//button[@data-target]"
This matches every button in the first column of the table, so you can loop over them by index.
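To see what that XPath selects without opening a browser, here is a small offline sketch using the standard library's ElementTree, which supports a subset of XPath including `[1]` position predicates and `[@attr]` existence checks. The table markup is invented for illustration. Note that XPath positions start at 1 while Python lists start at 0, which is why the Selenium loop builds the index with `i+1`.

```python
from xml.etree import ElementTree as et

# Invented markup: only buttons with data-target in the FIRST cell of a row should match
table_html = """
<table>
  <tr><td><button data-target="#m1">101</button></td><td><button>Other</button></td></tr>
  <tr><td><button data-target="#m2">102</button></td><td><button data-target="#x">Other</button></td></tr>
</table>
"""

root = et.fromstring(table_html.strip())
# ElementTree equivalent of //table//td[1]//button[@data-target]
buttons = root.findall(".//td[1]//button[@data-target]")
count = len(buttons)

for i in range(count):
    # buttons[i] corresponds to the XPath "(...)[i + 1]" used in Selenium
    print(f"({i + 1})", buttons[i].get("data-target"), buttons[i].text)
```

The second row's `#x` button is skipped because it sits in the second cell, even though it has a `data-target` attribute.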
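from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# `driver` is an already-open WebDriver showing the table page
buttons_xpath = "(//table//td[1])//button[@data-target]"
count = len(driver.find_elements(By.XPATH, buttons_xpath))
for i in range(count):
    driver.find_element(By.XPATH, "(" + buttons_xpath + ")[" + str(i + 1) + "]").click()
    # wait for the pop-up window to appear, then get its text content
    modal = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='modal-content']")))
    text = modal.text
    # then click Close
    driver.find_element(By.XPATH, "//button[text()='Close']").click()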
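The loop above overwrites `text` on every iteration. One way to keep the results, sketched under the assumption that you append each modal's text to a list inside the loop and collect the row IDs in the same order (for example from each button's `.text`, assuming the button label is the ID), is to pair them up and serialize them as JSON. The function name `modal_texts_to_json` is hypothetical.

```python
import json


def modal_texts_to_json(row_ids, texts):
    """Hypothetical helper: map each row ID to its modal text and return a JSON string."""
    if len(row_ids) != len(texts):
        # A mismatch usually means a click or a modal read was skipped
        raise ValueError(f"{len(row_ids)} IDs but {len(texts)} modal texts")
    results = {str(row_id): text for row_id, text in zip(row_ids, texts)}
    return json.dumps(results, indent=2, ensure_ascii=False)
```

After the loop you could write the result with `open("modal_texts.json", "w", encoding="utf-8").write(modal_texts_to_json(row_ids, texts))`; the length check makes a silently skipped row fail loudly instead of shifting every later text onto the wrong ID.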