Web scraping with Selenium

I'm trying to scrape this website for the list of company names, code, industry, sector, market cap, etc. in the table with Selenium. I'm new to it and have writt

2 answers

灰色年华

2021-01-07 09:29

This is entirely doable. The easiest approach is probably a find_elements call (note the plural) that grabs all of the <tr> elements. It returns a list, and you can then call find_element (singular) on each row in that list, this time locating each cell by its class.
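Here is a minimal sketch of that row-by-row approach. The function name scrape_rows and the field_classes argument are made up for illustration, and the class names you pass in have to match the real page. One deliberate difference: it uses the plural find_elements inside each row as well, so header or spacer rows that lack a cell are skipped instead of raising. The locator strings are the same values Selenium exposes as By.TAG_NAME and By.CLASS_NAME.

```python
# Locator strategies; these are the same strings as selenium's
# By.TAG_NAME and By.CLASS_NAME constants.
TAG_NAME = "tag name"
CLASS_NAME = "class name"


def scrape_rows(driver, field_classes):
    """Return one dict per <tr> that contains every requested class.

    `driver` is assumed to be a Selenium WebDriver (or anything with the
    same find_elements interface). `field_classes` is a hypothetical list
    of CSS class names, one per column you want.
    """
    records = []
    for row in driver.find_elements(TAG_NAME, "tr"):
        record = {}
        for cls in field_classes:
            cells = row.find_elements(CLASS_NAME, cls)
            if not cells:
                break  # header or spacer row: skip it entirely
            record[cls] = cells[0].text.strip()
        else:
            # Only reached when every requested class was found in this row.
            records.append(record)
    return records
```

You would call it as something like scrape_rows(driver, ["companyName"]) once the table is present. Any class names beyond companyName are guesses until you inspect the page.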

You may also be running into a timing issue. The data you are looking for loads very slowly, so you probably need to wait for it. The simplest way is to poll until it appears and only then read it. A find_elements call (plural again) does not throw an exception when nothing matches; it just returns an empty list, which makes it a decent way to check whether the data has appeared yet.
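The polling idea could look roughly like this. wait_for_elements is a hypothetical helper, not part of Selenium, and the 30 second timeout and 0.5 second poll interval are arbitrary defaults you would tune for the page.

```python
import time


def wait_for_elements(driver, by, value, timeout=30.0, poll=0.5):
    """Poll find_elements until it returns a non-empty list.

    Relies on find_elements returning [] (rather than raising) when
    nothing matches. Raises TimeoutError if nothing shows up in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = driver.find_elements(by, value)
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no elements matched {by}={value!r} within {timeout}s")
        time.sleep(poll)
```

For example, rows = wait_for_elements(driver, "tag name", "tr") would block until at least one row exists. Selenium's own WebDriverWait does the same job with more options, so treat this as an illustration of the mechanism.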

野性不改

2021-01-07 09:39

The results are in an iframe - switch to it and then get the .page_source:

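from selenium.webdriver.common.by import By
# find_element(By...) replaces find_element_by_css_selector, which recent Selenium 4 releases removed
iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
driver.switch_to.frame(iframe)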

I would also add a wait for the table to be loaded:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

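# locate and switch to the iframe
# (find_element(By...) also works on recent Selenium 4, which dropped find_element_by_css_selector)
iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
driver.switch_to.frame(iframe)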

# wait for the table to load
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.companyName')))

print(driver.page_source)
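
Once you have the page source, you still need to pull the values out of it. As a rough sketch using only the standard library, the snippet below collects the text of every element carrying a given class, such as the companyName class waited on above. ClassTextCollector and extract_by_class are names invented for this example, and it assumes reasonably well-formed markup. A dedicated HTML parsing library would be a sensible swap.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.values = []
        self._depth = 0    # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested tag inside a match
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.values.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_by_class(html, css_class):
    """Return the text of each element in `html` that has `css_class`."""
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.values
```

With the driver still switched into the iframe, names = extract_by_class(driver.page_source, "companyName") would give you the company names as a list of strings.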
