Question
I am trying to use Python to scrape a website that loads its HTML dynamically: embedded JavaScript files fetch the data and render the response into the HTML. If I use BeautifulSoup alone, I cannot retrieve the data I need, because my program scrapes the page before the JavaScript has loaded the data. For this reason, I am integrating the selenium library into my code so that my program waits until a certain element is found before it scrapes the website.
I had originally done this:
element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.ID, "tabla_evolucion")))
But I want to specify a class instead by doing something like:
element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.class, "ng-binding ng-scope")))
Here is the rest of my code:
driver_path = 'C:/webDrivers/chromedriver.exe'
driver = webdriver.Chrome(executable_path=driver_path)
driver.header_overrides = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
url = "myurlthatIamscraping.com"
response = driver.get(url)
html = driver.page_source
characters = len(html)
element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.class, "ng-binding ng-scope")))
print(html)
print(characters)
time.sleep(10)
driver.quit()
It is not working for me, and I cannot find the right syntax anywhere.
Answer 1:
The relevant HTML would have helped us construct a more canonical answer. However, to start with, your first line of code:
element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.ID, "tabla_evolucion")))
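is valid, whereas the second line of code:
element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.class, "ng-binding ng-scope")))
is not. By.class is a syntax error in Python because class is a reserved keyword; the constant is By.CLASS_NAME. Even with By.CLASS_NAME, this locator does not work; it raises an error such as:
Message: invalid selector: Compound class names not permitted
because you can't pass multiple class names through By.CLASS_NAME.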
You can find a detailed discussion in Invalid selector: Compound class names not permitted using find_element_by_class_name with Webdriver and Python
Solution
You need to take care of a couple of things as follows:
- Without any visibility into your use case: using WebDriverWait with the expected condition presence_of_element_located() merely confirms that the element is present in the DOM tree. Presumably you will next either read attributes such as value or innerText, or interact with the element. So instead of presence_of_element_located(), use either visibility_of_element_located() or element_to_be_clickable(). You can find a detailed discussion in WebDriverWait not working as expected.
- For the best result, you can combine the ID and CLASS attributes and use either of the following locator strategies:
Using CSS_SELECTOR:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".ng-binding.ng-scope#tabla_evolucion")))
Using XPATH:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@class='ng-binding ng-scope' and @id='tabla_evolucion']")))
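Applied to the script in the question, one further point is worth noting: that script reads driver.page_source before the wait runs, so the printed HTML may not yet contain the rendered data. A minimal sketch of one possible arrangement, waiting first and reading the page source afterwards, is shown below. The function name fetch_rendered_html, its parameters, and the example.com URL are hypothetical and chosen only for illustration. The Selenium imports sit inside the function so that the sketch can be loaded without a browser session.

```python
def fetch_rendered_html(driver, url, css_selector, timeout=20):
    """Load url, wait until css_selector is visible, then return the page source.

    Hypothetical helper for illustration; `driver` is any Selenium WebDriver.
    """
    # Imported lazily so this sketch can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # Block until the element is visible (raises TimeoutException otherwise).
    WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, css_selector))
    )
    # Read the HTML only after the wait has succeeded.
    return driver.page_source


# Example usage (needs a local browser and driver; the URL is a placeholder):
# from selenium import webdriver
# driver = webdriver.Chrome()
# try:
#     html = fetch_rendered_html(driver, "https://example.com", ".ng-binding.ng-scope")
#     print(len(html))
# finally:
#     driver.quit()
```

How the driver object is constructed depends on your Selenium version, so the usage lines are left as comments.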
Answer 2:
It's in the docs.
Set of supported locator strategies.
CLASS_NAME = 'class name'
CSS_SELECTOR = 'css selector'
ID = 'id'
LINK_TEXT = 'link text'
NAME = 'name'
PARTIAL_LINK_TEXT = 'partial link text'
TAG_NAME = 'tag name'
XPATH = 'xpath'
Note: What you have in your code is not a class, it's two classes. That won't work with By.CLASS_NAME because it expects only a single class name. What you want instead is a CSS selector:
EC.presence_of_element_located((By.CSS_SELECTOR, ".ng-binding.ng-scope"))
In CSS selector syntax, a . indicates a class. See the W3C docs for more info on the CSS selector syntax.
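To illustrate the dot prefix, here is a small sketch that turns a space-separated class attribute value into the equivalent CSS selector. The helper name classes_to_css_selector is hypothetical, and the sketch assumes plain class names that need no CSS escaping.

```python
def classes_to_css_selector(class_attr: str) -> str:
    """Turn a space-separated class attribute value into a CSS selector.

    Hypothetical helper; assumes class names that need no CSS escaping.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # Each class name gets a leading dot; chaining them with no spaces
    # means "an element that has all of these classes".
    return "".join(f".{name}" for name in names)


print(classes_to_css_selector("ng-binding ng-scope"))  # .ng-binding.ng-scope
```

The resulting string can be passed as the second item of a (By.CSS_SELECTOR, ...) locator tuple.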
Source: https://stackoverflow.com/questions/57262217/how-do-you-use-ec-presence-of-element-locatedby-id-mydynamicelement-excep