Selenium Web Scraping With Beautiful Soup on Dynamic Content and Hidden Data Table

醉酒成梦 2021-01-15 08:11

Really need help from this community!

I am doing web scraping on dynamic content in Python using Selenium and Beautiful Soup. The thing is, the pricing data table c

1 Answer
  • 2021-01-15 08:48

    You should target the element after it has loaded, and take `arguments[0]` rather than the entire page via `document`:

    html_of_interest=driver.execute_script('return arguments[0].innerHTML',element)
    sel_soup=BeautifulSoup(html_of_interest, 'html.parser')
    

    This has two practical cases:

    1

    The element is not yet loaded in the DOM and you need to wait for it:

    from time import sleep
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException

    browser.get("url")
    sleep(experimental) # get() usually returns only after the page has loaded, but sometimes JS keeps running after the load event

    try:
        element = WebDriverWait(browser, delay).until(
            EC.presence_of_element_located((By.ID, 'your_id_of_interest')))
        print("element is ready, do the thing!")
        html_of_interest = browser.execute_script('return arguments[0].innerHTML', element)
        sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
    except TimeoutException:
        print("Something's wrong!")
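
    Once `html_of_interest` for the table has been captured, extracting the pricing rows is plain Beautiful Soup work. A minimal, self-contained sketch; the `#pricing` table markup below is a made-up stand-in for whatever your real page returns, so adjust the selectors to your HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical table HTML standing in for html_of_interest above.
html_of_interest = """
<table id="pricing">
  <tr><th>Plan</th><th>Price</th></tr>
  <tr><td>Basic</td><td>$10</td></tr>
  <tr><td>Pro</td><td>$25</td></tr>
</table>
"""

sel_soup = BeautifulSoup(html_of_interest, 'html.parser')

# Collect each row as a list of cell texts (headers and data alike).
rows = []
for tr in sel_soup.select('#pricing tr'):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(['td', 'th'])]
    rows.append(cells)

print(rows)
# [['Plan', 'Price'], ['Basic', '$10'], ['Pro', '$25']]
```

    From here the rows can be fed into `csv.writer` or a `pandas.DataFrame` as needed.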
    

    2

    The element is inside a shadow root and you need to expand the shadow root first. This is probably not your situation, but I will mention it here since it is relevant for future reference. Example:

    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Chrome()


    def expand_shadow_element(element):
        shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
        return shadow_root


    driver.get("chrome://settings")
    root1 = driver.find_element_by_tag_name('settings-ui')

    html_of_interest = driver.execute_script('return arguments[0].innerHTML', root1)
    sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
    sel_soup  # empty: the shadow root has not been expanded yet

    shadow_root1 = expand_shadow_element(root1)

    html_of_interest = driver.execute_script('return arguments[0].innerHTML', shadow_root1)
    sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
    sel_soup  # now contains the shadow DOM content
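
    The two `execute_script` calls can be folded into one helper. A minimal sketch — `expand_and_parse` is a name I made up, and the only thing it assumes about `driver` is an `execute_script` method:

```python
from bs4 import BeautifulSoup


def expand_and_parse(driver, element, parser='html.parser'):
    """Expand `element`'s shadow root and parse its contents.

    Returns (shadow_root, soup): the shadow-root handle for further
    driver calls, and a BeautifulSoup built from its innerHTML
    (the host element's own innerHTML is empty for shadow DOM).
    """
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    html = driver.execute_script('return arguments[0].innerHTML', shadow_root)
    return shadow_root, BeautifulSoup(html, parser)
```

    Usage mirrors the snippet above: `shadow_root1, sel_soup = expand_and_parse(driver, root1)`.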
    
