Beautiful Soup not waiting until page is fully loaded

醉话见心 2021-01-07 14:52

So with my code below I want to open an apartment website URL and scrape the webpage. The only issue is that Beautiful Soup isn't waiting until the entire webpage is render

2 Answers
  • 2021-01-07 15:07

    I'm happy with the requests_html library. It renders dynamic HTML for you and is much simpler to set up than Selenium.

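    from requests_html import HTMLSession
    import pyppdf.patch_pyppeteer  # imported for its side effect: patches pyppeteer, which render() uses
    from bs4 import BeautifulSoup

    url = 'https://xxxxx.com/properties/?sort=latest'

    session = HTMLSession()

    # Fetch the page, then run its JavaScript in a headless browser
    resp = session.get(url)
    resp.html.render()
    html = resp.html.html  # HTML as it stands after rendering

    page_soup = BeautifulSoup(html, 'html.parser')

    # Collect every <div class="grid-item"> on the rendered page
    containers = page_soup.find_all("div", {"class": "grid-item"})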
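If the listings only show up some time after the first render, `render()` also accepts `sleep` and `timeout` arguments. The following is a minimal sketch, not a definitive implementation: `render_with_delay` is a hypothetical helper name, and the defaults of 2 and 20 seconds are assumptions that would need tuning for the actual site.

```python
def render_with_delay(url: str, sleep: int = 2, timeout: float = 20.0) -> str:
    """Return the HTML of `url` after requests_html has rendered its JavaScript."""
    # Imported inside the function (an editorial choice) so the helper can be
    # defined even where requests_html is not installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        resp = session.get(url)
        # sleep: seconds to pause after the initial render, giving late
        #        JavaScript requests time to finish.
        # timeout: seconds before the render attempt is abandoned.
        resp.html.render(sleep=sleep, timeout=timeout)
        return resp.html.html
    finally:
        # Release the HTTP session and any browser it started.
        session.close()
```

The returned string can be passed to `BeautifulSoup(html, 'html.parser')` exactly as in the answer's code.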
    
  • 2021-01-07 15:22

    If you want to wait for the page to fully load its data, consider using Selenium. In your case it could look like this:

    from bs4 import BeautifulSoup
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.options import Options

    url = "<URL>"

    chrome_options = Options()
    chrome_options.add_argument("--headless")  # run Chrome in the background, without a visible window

    # The context manager quits the browser when the block exits
    with Chrome(options=chrome_options) as browser:
        browser.get(url)
        html = browser.page_source

    page_soup = BeautifulSoup(html, 'html.parser')
    containers = page_soup.find_all("div", {"class": "grid-item"})
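One caveat: `browser.get()` returns once the initial document has loaded, so elements injected later by JavaScript may still be missing from `page_source`. Below is a minimal sketch of an explicit wait using Selenium's `WebDriverWait`. It assumes the listings are `div.grid-item` elements (the selector used in the answer) and that a 15-second timeout is acceptable; `fetch_when_present` is a hypothetical helper name.

```python
def fetch_when_present(url: str, css_selector: str = "div.grid-item",
                       timeout: float = 15.0) -> str:
    """Load `url` in headless Chrome and return the HTML once `css_selector` exists."""
    # Imported inside the function (an editorial choice) so the helper can be
    # defined even where Selenium is not installed.
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")

    with Chrome(options=options) as browser:
        browser.get(url)
        # Poll the DOM until at least one matching element is present.
        # Raises selenium.common.exceptions.TimeoutException if none
        # appears within `timeout` seconds.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return browser.page_source
```

Waiting for a specific element is usually more reliable than a fixed sleep, because the call returns as soon as the content appears and fails loudly if it never does.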
    