Beautiful Soup not waiting until page is fully loaded


With the code below I want to open an apartment website URL and scrape the page. The only issue is that Beautiful Soup isn't waiting until the entire webpage is rendered.


2 Answers

死守一世寂寞

2021-01-07 15:07

I'm happy with the requests_html library. It renders dynamic HTML for you and is much simpler to set up than Selenium.

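from requests_html import HTMLSession
import pyppdf.patch_pyppeteer  # imported only for its side effect of patching pyppeteer
from bs4 import BeautifulSoup

url = 'https://xxxxx.com/properties/?sort=latest'

session = HTMLSession()

resp = session.get(url)
resp.html.render()     # run the page's JavaScript in headless Chromium
html = resp.html.html  # the HTML as it stands after rendering

page_soup = BeautifulSoup(html, 'html.parser')

# Every <div class="grid-item"> in the rendered page
containers = page_soup.find_all("div", {"class": "grid-item"})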
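
To check whether a page needs rendering at all, it can help to count the target elements in the raw HTML first: if the count is zero before rendering and non-zero after, the listings are being added by JavaScript. The following is a minimal sketch of that check, using only the standard library so it runs without extra packages. `GridItemCounter`, `count_grid_items`, and both HTML strings are hypothetical stand-ins, not taken from the real site.

```python
from html.parser import HTMLParser


class GridItemCounter(HTMLParser):
    """Counts <div> tags whose class list includes the target class."""

    def __init__(self, target_class="grid-item"):
        super().__init__()
        self.target_class = target_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and self.target_class in classes:
            self.count += 1


def count_grid_items(html, target_class="grid-item"):
    parser = GridItemCounter(target_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: what a server might send before any JavaScript runs ...
RAW_HTML = '<div id="listings"></div><script src="app.js"></script>'
# ... and what the DOM might look like once the script has filled it in.
RENDERED_HTML = (
    '<div id="listings">'
    '<div class="grid-item">A</div>'
    '<div class="grid-item featured">B</div>'
    '</div>'
)

print(count_grid_items(RAW_HTML))       # 0
print(count_grid_items(RENDERED_HTML))  # 2
```

With Beautiful Soup the same comparison would be `len(page_soup.find_all("div", {"class": "grid-item"}))` on the HTML before and after `resp.html.render()`.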


不思量自难忘°

2021-01-07 15:22

If you want to wait for the page to fully load its data, consider using Selenium. In your case it could look like this:

from bs4 import BeautifulSoup
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

url = "<URL>"

chrome_options = Options()
chrome_options.add_argument("--headless")  # run Chrome without a visible window

with Chrome(options=chrome_options) as browser:
    browser.get(url)            # returns once the browser reports the page as loaded
    html = browser.page_source  # HTML serialized from the browser's current DOM

page_soup = BeautifulSoup(html, 'html.parser')
containers = page_soup.find_all("div", {"class": "grid-item"})
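
If the listings are fetched by a script after the initial load, the page source may still be read too early. The usual remedy is to poll for the element you need, with a timeout. Below is a minimal sketch of that polling idea in plain Python; `wait_until` and `FakePage` are hypothetical names used only for illustration, and the fake page simply pretends its listings appear on the third check.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page whose listings show up only on the third check."""

    def __init__(self):
        self.checks = 0

    def grid_items(self):
        self.checks += 1
        return ["item-1", "item-2"] if self.checks >= 3 else []


page = FakePage()
items = wait_until(page.grid_items, timeout=2.0, poll_interval=0.01)
print(items, page.checks)  # ['item-1', 'item-2'] 3
```

Selenium ships its own version of this pattern as `WebDriverWait`. Assuming the listings are `div.grid-item` elements, something like `WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.grid-item")))` placed before reading `browser.page_source` should do it, with `WebDriverWait` imported from `selenium.webdriver.support.ui`, `EC` being `selenium.webdriver.support.expected_conditions`, and `By` coming from `selenium.webdriver.common.by`.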
