Java - Scrape a dynamic website with JSoup

-上瘾入骨i 2020-12-18 11:19

I would like to scrape a website with JSoup. This website is dynamic and updates every second or so. I'm pretty sure it uses jQuery, which updates some tags in the HTML.

3 Answers
  • 2020-12-18 11:30
    1. Use Selenium WebDriver to open the page in a real browser.
    2. Locate the element and read its content through the Selenium WebDriver API. You can even run JavaScript in the page's context.
    3. Parse the result with JSoup or a similar library.
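
The three steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes Selenium 4 and JSoup are on the classpath and that Chrome is installed. The URL `https://example.com/live`, the selector `#price`, and the helper name `firstText` are hypothetical placeholders.

```java
import java.time.Duration;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class SeleniumJsoupSketch {

    // Step 3: parse browser-rendered HTML with JSoup and return the text of
    // the first element matching cssQuery, or "" if nothing matches.
    static String firstText(String renderedHtml, String cssQuery) {
        Document doc = Jsoup.parse(renderedHtml);
        Element match = doc.selectFirst(cssQuery);
        return match == null ? "" : match.text();
    }

    public static void main(String[] args) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new"); // assumption: a recent Chrome build
        WebDriver driver = new ChromeDriver(options);
        try {
            // Step 1: open the page in a real browser so its scripts run.
            driver.get("https://example.com/live"); // hypothetical URL

            // Wait until the script-populated element exists (hypothetical selector).
            WebElement element = new WebDriverWait(driver, Duration.ofSeconds(10))
                    .until(ExpectedConditions.presenceOfElementLocated(By.cssSelector("#price")));

            // Step 2: read the element through the WebDriver API,
            // and run a snippet of JavaScript in the page's context.
            System.out.println("Via Selenium: " + element.getText());
            Object title = ((JavascriptExecutor) driver).executeScript("return document.title;");
            System.out.println("Title via JS: " + title);

            // Step 3: hand the rendered DOM to JSoup.
            System.out.println("Via JSoup: " + firstText(driver.getPageSource(), "#price"));
        } finally {
            driver.quit(); // always release the browser process
        }
    }
}
```

Because the page changes about once a second, `getPageSource()` only captures a snapshot; to follow the updates you would repeat steps 2 and 3 in a loop.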
  • 2020-12-18 11:34

    HtmlUnit is a Java-based, windowless (headless) browser that supports JavaScript. I've used it for a few scraping projects and it has worked well, although it can be a little slow on large operations. It also supports proxies. http://htmlunit.sourceforge.net/
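
A minimal sketch of what this might look like, assuming an HtmlUnit 2.x release (where the classes live under `com.gargoylesoftware.htmlunit`; the 3.x releases moved them to `org.htmlunit`). The URL, the wait time, and the helper name `render` are hypothetical.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitSketch {

    // Loads the page, lets its JavaScript run for up to waitMillis,
    // and returns the resulting DOM serialized as XML.
    static String render(String url, long waitMillis) throws Exception {
        // A proxy could be configured with the
        // WebClient(BrowserVersion, String proxyHost, int proxyPort) constructor.
        try (WebClient client = new WebClient(BrowserVersion.CHROME)) {
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setCssEnabled(false); // CSS is not needed for scraping
            client.getOptions().setThrowExceptionOnScriptError(false); // tolerate page script errors

            HtmlPage page = client.getPage(url);
            // Give background scripts (timers, AJAX callbacks) time to finish.
            client.waitForBackgroundJavaScript(waitMillis);
            return page.asXml();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URL; replace with the page you want to scrape.
        System.out.println(render("https://example.com/live", 5_000));
    }
}
```

The returned string can then be passed to `Jsoup.parse(...)` if you prefer JSoup's selector API for the extraction step.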

  • 2020-12-18 11:46

    Sounds like you want JSoup to behave like a browser with JavaScript support. That won't work, I'm afraid. JSoup is a tool that can execute an HTTP request and then do something useful with the response body.

    This 'something useful' is extracting information from the (X)HTML text in the response. If you want the contents of the AJAX requests that follow the loading of a JavaScript-infused HTML page (i.e. a dynamic web page), you'll need to model those follow-up requests yourself and instruct JSoup to execute them manually.
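
As a rough sketch of modelling such a follow-up request with JSoup: the endpoint `https://example.com/api/ticker`, the selector `.value`, and both helper names are hypothetical. You would find the real request URL and headers in the Network tab of your browser's developer tools. The sketch also assumes the endpoint returns an HTML fragment; a JSON response would need a JSON parser instead of JSoup's selectors.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AjaxFollowUpSketch {

    // Replays the AJAX call that the page's script would normally make
    // and returns the raw response body.
    static String fetchAjaxBody(String ajaxUrl, String pageUrl) throws IOException {
        Connection.Response response = Jsoup.connect(ajaxUrl)
                .ignoreContentType(true) // JSoup rejects non-HTML/XML content types by default
                .header("X-Requested-With", "XMLHttpRequest") // what jQuery typically sends
                .referrer(pageUrl)
                .method(Connection.Method.GET)
                .execute();
        return response.body();
    }

    // Extracts the text of the first element matching cssQuery from an
    // HTML fragment, or "" if nothing matches.
    static String fragmentText(String htmlFragment, String cssQuery) {
        Document doc = Jsoup.parseBodyFragment(htmlFragment);
        Element match = doc.selectFirst(cssQuery);
        return match == null ? "" : match.text();
    }

    public static void main(String[] args) throws Exception {
        String pageUrl = "https://example.com/live";       // hypothetical page
        String ajaxUrl = "https://example.com/api/ticker"; // hypothetical endpoint

        // Poll a few times, roughly matching the page's one-second refresh.
        for (int i = 0; i < 3; i++) {
            String body = fetchAjaxBody(ajaxUrl, pageUrl);
            System.out.println(fragmentText(body, ".value")); // hypothetical selector
            Thread.sleep(1_000);
        }
    }
}
```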
