Alternative to HtmlUnit

感情败类 2021-02-01 06:14

I have been researching the headless browsers available to date and found that HtmlUnit is used pretty extensively. Do we have any alternative to HtmlUnit with possible

6 Answers
  • 2021-02-01 06:22

    There are many other libraries that you can use for this.

    • If you need to scrape XML-based data, use JTidy.
    • If you need to scrape specific data from HTML, you can use jsoup.

    I use jsoup myself. In my experience it is considerably faster than the other APIs I have tried.
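
    To make the jsoup suggestion concrete, here is a minimal sketch of pulling specific data out of HTML. It assumes jsoup is on the classpath; the class name `JsoupLinkExtractor`, the helper `extractLinks`, and the sample markup are made up for illustration and are not from the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: collects link targets from an HTML string.
public class JsoupLinkExtractor {

    // Returns the href value of every <a> element that has an href attribute.
    static List<String> extractLinks(String html) {
        // Parse the markup into a DOM-like document (no network access needed).
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        // "a[href]" is a CSS selector: anchors that carry an href attribute.
        for (Element anchor : doc.select("a[href]")) {
            links.add(anchor.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample markup, invented for this sketch.
        String html = "<html><body>"
                + "<a href=\"https://example.com/a\">A</a>"
                + "<a href=\"https://example.com/b\">B</a>"
                + "<a name=\"no-link\">skipped</a>"
                + "</body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

    Note that jsoup only parses the HTML it is given and does not execute JavaScript, so it fits scraping static markup rather than replacing a headless browser outright.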

  • 2021-02-01 06:28

    As far as I know, HtmlUnit is the most powerful headless browser.

    What are your issues with it?

  • 2021-02-01 06:33

    I would also recommend Selenium. Its great feature is that you can create a client that opens a browser page, so you can see what is happening at each step. Creating macros for automated tests is another good feature. However, if you need to scrape information from a web page, HtmlUnit is better than Selenium.

  • 2021-02-01 06:35

    WebDriver with a virtual framebuffer is the only real alternative. The advantage is that it uses a real browser; the disadvantage is that it's more of a pain to set up, and the API is much poorer.

  • 2021-02-01 06:35

    I use WebKit as a headless browser, through Qt's Python bindings: http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/html/qtwebkit.html

    WebKit is the rendering engine used by Safari (and by Chrome before it moved to Blink, a WebKit fork), and is very flexible.

    One of my reasons for choosing it over HtmlUnit was ease of setting up:

    sudo apt-get install python-qt4
    
  • 2021-02-01 06:40

    I am going to use Selenium for my use case, since it lets me use a real browser, so there is no deviation from what would render in the real world, unlike with HtmlUnit. I am planning to use Selenium 2, which has WebDriver integration and offers a great API and cool fixes. Thanks, Nayn
