Extremely simple code not working in HtmlUnit

Posted by 試著忘記壹切 on 2019-11-28 00:31:00

When I open this site in my browser, the page never finishes loading. This might be why HtmlUnit crashes, too. Tested with Chrome and Firefox.

Try loading a simpler site instead; that will tell you whether the crash is site-dependent.

Well, although it is a horrible solution (a workaround, really), I finally decided to disable the automatic loading of frames in HtmlUnit, as advised by one of the HtmlUnit developers. This is what I did in detail:

  1. Downloaded the HtmlUnit source
  2. Downloaded Maven
  3. Commented out the body (not the declaration) of the loadFrames() method of the HtmlPage class, located in htmlunit-2.9/src/main/java/com/gargoylesoftware/htmlunit/html
  4. Compiled this custom code skipping tests with: mvn -Dmaven.test.skip=true clean compile package
  5. Got the new htmlunit-2.9.jar located in htmlunit-2.9/artifacts and replaced the current htmlunit-2.9.jar library file
  6. Adapted my application to the patched library. This step is probably the most delicate one, as it depends on each application; below are the changes I needed to make to mine.

You already know what my original code looked like (see the question): it downloaded every frame and iframe on a page. Here is an example of how to get a page with frames while loading only the frames you want:

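try {
    // Load the outer page; with the patched library its frames stay empty.
    HtmlPage page = webClient.getPage("http://www.w3schools.com/HTML/tryit.asp?filename=tryhtml_noframes");
    // Pick the one iframe we actually want.
    HtmlInlineFrame frame = page.getFirstByXPath("//iframe[@name='view']");
    // Fetch the frame's content manually via its absolute URL.
    page = webClient.getPage(page.getFullyQualifiedUrl(frame.getSrcAttribute()));
    System.out.println(page.asXml());
} catch (Exception e) {
    e.printStackTrace();
}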

After this library change, the content of each frame will be empty once the getPage() method finishes. Note that the frame won't be null; it looks like HtmlUnit just returns an empty frame. So we have to download the content of the frames we are interested in manually, which is why I'm calling getPage() a second time.
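The second getPage() call only works if the frame's src attribute is first turned into an absolute URL, which is what getFullyQualifiedUrl() is used for above. As a rough illustration of that resolution step, here is a minimal sketch that uses only java.net.URI, with no HtmlUnit involved. The class name FrameUrlResolver and the src values are made up for the example (I'm not claiming these are what the w3schools page really contains), and HtmlUnit's own resolution may differ in details such as honoring a base tag.

```java
import java.net.URI;

public class FrameUrlResolver {

    // Resolves a frame's src attribute against the URL of the page containing it.
    // Relative values are merged with the page URL; absolute values pass through unchanged.
    static String resolve(String pageUrl, String frameSrc) {
        return URI.create(pageUrl).resolve(frameSrc).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.w3schools.com/HTML/tryit.asp?filename=tryhtml_noframes";

        // Hypothetical src values, chosen only to show the three common cases.
        System.out.println(resolve(pageUrl, "tryit_view.asp"));            // relative to the page's directory
        System.out.println(resolve(pageUrl, "/other/frame.html"));         // relative to the site root
        System.out.println(resolve(pageUrl, "http://example.com/a.html")); // already absolute
    }
}
```

The resolved string is what you would then pass to webClient.getPage() to download just that frame.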

Well, this is how I managed to selectively download frames and iframes with HtmlUnit. Any ideas on how to improve this are welcome. In any case, I hope HtmlUnit itself will eventually offer a way to disable frame loading, perhaps through a method such as getPage(URL url, boolean downloadFrames).

Hope this helps someone out there!
