Crawl a dynamic web page using HtmlUnit

Submitted by 丶灬走出姿态 on 2019-12-18 03:36:06

Question


I am crawling data with HtmlUnit from a dynamic web page that uses infinite scrolling to fetch data, much like Facebook's news feed. I used the following code to simulate the scroll-down event:

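// Enable JavaScript and resynchronize AJAX calls with the main thread
webclient.setJavaScriptEnabled(true);
webclient.setAjaxController(new NicelyResynchronizingAjaxController());
// Simulate scrolling down by 600 pixels
ScriptResult sr = myHtmlPage.executeJavaScript("window.scrollBy(0,600)");
// Wait up to 10 seconds for background JavaScript to finish
webclient.waitForBackgroundJavaScript(10000);
myHtmlPage = (HtmlPage) sr.getNewPage();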

However, myHtmlPage appears to be identical to the previous page: the new data is not appended to it, so I can only crawl the first few items on the web page. Thanks for your help!


Answer 1:


I was looking into the same thing. All I could establish is that the loading is not triggered by the scroll event (I am about 90% sure). There is a link in the page's JavaScript that is responsible for loading the page, and it might help you.




Answer 2:


I had a similar problem, where the content was loaded lazily as the page was scrolled. I solved it using:

webClient.getCurrentWindow().setInnerHeight(Integer.MAX_VALUE);
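
If a single trigger does not pull in all of the data, one option is to repeat it until the number of items stops growing. The sketch below is only an illustration of that loop and is not part of the original answer: the class InfiniteScroll, the method loadAll, and its parameters are hypothetical names, and the block deliberately uses only the Java standard library, so the HtmlUnit calls would have to be supplied by the caller.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: repeat a "load more" action until nothing new appears. */
class InfiniteScroll {

    /**
     * @param scrollAndWait triggers one more load (for example: scroll, then wait for background JavaScript)
     * @param itemCount     reports how many items the page currently contains
     * @param maxRounds     hard cap on the number of rounds, so a truly endless feed cannot loop forever
     * @return the item count observed when loading stopped
     */
    static int loadAll(Runnable scrollAndWait, IntSupplier itemCount, int maxRounds) {
        int previous = itemCount.getAsInt();
        for (int round = 0; round < maxRounds; round++) {
            scrollAndWait.run();
            int current = itemCount.getAsInt();
            if (current <= previous) {
                // The last round added nothing, so assume the feed is exhausted.
                return current;
            }
            previous = current;
        }
        // Round limit reached while the page was still growing.
        return previous;
    }
}
```

With HtmlUnit, scrollAndWait could plausibly wrap the question's executeJavaScript("window.scrollBy(0,600)") call followed by waitForBackgroundJavaScript, and itemCount could return the size of the list from getByXPath for whatever element holds one item on the target site. That XPath is site-specific and is an assumption here.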



Source: https://stackoverflow.com/questions/12119610/crawl-dynamic-web-page-using-htmlunit
