Retrieve HTML content from an infinite scroll page (Facebook)

对着背影说爱祢 submitted on 2021-02-10 17:31:49

Question


I would like to retrieve the HTML of a dynamic web page, for example a public Facebook page such as https://www.facebook.com/bbcnews/ (public content, no login required).

This page uses infinite scroll: more posts are loaded only when you scroll to the bottom of the page.

My current code is here:

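import java.io.*;
import java.net.URL;
import java.nio.charset.StandardCharsets;

URL url = new URL("https://www.facebook.com/bbcnews/");

// try-with-resources closes (and flushes) both streams, even on error
try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
     BufferedWriter writer = new BufferedWriter(new FileWriter("path"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        writer.write(line);
        writer.newLine(); // readLine() strips line terminators, so write them back
    }
}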

This code retrieves only the first part of the page, i.e. the HTML the server sends before any scrolling happens.

How can I retrieve the rest of the content, i.e. the posts that the infinite scroll loads?

Thanks.


Answer 1:


You won't get that through a simple BufferedReader reading an HTTP stream. Open your browser's developer tools, then scroll to the end of the page. You'll see that an XHR call (an asynchronous request) is fired to this URL:

https://www.facebook.com/pages_reaction_units

The request carries a lot of cryptic parameters. You would need to reproduce this kind of call in your Java code, but it is obfuscated, presumably on purpose, so building it from scratch doesn't seem like a good approach.
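Whatever the exact parameters, infinite-scroll endpoints commonly follow a cursor pattern (assumed here, not verified for `pages_reaction_units`): each XHR response returns one batch of content plus a token saying where to continue, and the next request sends that token back. The sketch below shows only that loop. It does not reproduce Facebook's obfuscated request; `Batch`, `InfiniteScrollPager` and the `fetchBatch` function are hypothetical stand-ins for whatever call you end up making.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical shape of one batch returned by an infinite-scroll endpoint.
final class Batch {
    final List<String> items;  // e.g. HTML fragments or post ids from this batch
    final String nextCursor;   // opaque token for the next request; null when nothing is left

    Batch(List<String> items, String nextCursor) {
        this.items = items;
        this.nextCursor = nextCursor;
    }
}

final class InfiniteScrollPager {
    // Mimics what the page's JavaScript does each time you reach the bottom:
    // request a batch, keep its items, and send its cursor back in the next request.
    // fetchBatch stands in for the real HTTP call; a null cursor means "first batch".
    static List<String> fetchAll(Function<String, Batch> fetchBatch, int maxBatches) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        for (int i = 0; i < maxBatches; i++) {
            Batch batch = fetchBatch.apply(cursor);
            all.addAll(batch.items);
            if (batch.nextCursor == null) {
                break; // the server signals there is nothing more to load
            }
            cursor = batch.nextCursor;
        }
        return all;
    }
}
```

Passing a fake `fetchBatch` (for example, a lambda returning canned batches) lets you test the loop offline, and `maxBatches` is a safety cap so a misbehaving cursor cannot make it loop forever.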

You would be better off using an API provided by Facebook (perhaps the Graph API).
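If you go the Graph API route, pagination is explicit: list responses include a `paging` object whose `next` field is a complete URL for the following page. The sketch below, assuming Java 11+ (`java.net.http.HttpClient`), builds a `/{page-id}/posts` request and follows `paging.next`. The API version, page id, field list and access token are placeholders you must supply; reading a Page's posts typically requires a token with the appropriate permissions. The regex used to pull out `next` is a shortcut that a real JSON parser should replace.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GraphPaging {
    // Matches "next": "<url>" inside the paging object (simplified; use a JSON library in real code).
    private static final Pattern NEXT = Pattern.compile("\"next\"\\s*:\\s*\"([^\"]+)\"");

    // Builds GET /{apiVersion}/{pageId}/posts; every argument is a caller-supplied placeholder.
    static String postsUrl(String apiVersion, String pageId, String fields, String accessToken) {
        return "https://graph.facebook.com/" + apiVersion + "/" + pageId + "/posts"
                + "?fields=" + URLEncoder.encode(fields, StandardCharsets.UTF_8)
                + "&access_token=" + URLEncoder.encode(accessToken, StandardCharsets.UTF_8);
    }

    // Returns paging.next from a response body, or null when there is no further page.
    static String nextPageUrl(String json) {
        Matcher m = NEXT.matcher(json);
        return m.find() ? m.group(1).replace("\\/", "/") : null; // undo JSON-escaped slashes
    }

    // Follows paging.next until it disappears or maxPages is reached; returns the raw JSON bodies.
    static List<String> fetchPages(HttpClient client, String firstUrl, int maxPages)
            throws IOException, InterruptedException {
        List<String> pages = new ArrayList<>();
        String url = firstUrl;
        while (url != null && pages.size() < maxPages) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IOException("Graph API returned HTTP " + response.statusCode());
            }
            pages.add(response.body());
            url = nextPageUrl(response.body());
        }
        return pages;
    }
}
```

For example, `GraphPaging.fetchPages(HttpClient.newHttpClient(), GraphPaging.postsUrl(version, pageId, "message,created_time", token), 5)` would return up to five raw JSON pages, where `version`, `pageId` and `token` are your own values. Unlike the scraping approach, the cursor handling here is documented by Facebook rather than reverse-engineered.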



Source: https://stackoverflow.com/questions/52858241/retrieve-html-content-from-an-infinite-scroll-page-facebook
