Question
I would like to retrieve HTML data from a dynamic web page, for example a public Facebook page: https://www.facebook.com/bbcnews/ (public content, no login required).
This page uses infinite scroll: you have to scroll to the bottom of the page to load more posts.
Here is my current code:
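import java.io.*;
import java.net.URL;

URL url = new URL("https://www.facebook.com/bbcnews/");
// try-with-resources closes both streams, even if an exception is thrown
try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
     BufferedWriter writer = new BufferedWriter(new FileWriter("path"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        writer.write(line);
        writer.newLine(); // readLine() strips the line terminator
    }
}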
This code retrieves only the first part of the page.
How can I retrieve the rest of the content that the infinite scroll loads?
Thanks.
Answer 1:
You won't get that with a simple BufferedReader reading an HTTP stream: it only sees the initial HTML, and the later posts are loaded by JavaScript. Open your browser's developer console (network tab), then scroll to the end of the page. You'll see that an XHR call (an asynchronous request) is fired at this URL:
https://www.facebook.com/pages_reaction_units
It carries a lot of cryptic request parameters. To load more posts, you would need to reproduce this kind of call in your Java code. The parameters are obfuscated, presumably on purpose, so building the request from scratch doesn't seem to be a good approach.
It is better to use an API provided by Facebook (perhaps the Graph API).
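To illustrate the API route, here is a minimal sketch of following paginated results in Java. It rests on several assumptions that should be checked against Facebook's documentation: that the Graph API serves a page's posts at https://graph.facebook.com/{version}/{page-id}/posts, that the call needs an access token, and that each JSON response carries a paging.next URL while more results remain. The class and method names (GraphPager, firstPageUrl, nextPageUrl, fetchAll, httpGet) are hypothetical, and the regex is only a stand-in for a real JSON parser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GraphPager {
    // Matches a "next" link such as the one assumed to sit in the "paging" object.
    private static final Pattern NEXT =
            Pattern.compile("\"next\"\\s*:\\s*\"([^\"]+)\"");

    /** Builds the first-page URL; version, page id and token are supplied by the caller. */
    public static String firstPageUrl(String version, String pageId, String accessToken) {
        return "https://graph.facebook.com/" + version + "/" + pageId
                + "/posts?access_token="
                + URLEncoder.encode(accessToken, StandardCharsets.UTF_8);
    }

    /** Returns the next-page URL found in the JSON text, or null if there is none. */
    public static String nextPageUrl(String json) {
        Matcher m = NEXT.matcher(json);
        // JSON may escape slashes as \/ ; undo that before reusing the URL.
        return m.find() ? m.group(1).replace("\\/", "/") : null;
    }

    /** Follows "next" links, collecting each raw JSON page, up to maxPages pages. */
    public static List<String> fetchAll(String firstUrl, UnaryOperator<String> fetch, int maxPages) {
        List<String> pages = new ArrayList<>();
        String url = firstUrl;
        while (url != null && pages.size() < maxPages) {
            String body = fetch.apply(url);
            pages.add(body);
            url = nextPageUrl(body);
        }
        return pages;
    }

    /** Plain HTTP GET using java.net.http (Java 11+); returns the response body. */
    public static String httpGet(String url) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A call could look like GraphPager.fetchAll(GraphPager.firstPageUrl(version, pageId, token), GraphPager::httpGet, 5), where version, pageId and token are values you supply. Passing the fetcher as a parameter keeps the paging loop testable without network access, and the maxPages cap keeps the loop from running indefinitely.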
Source: https://stackoverflow.com/questions/52858241/retrieve-html-content-from-an-infinite-scroll-page-facebook