I'm trying to use Jsoup to get stock data from a website called Morningstar. I've looked at other forums and haven't been able to find out what's wrong.
Since the content is generated dynamically with JavaScript, which Jsoup does not execute, you could use a headless browser like HtmlUnit (https://sourceforge.net/projects/htmlunit/).
The price and related information is embedded in an iframe, so we first grab the iframe link (which is also built dynamically) and then parse the iframe's content.
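import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Silence HtmlUnit's verbose logging.
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);
// Emulate Chrome; JavaScript must stay enabled because the page builds its content with scripts.
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setTimeout(1000); // milliseconds
// Load the quote page and hand the rendered markup to Jsoup.
HtmlPage page = webClient.getPage("http://www.morningstar.com/stocks/xnas/aapl/quote.html");
Document doc = Jsoup.parse(page.asXml());
String title = doc.select(".r_title").select("h1").text();
// The iframe src has no scheme, so prepend one to get a loadable URL.
String iFramePath = "http:" + doc.select("#quote_quicktake").select("iframe").attr("src");
// Load the iframe itself and read the price from its markup.
page = webClient.getPage(iFramePath);
doc = Jsoup.parse(page.asXml());
System.out.println(title + " | Last Price [$]: " + doc.select("#last-price-value").text());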
prints:
Apple Inc | Last Price [$]: 98.63
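The code above builds the iframe URL by prepending "http:" to the src attribute, which suggests the src is protocol-relative (it starts with "//"). A slightly more defensive variant could resolve the src against the page URL instead, so that absolute and relative values also work and a missing iframe fails loudly. This is only a sketch: the class name IframeUrls, the method resolve, and the example.com host are made up for illustration and are not part of HtmlUnit or Jsoup.

```java
import java.net.URI;

final class IframeUrls {
    // Hypothetical helper: turns an iframe "src" value into an absolute URL,
    // using the page that contained the iframe as the base.
    static String resolve(String pageUrl, String src) {
        if (src == null || src.trim().isEmpty()) {
            // An empty src usually means the selector matched nothing.
            throw new IllegalArgumentException("iframe src is empty");
        }
        // URI.resolve handles protocol-relative ("//host/path"),
        // absolute, and relative references alike.
        return URI.create(pageUrl).resolve(src.trim()).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.morningstar.com/stocks/xnas/aapl/quote.html";
        // The iframe host below is a placeholder, not the real one.
        System.out.println(resolve(base, "//quotes.example.com/frame?t=AAPL"));
        // prints: http://quotes.example.com/frame?t=AAPL
    }
}
```

In the snippet above, this would replace the string concatenation: pass the page URL and the value of attr("src") to the helper and load the result with webClient.getPage(...).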
HtmlUnit's JavaScript engine is rather slow (the code above takes about 18 seconds on my machine), so it might be worth looking into other JavaScript engines or headless browsers (PhantomJS, etc.; see this list of options: https://github.com/dhamaniasad/HeadlessBrowsers) to improve performance, but HtmlUnit gets the job done. You could also try to filter out irrelevant scripts, images, etc. with a custom WebConnectionWrapper: http://htmlunit.10904.n7.nabble.com/load-parse-speedup-tp22735p22738.html
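To give an idea of what such a filter could look like, here is a rough sketch of only the decision logic, in plain Java without any HtmlUnit dependency. The class name ResourceFilter, the method shouldSkip, and both block lists are hypothetical and would need tuning for the actual page. The assumption is that a WebConnectionWrapper subclass would call shouldSkip(request.getUrl().toString()) inside its getResponse(WebRequest) override and return an empty response instead of fetching the resource when it yields true.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

final class ResourceFilter {
    // Hypothetical block lists; adjust them to the page being scraped.
    private static final List<String> SKIPPED_EXTENSIONS =
            List.of(".png", ".jpg", ".jpeg", ".gif", ".svg", ".css", ".woff", ".woff2");
    private static final List<String> SKIPPED_HOST_PARTS =
            List.of("analytics", "doubleclick", "adservice");

    // Returns true if the request is probably irrelevant for scraping.
    static boolean shouldSkip(String url) {
        final URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // unparsable URL: do not interfere
        }
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        for (String part : SKIPPED_HOST_PARTS) {
            if (host.contains(part)) {
                return true; // tracking or ad host
            }
        }
        for (String ext : SKIPPED_EXTENSIONS) {
            if (path.endsWith(ext)) {
                return true; // image, stylesheet, or font; the query string is ignored
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkip("http://www.morningstar.com/stocks/xnas/aapl/quote.html")); // prints: false
        System.out.println(shouldSkip("http://example.com/img/logo.PNG?v=3"));                    // prints: true
    }
}
```

Be careful not to block scripts the page needs to build the iframe link or the price; a conservative list like the one above only drops images, styles, fonts, and obvious tracking hosts.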