I have come to the point of entering the melee on web-scraping webpages using Javascript, with Python3. I am well aware that my boot may be making contact with a dead horse, but
When a page loads data via javascript, it has to make requests to the server to get that data via the XMLHttpRequest function (XHR). You can see what requests they are making, and then make them yourself, using wget!
To find out which requests they are making, use the Web Inspector (Chrome and Safari) or Firebug (Firefox). Here's how to do it in Chrome:
wrench/tools/developer tools/Network (tab at the top of the tools)/XHR filter at the bottom.
Here's an example request they make in javascript
If you look closely at the XHR request url, you notice that all trailing returns have the same format:
http://performance.morningstar.com/Performance/cef/trailing-total-returns.action?t=
You just need to specify t
. For example:
http://performance.morningstar.com/Performance/cef/trailing-total-returns.action?t=VAW http://performance.morningstar.com/Performance/cef/trailing-total-returns.action?t=INTC http://performance.morningstar.com/Performance/cef/trailing-total-returns.action?t=VHCOX
Now you can wget
those URIs and parse out the data directly.