Python 3, Web-scraping, and Javascript [Oh My]

误落风尘 2021-02-04 21:31

I have come to the point of entering the melee on web-scraping pages that use JavaScript, with Python 3. I am well aware that my boot may be making contact with a dead horse, but

1 answer

既然无缘 (original poster)

2021-02-04 21:48

When a page loads data via JavaScript, it has to request that data from the server, typically through the XMLHttpRequest API (XHR). You can see which requests the page makes and then make them yourself, for example with wget!

To find out which requests the page makes, use the Web Inspector (Chrome and Safari) or Firebug (Firefox). Here's how to do it in Chrome:

Wrench menu > Tools > Developer Tools > Network (tab at the top of the tools) > XHR filter at the bottom.

Here's an example request the page makes in JavaScript

If you look closely at the XHR request URL, you'll notice that every trailing-returns request has the same format:

http://performance.morningstar.com/Performance/cef/trailing-total-returns.action?t=

You just need to specify t, the ticker symbol. For example:

http://performance.morningstar.com/Performance/cef/trailing-total-returns.action?t=VAW
http://performance.morningstar.com/Performance/cef/trailing-total-returns.action?t=INTC
http://performance.morningstar.com/Performance/cef/trailing-total-returns.action?t=VHCOX
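Since the question is about Python 3, the same URLs can be built in code rather than by hand. This is only a minimal sketch: the endpoint and the t parameter come from the XHR request above, while the names BASE_URL and build_url are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint copied from the XHR request shown above.
BASE_URL = (
    "http://performance.morningstar.com/Performance/cef/"
    "trailing-total-returns.action"
)


def build_url(ticker):
    """Return the trailing-returns URL for one ticker symbol.

    build_url is a hypothetical helper name; urlencode takes care of
    escaping any characters that are not safe in a query string.
    """
    return BASE_URL + "?" + urlencode({"t": ticker})


# Print one URL per ticker used in the examples above.
for ticker in ("VAW", "INTC", "VHCOX"):
    print(build_url(ticker))
```

Using urlencode instead of plain string concatenation should not matter for simple tickers like these, but it keeps the URL valid if a symbol ever contains a character that needs escaping.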

Now you can wget those URIs and parse out the data directly.
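If you would rather stay in Python 3 than shell out to wget, the standard library may be enough. The sketch below is not a definitive implementation: it assumes the endpoint returns an HTML fragment containing a well-formed table, which I have not verified, and the names fetch, parse_rows, CellCollector, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return the table rows found in html as lists of cell strings."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch(url, timeout=10):
    """Download url and return the body as text (does what wget would do)."""
    # Some servers reject requests without a browser-like User-Agent;
    # whether this one does is an assumption.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Made-up sample markup, used here so the parser can be tried offline.
# With a live connection you would call parse_rows(fetch(url)) instead.
sample = (
    "<table>"
    "<tr><th>Name</th><th>1-Year</th></tr>"
    "<tr><td>Example Fund</td><td> 5.0 </td></tr>"
    "</table>"
)
print(parse_rows(sample))
```

If the real response turns out to be JSON or some other format, only parse_rows would need to change; fetch stays the same.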
