Python - ETFs Daily Data Web Scraping

Submitted by 徘徊边缘 on 2019-12-13 10:46:41

Question


I'm trying to web scrape some daily data for different ETFs. I found that https://www.marketwatch.com/ has accurate data. The most relevant fields are the open price, shares outstanding, NAV, and total assets of the ETF. Here is the link for IVV US Equity: https://www.marketwatch.com/investing/fund/ivv

I'm just starting to gain Python experience and would like to receive some tips and guidelines on how to start a web scraping program. I have been told BeautifulSoup is the package to use for web scraping.

I have web scraped with VBA before, but the HTML of the pages I used was different. I don't know if this is because some values of the ETFs (such as price and traded volume) change constantly.

I am open to any suggestions or any other website that could be useful (I have tried Yahoo Finance and Morningstar and I get the same problem with the HTML code).


Answer 1:


Yes, I agree that Beautiful Soup is a good approach. Here is some Python code which uses the Beautiful Soup library to extract the intraday price from the IVV fund page:

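import requests
from bs4 import BeautifulSoup

# Download the fund page and parse it.
r = requests.get("https://www.marketwatch.com/investing/fund/ivv")
html = r.text

soup = BeautifulSoup(html, "html.parser")

# soup.h1 is None when the page has no <h1>, so check before reading .string.
if soup.h1 is not None and soup.h1.string == "Pardon Our Interruption...":
    print("They detected we are a bot. We hit a captcha.")
else:
    # The price sits in a <bg-quote> tag inside <h3 class="intraday__price">.
    price_tag = soup.find("h3", class_="intraday__price")
    quote = price_tag.find("bg-quote") if price_tag is not None else None
    if quote is None:
        print("Price element not found; the page layout may have changed.")
    else:
        print(quote.string)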

The fact that the price changes frequently is not a problem. The tag names and classes stay the same while the values inside them update, and those names and classes are all Beautiful Soup needs to locate an element. They only change if the site itself changes its markup.
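
To illustrate the point about stable tags, here is a minimal sketch that runs the same lookup against two snapshots of a page. The HTML strings and the `extract_price` helper are hypothetical, made up for this example and modelled only on the tag and class names used in the code above. The real page contains far more markup.

```python
from bs4 import BeautifulSoup

# Two hypothetical snapshots of the same page taken at different times.
# Only the quoted value differs; the tag names and the class are identical.
SNAPSHOT_MORNING = '<h3 class="intraday__price"><bg-quote>301.25</bg-quote></h3>'
SNAPSHOT_AFTERNOON = '<h3 class="intraday__price"><bg-quote>299.80</bg-quote></h3>'


def extract_price(html):
    """Return the text of <bg-quote> inside <h3 class="intraday__price">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h3", class_="intraday__price")
    if heading is None:
        return None
    quote = heading.find("bg-quote")
    if quote is None:
        return None
    return quote.get_text(strip=True)


# The same lookup works on both snapshots even though the value changed.
morning_price = extract_price(SNAPSHOT_MORNING)
afternoon_price = extract_price(SNAPSHOT_AFTERNOON)
print(morning_price, afternoon_price)
```

Returning None when an element is missing, instead of raising, makes it easier to notice when the site's markup no longer matches what the script expects.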

Your main challenge is that the website can detect that you are not using a web browser and will serve a captcha to your Python script, so you will need to find a way around this. Also, I recommend checking whether scraping is legal and whether it violates their terms of service.
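
As a rough sketch of how a script could at least recognise the interstitial page and stop cleanly, the helpers below wrap that check in reusable functions. The names `is_block_page` and `fetch_page` are hypothetical, and so is the idea of passing your own User-Agent string. Whether the site serves the real page for any particular header is an assumption and not something confirmed here.

```python
import requests
from bs4 import BeautifulSoup

# Heading text of the interstitial page, as checked in the code above.
BLOCK_TITLE = "Pardon Our Interruption"


def is_block_page(html):
    """Return True if the first <h1> of the page contains the interstitial title."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    return h1 is not None and BLOCK_TITLE in h1.get_text()


def fetch_page(url, user_agent, timeout=10):
    """Fetch url once and return its HTML, or None if the block page came back.

    user_agent is sent as the User-Agent header (an assumption: it may or
    may not affect what the site returns).
    """
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=timeout)
    response.raise_for_status()  # raise on HTTP error statuses such as 403 or 500
    if is_block_page(response.text):
        return None
    return response.text
```

A call would look like `html = fetch_page("https://www.marketwatch.com/investing/fund/ivv", user_agent="my-etf-script/0.1")`, where the identifier string is a placeholder. If the result is None, the script hit the interstitial and should stop instead of retrying in a loop.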

You can learn more about Beautiful Soup here:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/



Source: https://stackoverflow.com/questions/52978037/python-etfs-daily-data-web-scraping
