Question
I'm trying to scrape some daily data on different ETFs. I found that https://www.marketwatch.com/ has accurate information. The most relevant fields are the open price, shares outstanding, NAV, and total assets of the ETF. Here is the link for IVV US Equity: https://www.marketwatch.com/investing/fund/ivv
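The fields listed above (open price, shares outstanding, NAV, total assets) usually appear on quote pages as label/value pairs. Below is a minimal sketch that assumes a hypothetical markup in which each pair is an li.kv__item holding a small.kv__label and a span.kv__primary. Those class names, the label texts, the sample values, and the helper names parse_number and extract_key_data are all placeholders rather than MarketWatch's confirmed markup. Inspect the real page with your browser's developer tools and adjust the selectors before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with made-up values; the real page's class names may differ.
SAMPLE_HTML = """
<ul class="key-data">
  <li class="kv__item"><small class="kv__label">Open</small><span class="kv__primary">$290.10</span></li>
  <li class="kv__item"><small class="kv__label">NAV</small><span class="kv__primary">$290.47</span></li>
  <li class="kv__item"><small class="kv__label">Net Assets</small><span class="kv__primary">$160.2B</span></li>
  <li class="kv__item"><small class="kv__label">Shares Outstanding</small><span class="kv__primary">551.6M</span></li>
</ul>
"""

# Magnitude suffixes commonly used on finance pages
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_number(text):
    """Turn strings like '$160.2B' or '1,234.5' into floats; None if not numeric."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    multiplier = 1.0
    if cleaned and cleaned[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1].upper()]
        cleaned = cleaned[:-1]
    try:
        return float(cleaned) * multiplier
    except ValueError:
        return None  # e.g. "N/A" or an empty cell


def extract_key_data(html):
    """Map each label text to its numeric value (hypothetical selectors)."""
    soup = BeautifulSoup(html, "html.parser")
    data = {}
    for item in soup.find_all("li", class_="kv__item"):
        label = item.find("small", class_="kv__label")
        value = item.find("span", class_="kv__primary")
        if label is not None and value is not None:
            data[label.get_text(strip=True)] = parse_number(value.get_text())
    return data


print(extract_key_data(SAMPLE_HTML))
```

Looking values up by their label text, rather than by position in the list, keeps the parser working if the site reorders the rows.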
I'm just starting to get experience with Python and would like to receive some tips and guidelines on how to start a web scraping program. I have been told BeautifulSoup is the package to use for web scraping.
I have scraped web pages with VBA before, but the HTML of the pages I used then was different. I don't know if this is because some values of the ETFs (such as price and traded volume) change constantly.
I am open to any suggestion or any other website that could be useful (I have tried Yahoo Finance and Morningstar and I get the same problem with the HTML code).
Answer 1:
Yes, I agree that Beautiful Soup is a good approach. Here is some Python code which uses the Beautiful Soup library to extract the intraday price from the IVV fund page:
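```python
import requests
from bs4 import BeautifulSoup

# Download the IVV fund page; the timeout stops the script hanging on a slow response
r = requests.get("https://www.marketwatch.com/investing/fund/ivv", timeout=10)
soup = BeautifulSoup(r.text, "html.parser")

# When the site suspects a bot, it serves an interstitial page instead of the quote
if soup.h1 is not None and soup.h1.string == "Pardon Our Interruption...":
    print("They detected we are a bot. We hit a captcha.")
else:
    # The intraday price is the <bg-quote> tag inside <h3 class="intraday__price">
    price_tag = soup.find("h3", class_="intraday__price")
    quote = price_tag.find("bg-quote") if price_tag is not None else None
    if quote is not None:
        print(quote.string)
    else:
        print("Price element not found; the page layout may have changed.")
```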
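The fact that the price changes frequently is not a problem. The values inside the HTML change, but the tag names and classes that hold them (here the h3 tag with class intraday__price and the bg-quote tag inside it) normally stay the same unless the site changes its layout, and those are all Beautiful Soup needs to locate the data.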
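A small offline check illustrates this: two made-up snapshots of the price block, using the same h3.intraday__price / bg-quote structure that the code above targets, are parsed with a single selector. The snapshot strings and the helper name read_price are illustrative only, not captured from the live site.

```python
from bs4 import BeautifulSoup

# Two made-up snapshots of the same quote block taken at different times.
# Only the number changes; the tag names and classes stay the same.
SNAPSHOT_MORNING = '<h3 class="intraday__price"><bg-quote>289.95</bg-quote></h3>'
SNAPSHOT_AFTERNOON = '<h3 class="intraday__price"><bg-quote>291.40</bg-quote></h3>'


def read_price(html):
    """Return the intraday price text, or None if the element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    h3 = soup.find("h3", class_="intraday__price")
    quote = h3.find("bg-quote") if h3 is not None else None
    return quote.get_text(strip=True) if quote is not None else None


for snapshot in (SNAPSHOT_MORNING, SNAPSHOT_AFTERNOON):
    print(read_price(snapshot))  # prints 289.95, then 291.40
```

The same function returns None when the structure is absent, which is the signal to re-inspect the page rather than a sign that the price moved.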
Your main challenge is that the website can detect that the request is not coming from a web browser and will serve a captcha page to your Python script instead of the quote. So you will need to find a way around this. Also, I recommend checking the legality of scraping and whether it violates their terms of service.
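One common first step, sketched here under assumptions, is to send a browser-like User-Agent header and to check explicitly for the captcha page before parsing. The header string, the helper names fetch_fund_page and is_captcha_page, and the URL pattern built from the ticker (generalized from the IVV link) are assumptions. A different User-Agent alone may not be enough, since sites can use other signals to detect bots, and it does not change what their terms of service allow.

```python
import requests
from bs4 import BeautifulSoup

# Assumed browser-like User-Agent; any current browser string would do.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}
# Heading text of the bot-check page, as seen in the answer's code
CAPTCHA_TITLE = "Pardon Our Interruption..."


def is_captcha_page(html):
    """True if the page is the bot-check interstitial rather than the quote."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.h1 is not None and soup.h1.get_text(strip=True) == CAPTCHA_TITLE


def fetch_fund_page(ticker):
    """Download a fund page (assumed URL pattern); None if a captcha was served."""
    url = f"https://www.marketwatch.com/investing/fund/{ticker.lower()}"
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return None if is_captcha_page(response.text) else response.text
```

For example, fetch_fund_page("ivv") returns the page HTML, or None when the captcha is served; if the site instead blocks the request with an error status, raise_for_status() raises requests.HTTPError. Keeping the captcha check in its own function also lets you test it offline against saved HTML.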
You can learn more about Beautiful Soup here:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Source: https://stackoverflow.com/questions/52978037/python-etfs-daily-data-web-scraping