I'm trying to scrape data from income statements on Yahoo Finance using Python. Specifically, let's say I want the most recent Net Income figure for Apple.
This is made a little more difficult because the "Net Income" label is enclosed in a <strong> tag, so bear with me, but I think this works:
import re
import requests
from bs4 import BeautifulSoup

url = 'https://finance.yahoo.com/q/is?s=AAPL&annual'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

pattern = re.compile('Net Income')
title = soup.find('strong', string=pattern)  # the <strong> tag holding the "Net Income" label
row = title.parent.parent  # climb <strong> -> <td> -> <tr>; yes, yes, I know it's not the prettiest
cells = row.find_all('td')[1:]  # exclude the <td> with 'Net Income'
values = [c.text.strip() for c in cells]
values, in this case, will contain the three table cells in that "Net Income" row (and, I might add, they can easily be converted to ints; I just liked that, as strings, they keep the comma separators):
In [10]: values
Out[10]: [u'53,394,000', u'39,510,000', u'37,037,000']
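Since the cells come back as comma-grouped strings, converting them to ints only takes stripping the thousands separators. A minimal sketch (`to_int` is a hypothetical helper name, and it assumes every cell holds a plain number like the ones above; a dash or empty cell for a missing figure would raise ValueError):

```python
def to_int(cell_text):
    """Convert a comma-grouped figure such as '53,394,000' to an int."""
    # Drop the thousands separators; int() also accepts a leading '-' sign.
    return int(cell_text.replace(',', ''))

values = ['53,394,000', '39,510,000', '37,037,000']
net_income = [to_int(v) for v in values]
print(net_income)  # [53394000, 39510000, 37037000]
```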
When I tested it on Alphabet (GOOG), it didn't work, I believe because Yahoo doesn't display an income statement for it (https://finance.yahoo.com/q/is?s=GOOG&annual), but when I checked Facebook (FB), the values were returned correctly (https://finance.yahoo.com/q/is?s=FB&annual).
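When a page has no "Net Income" label, soup.find returns None, so title.parent raises an AttributeError. The sketch below guards against that and returns None instead. It is tested on a small hand-written HTML snippet that only assumes the layout described above (the label in a <strong> inside the row's first cell); Yahoo's real markup may differ. `net_income_cells` is a hypothetical helper name, and it uses find_parent('tr') rather than .parent.parent so it does not depend on exactly how deeply the <strong> is nested:

```python
import re
from bs4 import BeautifulSoup

def net_income_cells(html):
    """Return the cell texts of the 'Net Income' row, or None if there is no such row."""
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('strong', string=re.compile('Net Income'))
    if title is None:
        # No income statement on the page: avoid calling .parent on None.
        return None
    row = title.find_parent('tr')  # nearest enclosing table row
    if row is None:
        return None
    cells = row.find_all('td')[1:]  # skip the cell holding the label
    return [c.get_text(strip=True) for c in cells]

sample = """
<table>
  <tr><td><strong>Net Income</strong></td>
      <td> 53,394,000 </td><td>39,510,000</td><td>37,037,000</td></tr>
</table>
"""
print(net_income_cells(sample))             # ['53,394,000', '39,510,000', '37,037,000']
print(net_income_cells('<p>No data</p>'))   # None
```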
If you wanted to create a more dynamic script, you could use string formatting to insert whatever ticker symbol you want into the URL, like this:
ticker_symbol = 'AAPL'  # or 'FB' or any other ticker symbol
url = 'https://finance.yahoo.com/q/is?s={}&annual'.format(ticker_symbol)
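Wrapping that formatting in a small function makes it easy to loop over several tickers. A sketch assuming the same URL pattern as above (`income_statement_url` is a hypothetical name); it only builds the URLs, and each one would then be fetched and parsed the same way as in the Apple example:

```python
def income_statement_url(ticker_symbol):
    """Build the annual income-statement URL for one ticker symbol."""
    return 'https://finance.yahoo.com/q/is?s={}&annual'.format(ticker_symbol)

# Print the URL that would be requested for each ticker.
for symbol in ['AAPL', 'FB', 'GOOG']:
    print(symbol, income_statement_url(symbol))
```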