Scrape Yahoo Finance Income Statement with Python

没有蜡笔的小新 2020-12-16 06:18

I'm trying to scrape data from income statements on Yahoo Finance using Python. Specifically, let's say I want the most recent Net Income figure for Apple.

The d

1 Answer
  • 2020-12-16 06:53

    This is a little more difficult because the "Net Income" label is enclosed in a <strong> tag, so bear with me, but I think this works:

    import re, requests
    from bs4 import BeautifulSoup
    
    url = 'https://finance.yahoo.com/q/is?s=AAPL&annual'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    pattern = re.compile('Net Income')
    
    title = soup.find('strong', text=pattern)
    row = title.parent.parent # yes, yes, I know it's not the prettiest
    cells = row.find_all('td')[1:] #exclude the <td> with 'Net Income'
    
    values = [ c.text.strip() for c in cells ]
    

    values, in this case, will contain the three table cells in that "Net Income" row. They can easily be converted to ints; I left them as strings because I liked that they kept the ','.

    In [10]: values
    Out[10]: [u'53,394,000', u'39,510,000', u'37,037,000']
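
As a minimal sketch of the int conversion mentioned above, assuming every cell is a plain comma-grouped number like the ones shown (the helper name to_int is made up for illustration, and blank or non-numeric cells are not handled):

```python
def to_int(cell_text):
    """Convert a comma-grouped string such as '53,394,000' to an int."""
    # Assumption: the cell holds only digits and thousands separators.
    return int(cell_text.strip().replace(',', ''))


values = ['53,394,000', '39,510,000', '37,037,000']
numbers = [to_int(v) for v in values]
print(numbers)  # [53394000, 39510000, 37037000]
```

If a cell could hold something else (a dash, an empty string), int() raises ValueError, so you would want to catch that where it matters.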
    

    When I tested it on Alphabet (GOOG), it didn't work, I believe because Yahoo doesn't display an Income Statement for them (https://finance.yahoo.com/q/is?s=GOOG&annual). When I checked Facebook (FB), the values were returned correctly (https://finance.yahoo.com/q/is?s=FB&annual).
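
When the row is missing, as in the GOOG case, soup.find returns None and the .parent lookup raises AttributeError. Here is a rough sketch of one way to guard against that. The function name row_values and the sample HTML are invented for illustration and are not Yahoo's real markup; it also uses string= and find_parent('tr') in place of text= and .parent.parent, which should behave the same on a table laid out like the sample.

```python
import re

from bs4 import BeautifulSoup


def row_values(html, label):
    """Return the cell texts that follow a <strong>label</strong> cell.

    Returns an empty list when the label or its table row is not found.
    """
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('strong', string=re.compile(re.escape(label)))
    if title is None:
        return []  # no such row on the page
    row = title.find_parent('tr')
    if row is None:
        return []  # label found, but not inside a table row
    # Skip the first <td>, which holds the label itself.
    return [cell.get_text(strip=True) for cell in row.find_all('td')[1:]]


# Hypothetical markup, only to exercise the function.
sample = """
<table>
  <tr><td><strong>Net Income</strong></td><td>53,394,000</td><td>39,510,000</td></tr>
</table>
"""

print(row_values(sample, 'Net Income'))
print(row_values('<p>No statement available</p>', 'Net Income'))
```

Returning an empty list keeps the caller simple; raising a custom exception would be just as reasonable if a missing row should stop the script.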

    If you want a more dynamic script, you can use string formatting to build the URL for whatever stock symbol you want, like this:

    ticker_symbol = 'AAPL' # or 'FB' or any other ticker symbol
    url = 'https://finance.yahoo.com/q/is?s={}&annual'.format(ticker_symbol)
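
The same idea can be wrapped in a small function. This is only a sketch: the name income_statement_url is made up, the URL pattern is copied from the answer as-is (Yahoo may have changed it since), and the upper-casing and percent-encoding of the symbol are my own assumptions about what is safe to send.

```python
from urllib.parse import quote

# URL pattern taken from the answer above; not verified against the live site.
URL_TEMPLATE = 'https://finance.yahoo.com/q/is?s={}&annual'


def income_statement_url(ticker_symbol):
    """Build the income statement URL for one ticker symbol."""
    symbol = ticker_symbol.strip().upper()
    if not symbol:
        raise ValueError('ticker symbol must not be empty')
    # Percent-encode the symbol so stray characters cannot break the query string.
    return URL_TEMPLATE.format(quote(symbol, safe=''))


for symbol in ('AAPL', 'FB'):
    print(income_statement_url(symbol))
```

The resulting string can be passed straight to requests.get(url) as in the first snippet.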
    