Question
I have managed to write some Python code with Selenium that navigates to a webpage containing financial data in several tables.
I want to extract the data and put it into Excel.
The tables appear to be plain HTML tables; a sample of the markup is below:
<tr>
<td class="bc2T bc2gt">Last update</td>
<td class="bc2V bc2D">03/15/2018</td><td class="bc2V bc2D">03/14/2019</td><td class="bc2V bc2D">03/12/2020</td><td class="bc2V bc2D" style="background-color:#DEFEFE;">05/22/2020</td><td class="bc2V bc2D" style="background-color:#DEFEFE;">05/20/2020</td><td class="bc2V bc2D" style="background-color:#DEFEFE;">05/18/2020</td>
</tr>
</table>
The table has the following class name:
<table class='BordCollapseYear2' style="margin-right:20px; font-size:12px; width:100%;" cellspacing=0>
Is there a way I can extract this data? Ideally I want this to be dynamic so that it can extract information for different companies.
I've never used it before, but I've seen the BeautifulSoup library mentioned a few times.
Take Microsoft as an example:
https://www.marketscreener.com/MICROSOFT-CORPORATION-4835/financials/
I'd want to extract the income statement data, the balance sheet data, and so on.
Answer 1:
This script scrapes all tables found on the page and pretty-prints them:
import requests
from bs4 import BeautifulSoup

url = 'https://www.marketscreener.com/MICROSOFT-CORPORATION-4835/financials/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = {}
# for every table found on the page...
for table in soup.select('table.BordCollapseYear2'):
    # the table title is the nearest preceding <b> element
    table_name = table.find_previous('b').text
    all_data[table_name] = []
    # ...scrape every row
    for tr in table.select('tr'):
        row = [td.get_text(strip=True, separator=' ') for td in tr.select('td')]
        # keep only complete rows: one label cell plus six fiscal-year cells
        if len(row) == 7:
            all_data[table_name].append(row)

# pretty print all data:
for k, v in all_data.items():
    print('Table name: {}'.format(k))
    print('-' * 160)
    for row in v:
        print(('{:<25}'*7).format(*row))
    print()
Prints:
Table name: Valuation
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Fiscal Period: June      2017                     2018                     2019                     2020                     2021                     2022
Capitalization 1         532 175                  757 640                  1 026 511                1 391 637                -                        -
Entreprise Value (EV) 1  485 388                  700 112                  964 870                  1 315 823                1 299 246                1 276 659
P/E ratio                25,4x                    46,3x                    26,5x                    32,3x                    29,7x                    25,8x
Yield                    2,26%                    1,70%                    1,37%                    1,10%                    1,18%                    1,31%
Capitalization / Revenue 5,51x                    6,87x                    8,16x                    9,81x                    8,89x                    7,95x
EV / Revenue             5,02x                    6,34x                    7,67x                    9,28x                    8,30x                    7,30x
EV / EBITDA              12,7x                    15,4x                    17,7x                    20,2x                    18,3x                    15,9x
Cours sur Actif net      7,46x                    9,15x                    10,0x                    12,1x                    10,1x                    8,49x
Nbr of stocks (in thousands)7 720 510                7 683 198                7 662 818                7 583 440                -                        -
Reference price (USD)    68,9                     98,6                     134                      184                      184                      184
Last update              07/20/2017               07/19/2018               07/18/2019               05/08/2020               04/30/2020               04/30/2020

Table name: Annual Income Statement Data
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Fiscal Period: June      2017                     2018                     2019                     2020                     2021                     2022
Net sales 1              96 657                   110 360                  125 843                  141 818                  156 534                  174 945
EBITDA 1                 38 117                   45 319                   54 641                   65 074                   70 966                   80 445
Operating profit (EBIT) 129 339                   35 058                   42 959                   52 544                   57 045                   65 289
Operating Margin         30,4%                    31,8%                    34,1%                    37,1%                    36,4%                    37,3%
Pre-Tax Profit (EBT) 1   23 149                   36 474                   43 688                   52 521                   57 042                   65 225
Net income 1             21 204                   16 571                   39 240                   43 693                   47 223                   53 905
Net margin               21,9%                    15,0%                    31,2%                    30,8%                    30,2%                    30,8%
EPS 2                    2,71                     2,13                     5,06                     5,68                     6,18                     7,11
Dividend per Share 2     1,56                     1,68                     1,84                     2,02                     2,16                     2,41
Last update              07/20/2017               07/19/2018               07/18/2019               05/22/2020               05/22/2020               05/22/2020

Table name: Balance Sheet Analysis
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Fiscal Period: June      2017                     2018                     2019                     2020                     2021                     2022
Net Debt 1               -                        -                        -                        -                        -                        -
Net Cash position 1      46 787                   57 528                   61 641                   75 814                   92 392                   114 978
Leverage (Debt / EBITDA) -1,23x                   -1,27x                   -1,13x                   -1,17x                   -1,30x                   -1,43x
Free Cash Flow 1         31 378                   32 252                   38 260                   41 953                   46 887                   53 155
ROE (Net Profit / Equities)29,4%                    19,4%                    42,4%                    36,6%                    34,5%                    36,1%
Shareholders' equity 1   72 195                   85 215                   92 524                   119 417                  136 690                  149 484
ROA (Net Profit / Asset) 9,76%                    6,51%                    14,4%                    18,5%                    14,6%                    14,7%
Assets 1                 217 276                  254 580                  272 703                  235 800                  323 445                  366 702
Book Value Per Share 2   9,24                     10,8                     13,4                     15,2                     18,2                     21,6
Cash Flow per Share 2    5,04                     5,63                     6,73                     7,03                     8,02                     9,79
Capex 1                  8 129                    11 632                   13 925                   15 698                   17 922                   19 507
Capex / Sales            8,41%                    10,5%                    11,1%                    11,1%                    11,4%                    11,2%
Last update              07/20/2017               07/19/2018               07/18/2019               05/22/2020               05/22/2020               05/04/2020
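Note that every scraped cell is a string in the site's number format: spaces as thousands separators, decimal commas, and 'x' or '%' suffixes. To do arithmetic on the values in Excel they would probably need converting first. A minimal sketch of such a conversion, assuming the formats seen in the output above are the only ones that occur; `parse_cell` is a hypothetical helper name, not part of the script above:

```python
import re

def parse_cell(text):
    """Convert a scraped cell such as '1 026 511', '25,4x' or '2,26%' to a float.

    Returns None for '-' (no value) and for anything that does not look
    numeric, such as dates or row labels.
    """
    cleaned = text.strip().rstrip('x%')      # drop ratio / percent suffix
    cleaned = re.sub(r'\s', '', cleaned)     # drop thousands separators (any whitespace)
    cleaned = cleaned.replace(',', '.')      # decimal comma -> decimal point
    if not re.fullmatch(r'-?\d+(\.\d+)?', cleaned):
        return None
    return float(cleaned)

# One row copied from the "Valuation" table printed above.
row = ['Capitalization 1', '532 175', '757 640', '1 026 511', '1 391 637', '-', '-']
values = [parse_cell(cell) for cell in row[1:]]
print(values)  # [532175.0, 757640.0, 1026511.0, 1391637.0, None, None]
```

The first cell of each row is the label, so only `row[1:]` is converted. The percent sign is simply dropped, so '2,26%' becomes 2.26 rather than 0.0226.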
EDIT (to save all_data as a CSV file):
import csv

with open('data.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for k, v in all_data.items():
        spamwriter.writerow([k])
        for row in v:
            spamwriter.writerow(row)
[Screenshot from LibreOffice]
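If a separate sheet per statement is more convenient than one combined data.csv, each table could be written to its own file instead. The following is only a sketch under that assumption: `save_tables` and its file-naming scheme are hypothetical, and the small `sample` dictionary stands in for the scraped `all_data`.

```python
import csv
import re
import tempfile
from pathlib import Path

def save_tables(all_data, out_dir):
    """Write each table in all_data to its own CSV file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, rows in all_data.items():
        # 'Annual Income Statement Data' -> 'annual_income_statement_data'
        slug = re.sub(r'[^A-Za-z0-9]+', '_', name).strip('_').lower()
        path = out_dir / f'{slug}.csv'
        with open(path, 'w', newline='', encoding='utf-8') as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths

# Stand-in for the scraped dictionary, using two rows from the printed output.
sample = {
    'Valuation': [
        ['Fiscal Period: June', '2017', '2018', '2019', '2020', '2021', '2022'],
        ['P/E ratio', '25,4x', '46,3x', '26,5x', '32,3x', '29,7x', '25,8x'],
    ],
}

# A temporary directory keeps the example free of side effects;
# in real use, pass any output folder instead.
with tempfile.TemporaryDirectory() as tmp:
    written = save_tables(sample, tmp)
    print([p.name for p in written])  # ['valuation.csv']
```

Because the values contain decimal commas, the csv writer wraps those cells in quotes, so the columns stay intact when the file is opened in LibreOffice or Excel.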
Source: https://stackoverflow.com/questions/61974854/get-financial-data-using-python