问题
Is there a way to convert an XML file (financial statements from IB API) to Pandas without knowing the exact column headers? The rows should reflect the different dates (there are 4 or more datapoints per column). Would also be great to get balance sheet, income statement and cash flow statement separately. I have tried to use beautiful soup but am getting frustrated because it seems like I need to look for each column header specifically and I don't know how to get the data for each date.
Im trying to get three separate dataframes (one for each financial statement). sorry I don't know how to add a table here but they should look something like this.
Df1 name = income statement
|---------------------|------------------|------------------|
| Date | SREV | VDES |
|---------------------|------------------|------------------|
| 2018-09-29 | 265595.000000 | 12.208930 |
|---------------------|------------------|------------------|
| ..... | ...... | ..... |
|---------------------|------------------|------------------|
the example is from this segment:
<Statement Type="INC">
<FPHeader>
<PeriodLength>52</PeriodLength>
<periodType Code="W">Weeks</periodType>
<UpdateType Code="UPD">Updated Normal</UpdateType>
<AccountingStd/>
<StatementDate>2018-09-29</StatementDate>
<AuditorName Code="EY">Ernst & Young LLP</AuditorName>
<AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
<Source Date="2018-11-05">10-K</Source>
</FPHeader>
<lineItem coaCode="SREV">265595.000000</lineItem>
(...)
<lineItem coaCode="VDES">12.208930</lineItem>
This is the XML file (about half due to character limit):
<?xml version="1.0" encoding="utf-8"?>
<ReportFinancialStatements Major="1" Minor="0" Revision="1">
<CoIDs>
<CoID Type="RepNo">05680</CoID>
<CoID Type="CompanyName">Apple Inc.</CoID>
<CoID Type="IRSNo">942404110</CoID>
<CoID Type="CIKNo">0000320193</CoID>
</CoIDs>
<Issues>
<Issue Desc="Common Stock" ID="1" Order="1" Type="C">
<IssueID Type="Name">Ordinary Shares</IssueID>
<IssueID Type="Ticker">AAPL</IssueID>
<IssueID Type="RIC">AAPL.O</IssueID>
<IssueID Type="DisplayRIC">AAPL.OQ</IssueID>
<IssueID Type="InstrumentPI">331724</IssueID>
<IssueID Type="QuotePI">7645713</IssueID>
<Exchange Code="NASD" Country="USA">NASDAQ</Exchange>
<MostRecentSplit Date="2014-06-09">7.0</MostRecentSplit>
</Issue>
</Issues>
<CoGeneralInfo>
<CoStatus Code="1">Active</CoStatus>
<CoType Code="EQU">Equity Issue</CoType>
<LastModified>2020-01-23</LastModified>
<LatestAvailableAnnual>2019-09-28</LatestAvailableAnnual>
<LatestAvailableInterim>2019-09-28</LatestAvailableInterim>
<ReportingCurrency Code="USD">U.S. Dollars</ReportingCurrency>
<MostRecentExchange Date="2020-01-22">1.0</MostRecentExchange>
</CoGeneralInfo>
<StatementInfo>
<COAType Code="IND">Industry</COAType>
<BalanceSheetDisplay Code="CUR">Differentiates</BalanceSheetDisplay>
<CashFlowMethod Code="IND">Indirect</CashFlowMethod>
</StatementInfo>
<Notes>
<CFAAvailability Code="1"/>
<IAvailability Code="1"/>
<ISIAvailability Code="1"/>
<BSIAvailability Code="1"/>
<CFIAvailability Code="1"/>
</Notes>
<FinancialStatements>
<COAMap>
<mapItem coaItem="SREV" lineID="100" precision="1" statementType="INC">Revenue</mapItem>
(...)
<mapItem coaItem="SCTP" lineID="1050" precision="1" statementType="CAS">Cash Taxes Paid</mapItem>
</COAMap>
<AnnualPeriods>
<FiscalPeriod EndDate="2019-09-28" FiscalYear="2019" Type="Annual">
<Statement Type="INC">
<FPHeader>
<PeriodLength>52</PeriodLength>
<periodType Code="W">Weeks</periodType>
<UpdateType Code="UPD">Updated Normal</UpdateType>
<AccountingStd/>
<StatementDate>2019-09-28</StatementDate>
<AuditorName Code="EY">Ernst & Young LLP</AuditorName>
<AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
<Source Date="2019-10-31">10-K</Source>
</FPHeader>
<lineItem coaCode="SREV">260174.000000</lineItem>
(...)
<lineItem coaCode="VDES">11.885790</lineItem>
</Statement>
<Statement Type="BAL">
<FPHeader>
<UpdateType Code="UPD">Updated Normal</UpdateType>
<StatementDate>2019-09-28</StatementDate>
<AuditorName Code="EY">Ernst & Young LLP</AuditorName>
<AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
<Source Date="2019-10-31">10-K</Source>
</FPHeader>
<lineItem coaCode="ACSH">12204.000000</lineItem>
(...)
<lineItem coaCode="STBP">20.365340</lineItem>
</Statement>
<Statement Type="CAS">
<FPHeader>
<PeriodLength>52</PeriodLength>
<periodType Code="W">Weeks</periodType>
<UpdateType Code="UPD">Updated Normal</UpdateType>
<StatementDate>2019-09-28</StatementDate>
<AuditorName Code="EY">Ernst & Young LLP</AuditorName>
<AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
<Source Date="2019-10-31">10-K</Source>
</FPHeader>
<lineItem coaCode="ONET">55256.000000</lineItem>
(...)
<lineItem coaCode="SNCC">24311.000000</lineItem>
</Statement>
</FiscalPeriod>
<FiscalPeriod EndDate="2018-09-29" FiscalYear="2018" Type="Annual">
<Statement Type="INC">
<FPHeader>
<PeriodLength>52</PeriodLength>
<periodType Code="W">Weeks</periodType>
<UpdateType Code="UPD">Updated Normal</UpdateType>
<AccountingStd/>
<StatementDate>2018-09-29</StatementDate>
<AuditorName Code="EY">Ernst & Young LLP</AuditorName>
<AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
<Source Date="2018-11-05">10-K</Source>
</FPHeader>
<lineItem coaCode="SREV">265595.000000</lineItem>
(...)
<lineItem coaCode="VDES">12.208930</lineItem>
</Statement>
<Statement Type="BAL">
<FPHeader>
<UpdateType Code="CLA">Reclassified Normal</UpdateType>
<StatementDate>2018-12-29</StatementDate>
<Source Date="2019-01-30">10-Q</Source>
</FPHeader>
<lineItem coaCode="ACSH">11575.000000</lineItem>
(...)
<lineItem coaCode="STBP">22.533610</lineItem>
</Statement>
<Statement Type="CAS">
<FPHeader>
<PeriodLength>52</PeriodLength>
<periodType Code="W">Weeks</periodType>
<UpdateType Code="UPD">Updated Normal</UpdateType>
<StatementDate>2018-09-29</StatementDate>
<AuditorName Code="EY">Ernst & Young LLP</AuditorName>
<AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
<Source Date="2018-11-05">10-K</Source>
</FPHeader>
<lineItem coaCode="ONET">59531.000000</lineItem>
(...)
<lineItem coaCode="SNCC">5624.000000</lineItem>
</Statement>
</FiscalPeriod>
<FiscalPeriod EndDate="2017-09-30" FiscalYear="2017" Type="Annual">
<Statement Type="INC">
<FPHeader>
<PeriodLength>53</PeriodLength>
<periodType Code="W">Weeks</periodType>
<UpdateType Code="UPD">Updated Normal</UpdateType>
<AccountingStd/>
<StatementDate>2017-09-30</StatementDate>
<AuditorName Code="EY">Ernst & Young LLP</AuditorName>
<AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
<Source Date="2017-11-03">10-K</Source>
</FPHeader>
<lineItem coaCode="SREV">229234.000000</lineItem>
(...)
<lineItem coaCode="VDES">9.206750</lineItem>
</Statement>
<Statement Type="BAL">
回答1:
I know its not pretty but this works:
from ib_insync import *
from bs4 import BeautifulSoup as bs
import pandas as pd
ib = IB()
ib.connect('127.0.0.1', 7497, clientId=1)
security = Stock('AAPL', 'SMART', 'USD')
# request the fundamentals
fundamentals = ib.reqFundamentalData(security, reportType='ReportsFinStatements')
soup = bs(fundamentals,'xml')
bal_l = []
inc_l = []
cas_l = []
for period in soup.find_all('FiscalPeriod'):
if period.get('Type') != "Annual":
for statement in period.find_all('Statement'):
if statement.find('UpdateType').get('Code') != 'CLA':
dic = {}
t = statement.get('Type')
d = statement.find('Source').get('Date')
d1 = statement.find('StatementDate').text
dic['date'] = d
dic['StatementDate'] = d1
for item in statement.find_all('lineItem'):
dic[item.get('coaCode')] =item.text
if t == 'BAL':
bal_l.append(dic)
print(t, d, dic)
elif t == 'INC':
inc_l.append(dic)
elif t == 'CAS':
cas_l.append(dic)
balancesheet = pd.DataFrame(bal_l).sort_values('date')
with pd.option_context('display.max_rows', 1000, 'display.max_columns', None):
print(balancesheet)
来源:https://stackoverflow.com/questions/59911686/converting-xml-to-pandas