I am trying to extract financial statement information based on the type of statement.
Let me explain in a little more detail.
I want to extract the inco
Fortunately, it is not that difficult to extract financial statements. Here is how I was able to extract income statement info:
Replace the value of the file="" parameter with your own path. You can also use the url parameter instead of the file parameter.
As far as I recall, the right place to look is the user-friendly labels associated with these roles.
The SEC places restrictions on what these labels look like (e.g., paragraph 6.7.12 of the EDGAR Filer Manual), for example "02 - Statement - Balance Sheet". The income statement, cash flow statement and balance sheet are commonly found in labels with "Statement" (as opposed to "Disclosure", "Document" or "Schedule") between the two dashes.
The third part of the label tells you which one is the income statement, the cash flow statement or the balance sheet; however, the exact wording varies between filers. There are also several kinds of each (consolidated vs. unconsolidated, classified vs. unclassified, etc.), and the same filing may contain several versions (for example, both consolidated and unconsolidated), so you need some domain expertise to decide which one you need.
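To make the label filtering above concrete, here is a minimal sketch in Python. It assumes you have already pulled the role labels out of the filing as plain strings of the form "02 - Statement - Balance Sheet". The names `parse_role_label`, `classify` and the keyword lists are my own illustrative assumptions, not part of any XBRL library, and the keywords will almost certainly need tuning against real filings.

```python
import re
from typing import NamedTuple, Optional


class RoleLabel(NamedTuple):
    sort_code: str   # leading number, e.g. "02"
    category: str    # e.g. "Statement", "Disclosure", "Document", "Schedule"
    title: str       # filer-specific title, e.g. "Balance Sheet"


# Number, dash, single-word category, dash, then the free-form title.
# The title may itself contain dashes, so it simply runs to the end.
_LABEL_RE = re.compile(r"^\s*(\d+)\s+-\s+(\w+)\s+-\s+(.+?)\s*$")


def parse_role_label(label: str) -> Optional[RoleLabel]:
    """Split a role label into its three parts, or return None if it does not fit."""
    match = _LABEL_RE.match(label)
    if match is None:
        return None
    return RoleLabel(*match.groups())


# Hypothetical keyword lists (an assumption): filers word their titles differently.
STATEMENT_KEYWORDS = {
    "cash_flow": ("cash flow",),
    "balance_sheet": ("balance sheet", "financial position", "financial condition"),
    "income_statement": ("income", "operations", "earnings"),
}


def classify(label: str) -> Optional[str]:
    """Guess the statement type from a role label; None if it is not a recognised statement."""
    parsed = parse_role_label(label)
    if parsed is None or parsed.category.lower() != "statement":
        return None
    title = parsed.title.lower()
    for kind, keywords in STATEMENT_KEYWORDS.items():
        if any(keyword in title for keyword in keywords):
            return kind
    return None
```

Note that this deliberately does not distinguish consolidated from unconsolidated versions or parenthetical statements; a title such as "Statements of Comprehensive Income" would also be matched as an income statement, so treat the result as a first pass only.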
In a nutshell, you will need to do some trial and error on real filings in order to find the right algorithm to filter these labels.
What should help you, though, is that Charles Hoffman has done some research on this, an example of which can be found here (section 1.5).