Reading xbrl with python

问题

I am trying find particular tag in an xbrl file. I originally tried using python-xbrl package, but it is not exactly what I want, so I based my code on the one available from the package.

Here's the part of xbrl that I am interested in

<us-gaap:LiabilitiesCurrent contextRef="eol_PE2035----1510-Q0008_STD_0_20150627_0" unitRef="iso4217_USD" decimals="-6" id="id_5025426_6FEF05CB-B19C-4D84-AAF1-79B431731049_1_24">65285000000</us-gaap:LiabilitiesCurrent>
<us-gaap:Liabilities contextRef="eol_PE2035----1510-Q0008_STD_0_20150627_0" unitRef="iso4217_USD" decimals="-6" id="id_5025426_6FEF05CB-B19C-4D84-AAF1-79B431731049_1_28">147474000000</us-gaap:Liabilities>

Here is the code

python-xbrl package is based on beautifulsoup4 and several other packages.

liabilities = xbrl.find_all(name=re.compile("(us-gaap:Liabilities)",
                          re.IGNORECASE | re.MULTILINE))

I get the value for us-gaap:LiabilitiesCurrent, but I want value for us-gaap:Liabilities. Right now as soon as it finds a match it, stores it. But in many cases its the wrong match due to the tag format in xbrl. I believe I need to change re.compile() part to make it work correctly.

回答1:

Try it with a $ dollar sign at the end to indicate not to match anything else following the dollar sign:

liabilities = xbrl.find_all(name=re.compile("(us-gaap:Liabilities$)",
                          re.IGNORECASE | re.MULTILINE))

回答2:

I'd be very wary about using this approach to parsing XBRL (or indeed, any XML with namespaces in it). "us-gaap:Liabilities" is a QName, consisting of a prefix ("us-gaap") and a local name ("Liabilities"). The prefix is just a shorthand for a full namespace URI such as "http://fasb.org/us-gaap/2015-01-31", which is defined by a namespace declaration, usually at the top of the document. If you look at the top of the document you'll see something like:

xmlns:us-gaap="http://fasb.org/us-gaap/2015-01-31"

This means that within the scope of this document, "us-gaap" is taken to mean that full namespace URI.

XML creators are free to use whatever prefixes they want, so there is no guarantee that the element will actually be called "us-gaap:Liabilities" across all documents that you encounter.

beautifulsoup4 has very limited support for namespaces, so I wouldn't recommend it as a starting point for building an XBRL processor. It may be worth taking a look at the Arelle project, which is a full XBRL processor, and will make it easier to do other tasks such as finding the labels and other information associated with facts in the taxonomy.

来源：https://stackoverflow.com/questions/33903843/reading-xbrl-with-python

标签

python

xbrl