Question
I am trying to find a particular tag in an XBRL file. I originally tried the python-xbrl package, but it does not do exactly what I want, so I based my own code on the code in that package.
Here is the part of the XBRL document that I am interested in:
<us-gaap:LiabilitiesCurrent contextRef="eol_PE2035----1510-Q0008_STD_0_20150627_0" unitRef="iso4217_USD" decimals="-6" id="id_5025426_6FEF05CB-B19C-4D84-AAF1-79B431731049_1_24">65285000000</us-gaap:LiabilitiesCurrent>
<us-gaap:Liabilities contextRef="eol_PE2035----1510-Q0008_STD_0_20150627_0" unitRef="iso4217_USD" decimals="-6" id="id_5025426_6FEF05CB-B19C-4D84-AAF1-79B431731049_1_28">147474000000</us-gaap:Liabilities>
Here is the code. The python-xbrl package is based on beautifulsoup4 and several other packages.
liabilities = xbrl.find_all(name=re.compile("(us-gaap:Liabilities)",
re.IGNORECASE | re.MULTILINE))
I get the value for us-gaap:LiabilitiesCurrent, but I want the value for us-gaap:Liabilities.
Right now, as soon as the code finds a match, it stores it. In many cases that is the wrong match, because several XBRL tag names begin with the same text. I believe I need to change the re.compile() part to make it work correctly.
Answer 1:
Try it with a $ (dollar sign) at the end of the pattern. The $ anchors the match to the end of the tag name, so nothing is allowed to follow "Liabilities":
liabilities = xbrl.find_all(name=re.compile("(us-gaap:Liabilities$)",
re.IGNORECASE | re.MULTILINE))
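To see why the anchor matters, here is a minimal sketch that uses only the standard re module. It assumes that BeautifulSoup tests a compiled pattern against each tag name with search(), which succeeds on a partial match. The tag names come from the question; the variable names are illustrative only.

```python
import re

# Tag names taken from the XBRL snippet in the question.
tag_names = ["us-gaap:LiabilitiesCurrent", "us-gaap:Liabilities"]

# Pattern from the question: no anchors, so a partial match is enough.
unanchored = re.compile(r"(us-gaap:Liabilities)", re.IGNORECASE)

# Anchored at both ends: the whole tag name has to match.
anchored = re.compile(r"^us-gaap:Liabilities$", re.IGNORECASE)

# Assumption: the name filter is applied with search(), as done here.
unanchored_hits = [name for name in tag_names if unanchored.search(name)]
anchored_hits = [name for name in tag_names if anchored.search(name)]

print(unanchored_hits)  # both tag names
print(anchored_hits)    # only us-gaap:Liabilities
```

Adding ^ as well as $ is optional for this particular tag, but it also guards against longer names that merely end in the same text.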
Answer 2:
I'd be very wary about using this approach to parsing XBRL (or indeed, any XML with namespaces in it). "us-gaap:Liabilities" is a QName, consisting of a prefix ("us-gaap") and a local name ("Liabilities"). The prefix is just a shorthand for a full namespace URI such as "http://fasb.org/us-gaap/2015-01-31", which is defined by a namespace declaration, usually at the top of the document. If you look at the top of the document you'll see something like:
xmlns:us-gaap="http://fasb.org/us-gaap/2015-01-31"
This means that within the scope of this document, "us-gaap" is taken to mean that full namespace URI.
XML creators are free to use whatever prefixes they want, so there is no guarantee that the element will actually be called "us-gaap:Liabilities" across all documents that you encounter.
beautifulsoup4 has very limited support for namespaces, so I wouldn't recommend it as a starting point for building an XBRL processor. It may be worth taking a look at the Arelle project, which is a full XBRL processor, and will make it easier to do other tasks such as finding the labels and other information associated with facts in the taxonomy.
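As a small illustration of the namespace point, the sketch below matches the element by namespace URI and local name instead of by prefix, using the standard library's xml.etree.ElementTree. It is not a substitute for a full XBRL processor. The two sample documents are heavily simplified (a real filing has a namespaced root element, contexts, and units), and the "gaap2015" prefix and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the answer; the prefix bound to it may differ per document.
US_GAAP_NS = "http://fasb.org/us-gaap/2015-01-31"


def make_doc(prefix):
    """Build a tiny, simplified XBRL-like document using the given prefix."""
    return (
        f'<xbrl xmlns:{prefix}="{US_GAAP_NS}">'
        f"<{prefix}:LiabilitiesCurrent>65285000000</{prefix}:LiabilitiesCurrent>"
        f"<{prefix}:Liabilities>147474000000</{prefix}:Liabilities>"
        f"</xbrl>"
    )


def liabilities_values(xml_text):
    """Return the text of every Liabilities element in the us-gaap namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree names elements as "{namespace-uri}local-name", so the
    # lookup is exact and does not depend on the prefix used in the file.
    return [el.text for el in root.iter(f"{{{US_GAAP_NS}}}Liabilities")]


values_a = liabilities_values(make_doc("us-gaap"))
values_b = liabilities_values(make_doc("gaap2015"))  # hypothetical prefix

print(values_a)
print(values_b)
```

Both calls return the same value because the lookup is keyed on the namespace URI, and LiabilitiesCurrent is not picked up because the local name must match exactly.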
Source: https://stackoverflow.com/questions/33903843/reading-xbrl-with-python