Question
In my previous question (everything needed is in this question; the link is here only for completeness) I asked for a way to pull XML data into Excel from a web location. The code I received as an answer (courtesy of user2140261) is here:
Sub GetNode()
    Dim strXMLSite As String
    Dim objXMLHTTP As MSXML2.XMLHTTP
    Dim objXMLDoc As MSXML2.DOMDocument
    Dim objXMLNodexbrl As MSXML2.IXMLDOMNode
    Dim objXMLNodeDIIRSP As MSXML2.IXMLDOMNode

    Set objXMLHTTP = New MSXML2.XMLHTTP
    Set objXMLDoc = New MSXML2.DOMDocument

    strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

    objXMLHTTP.Open "POST", strXMLSite, False
    objXMLHTTP.send
    objXMLDoc.LoadXML (objXMLHTTP.responseText)

    Set objXMLNodexbrl = objXMLDoc.SelectSingleNode("xbrl")
    Set objXMLNodeDIIRSP = objXMLNodexbrl.SelectSingleNode("us-gaap:DebtInstrumentInterestRateStatedPercentage")

    Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub
But every company has a different XML Instance Document, and each company publishes a new XML Instance Document every reporting period (e.g. quarterly, annually). So these documents are found at different web locations.
Now in the previous procedure we can see we only need to use the statement
strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"
...but this only works when we know beforehand that we want data from one specific location on the web.
What if we want to pull some data from the 4 different locations marked with an asterisk (*) in the image below?
How could we enter our "coordinates" in Excel, say in one of our userforms or cells, and then make VBA "navigate/crawl" there using only these coordinates, just as we would navigate there with a browser?
The coordinates that we input can be:
- A Stock Ticker (e.g. TSLA for Tesla Motors)
- A filing type, for example 10-Q
You can pick the type of files in these links for BDX and ANN respectively:
BDX LINK
ANN LINK
Below are 2 web locations of Instance Documents for BDX and 2 for ANN:
BDX Company
http://www.sec.gov/Archives/edgar/data/10795/000119312514042815/bdx-20131231.xml
http://www.sec.gov/Archives/edgar/data/10795/000119312513318898/bdx-20130630.xml
ANN Company
http://www.sec.gov/Archives/edgar/data/874214/000087421413000036/ann-20131102.xml
http://www.sec.gov/Archives/edgar/data/874214/000087421413000027/ann-20130803.xml
How could we pull an XML element that exists in all four instance documents, for example us-gaap:CommonStockValue, by simply giving VBA:
- Stock Ticker
- The document type (10-K, 10-Q)
Can it be done with Microsoft XML Core Services (MSXML) alone, or do we need some other library too?
You can see how impractical it is to run this code thousands of times and, every time, copy the URL from the web browser into strXMLSite as a String value.
Answer 1:
[edit1]
In response to the comment:
the only thing that remains for us is to understand how URLs actually change, so they can be predicted and manipulated by string concatenation? In what code language is the URL written?
The short answer: open a browser, right-click on a blank spot in the webpage you're interested in, and select View Source from the popup menu.
To repeat the example provided in the other post, VBA href Crawl on Browser's Source Code, do this:
Open Edgar Online Company Search in a browser: https://www.sec.gov/edgar/searchedgar/companysearch.html
Use the Fast Search function to search for ticker CRR and it gives me this URL: https://www.sec.gov/cgi-bin/browse-edgar?CIK=CRR&Find=Search&owner=exclude&action=getcompany which contains the list of public filings for Carbo Ceramics, Inc.
Now, right click on the page to get the source and scroll down to line 91. You'll see this block of code:
<table class="tableFile2" summary="Results">
That's the beginning of the results table that shows the list of public filings.
<tr>
<th width="7%" scope="col">Filings</th>
<th width="10%" scope="col">Format</th>
<th scope="col">Description</th>
<th width="10%" scope="col">Filing Date</th>
<th width="15%" scope="col">File/Film Number</th>
</tr>
That's the header row of the table with column descriptions.
<tr>
<td nowrap="nowrap">SC 13G</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/1009672/000108975514000003/0001089755-14-000003-index.htm" id="documentsbutton"> Documents</a></td>
<td class="small" >Statement of acquisition of beneficial ownership by individuals<br />Acc-no: 0001089755-14-000003 (34 Act) Size: 8 KB </td>
<td>2014-02-14</td>
<td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&filenum=005-48851&owner=exclude&count=40">005-48851</a><br>14615563 </td>
</tr>
And that's the first row of actual data in the table, for filing SC 13G, Statement of acquisition of beneficial ownership by individuals Acc-no: 0001089755-14-000003 (34 Act) Size: 8 KB, submitted on 2014-02-14.
So, now you want to loop through all of the document URLs on this page and that's why you're asking what language the URLs are in? (Crawl the page, in other words?)
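If crawling is the goal, the two pieces visible above (the search URL and the documentsbutton links in the results table) could be handled with plain string functions. The following is only a minimal sketch: BuildFilingListUrl and ExtractDocumentLinks are hypothetical helper names, the type query parameter is an assumption that should be checked in a browser, and the link extraction assumes the markup stays exactly as shown in the source above.

```vba
Option Explicit

' Hypothetical helper: builds the company filing-list URL for a ticker.
' CIK, owner, action and count appear in the URLs shown above;
' the "type" filter (e.g. 10-Q) is an assumption.
Function BuildFilingListUrl(ticker As String, Optional filingType As String = "") As String
    Dim url As String
    url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany" & _
          "&CIK=" & ticker & "&owner=exclude&count=40"
    If Len(filingType) > 0 Then url = url & "&type=" & filingType
    BuildFilingListUrl = url
End Function

' Hypothetical helper: collects every href whose anchor carries
' id="documentsbutton", turned into an absolute URL.
Function ExtractDocumentLinks(html As String) As Collection
    Const HREF_START As String = "<a href="""
    Const MARKER As String = """ id=""documentsbutton"""
    Dim links As New Collection
    Dim markerPos As Long, hrefPos As Long, valueStart As Long

    markerPos = InStr(1, html, MARKER, vbTextCompare)
    Do While markerPos > 0
        ' Walk back from the marker to the start of the same anchor's href
        hrefPos = InStrRev(html, HREF_START, markerPos, vbTextCompare)
        If hrefPos > 0 Then
            valueStart = hrefPos + Len(HREF_START)
            links.Add "https://www.sec.gov" & Mid$(html, valueStart, markerPos - valueStart)
        End If
        markerPos = InStr(markerPos + Len(MARKER), html, MARKER, vbTextCompare)
    Loop
    Set ExtractDocumentLinks = links
End Function
```

The html argument would be the responseText of an XMLHTTP request to the URL returned by BuildFilingListUrl, fetched the same way Sub GetNode() fetches the XML file. Each returned link is a filing index page, not the instance document itself, so one more request per link would still be needed to find the .xml file.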
[begin original answer]
How could we enter our "coordinates" in Excel, say in one of our userforms or cells, and then make VBA "navigate/crawl" there using only these coordinates, just as we would navigate there with a browser?
I googled "get google results as xml" while researching another question. One interesting hit that came back was this link: http://nielsbosma.se/projects/seotools/functions/
I make no representation about the merits of this tool, but it seems to have the functionality you're asking for.
Now in the previous procedure we can see we only need to use the statement strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml" ...but this only works when we know beforehand that we want data from one specific location on the web.
Yes, so once you've gotten some sort of web crawling function to return a list of XML document links, you first need to put them somewhere the user can see. My preference would be a range on a worksheet, but you could load up a list or combo box in a form as well. Either way, you would then modify Sub GetNode() to accept an input parameter based on the user's selection:
Sub GetNode(strUrl As String)
    ...
    strXMLSite = strUrl
    ...
    Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub
Or, perhaps better, make it a function that returns the node's text for you to consume however you'd like:
Function GetNode(strUrl As String) As String
    ...
    strXMLSite = strUrl
    ...
    'return result
    GetNode = objXMLNodeDIIRSP.Text
End Function
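To make the idea concrete, here is a minimal sketch of how the function version might be filled in and driven from a worksheet range. Everything beyond the original GetNode is an assumption: the helper names, the use of late binding via CreateObject, the A2:A5 / column B layout, and the choice of getElementsByTagName with the prefixed name instead of the original SelectSingleNode calls (which depend on the root element being named exactly xbrl). An instance document can contain the same element several times, and this sketch simply takes the first match.

```vba
Option Explicit

' Hypothetical helper: returns the text of the first element named tagName,
' or "" when the XML does not parse or the element is missing.
Function GetNodeTextFromXml(xmlText As String, tagName As String) As String
    Dim doc As Object, nodes As Object
    Set doc = CreateObject("MSXML2.DOMDocument")
    doc.async = False
    If doc.LoadXML(xmlText) Then
        ' Assumption: the prefix used in the file matches the one in tagName
        Set nodes = doc.getElementsByTagName(tagName)
        If nodes.Length > 0 Then GetNodeTextFromXml = nodes.Item(0).Text
    End If
End Function

' Downloads one instance document and extracts a single element's text.
Function GetNodeFromUrl(strUrl As String, tagName As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", strUrl, False
    http.send
    If http.Status = 200 Then
        GetNodeFromUrl = GetNodeTextFromXml(http.responseText, tagName)
    End If
End Function

' Assumed layout: URLs in Sheet1!A2:A5, results written beside them in column B.
Sub FillValuesFromUrlList()
    Dim cell As Range
    For Each cell In Worksheets("Sheet1").Range("A2:A5").Cells
        If Len(cell.Value) > 0 Then
            cell.Offset(0, 1).Value = _
                GetNodeFromUrl(CStr(cell.Value), "us-gaap:CommonStockValue")
        End If
    Next cell
End Sub
```

Keeping the parsing in its own function means it can be tried on a small hand-written XML string before pointing it at real URLs.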
Interesting question overall, and I was happy to give you feedback on the code you posted. Your other questions can probably be answered with a bit of Google searching.
Source: https://stackoverflow.com/questions/21786105/vba-pull-xml-data-from-multiple-web-locations