VBA pull XML data from multiple Web locations

我与影子孤独终老i 提交于 2019-12-11 09:22:04

问题


In my previous question (everything needed is on this question; the link is here for the sake of completeness and measure) i ask for a way to pull XML data to Excel from a Web location. The code i received (courtesy of user2140261) as an answer lies here:

Sub GetNode()
Dim strXMLSite As String
Dim objXMLHTTP As MSXML2.XMLHTTP
Dim objXMLDoc As MSXML2.DOMDocument
Dim objXMLNodexbrl As MSXML2.IXMLDOMNode
Dim objXMLNodeDIIRSP As MSXML2.IXMLDOMNode

Set objXMLHTTP = New MSXML2.XMLHTTP
Set objXMLDoc = New MSXML2.DOMDocument

strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

objXMLHTTP.Open "POST", strXMLSite, False
objXMLHTTP.send
objXMLDoc.LoadXML (objXMLHTTP.responseText)

Set objXMLNodexbrl = objXMLDoc.SelectSingleNode("xbrl")

Set objXMLNodeDIIRSP = objXMLNodexbrl.SelectSingleNode("us-gaap:DebtInstrumentInterestRateStatedPercentage")

Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub  

But every company has a different XML Instance Document, and every time period a company publishes a different XML Instance Document (e.g. quarterly, annually). So these documents can be accessed in different web locations.

Now in the previous procedure we can see we only need to use the statement

strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

...but this is when we know beforehand that we want data from one specified location in the Web

What if we want to pull some data for these 4 different locations depicted by an asterisk(*) in the image below

How could we actually input our "coordinates" in Excel let's say in one of our userforms/cells for example and then make VBA "navigate/crawl" there just by using these coordinates just as we are navigating there with a browser?

The coordinates that we input can be:

  • A Stock Ticker (e.g. TSLA for Tesla Motors)
  • A type of files for example 10-Q's

You can pick the type of files in these links for BDX and ANN respectively:

BDX LINK

ANN LINK

Below we have 2 web locations for the Instance Document locations of BDX company and 2 for ANN company

  • BDX Company

    http://www.sec.gov/Archives/edgar/data/10795/000119312514042815/bdx-20131231.xml http://www.sec.gov/Archives/edgar/data/10795/000119312513318898/bdx-20130630.xml

  • ANN Company

    http://www.sec.gov/Archives/edgar/data/874214/000087421413000036/ann-20131102.xml http://www.sec.gov/Archives/edgar/data/874214/000087421413000027/ann-20130803.xml

How could we pull from an XML element that is existent in all the four instance documents for example us-gaap:CommonStockValue by simply giving VBA the

  1. Stock Ticker
  2. The document type (10-K, 10-Q)

Can it be done with the use of Microsoft XML Core Services (MSXML) or we require some other Library too?

You can see how impractical it is to fire this code thousand of times and every time copy the URL from the Web Browser to the strXMLSite as a String value....


回答1:


[edit1]

In response to the comment:

the only thing that remains for the us is to understand how URL's actually change so they can be predictable and manipulated by sting concantenation? In what code language is the URL written?

The short answer is open a browser and right-click on a blank spot in the webpage you're interested in and select View Source from the popup menu.

To repeat the example provided in the other post VBA href Crawl on Browser's Source Code , do this:

Open Edgar Online Company Search in a browser: https://www.sec.gov/edgar/searchedgar/companysearch.html

Use the Fast Search function to search for ticker CRR and it gives me this URL: https://www.sec.gov/cgi-bin/browse-edgar?CIK=CRR&Find=Search&owner=exclude&action=getcompany which contains the list of public filings for Carbo Ceramics, Inc.

Now, right click on the page to get the source and scroll down to line 91. You'll see this block of code:

      <table class="tableFile2" summary="Results">

That's the beginning of the results table that shows the list of public filings.

         <tr>
            <th width="7%" scope="col">Filings</th>
            <th width="10%" scope="col">Format</th>
            <th scope="col">Description</th>
            <th width="10%" scope="col">Filing Date</th>
            <th width="15%" scope="col">File/Film Number</th>
         </tr>

That's the header row of the table with column descriptions.

<tr>
<td nowrap="nowrap">SC 13G</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/1009672/000108975514000003/0001089755-14-000003-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
<td class="small" >Statement of acquisition of beneficial ownership by individuals<br />Acc-no: 0001089755-14-000003&nbsp;(34 Act)&nbsp; Size: 8 KB            </td>
            <td>2014-02-14</td>
            <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=005-48851&amp;owner=exclude&amp;count=40">005-48851</a><br>14615563         </td>
         </tr>

And that's the first row of actual data in the table for filing SC 13G, Statement of acquisition of beneficial ownership by individuals Acc-no: 0001089755-14-000003 (34 Act) Size: 8 KB, submitted on 2014-02-14.

So, now you want to loop through all of the document URLs on this page and that's why you're asking what language the URLs are in? (Crawl the page, in other words?)

[begin original answer]

How could we actually input our "coordinates" in Excel let's say in one of our userforms/cells for example and then make VBA "navigate/crawl" there just by using these coordinates just as we are navigating there with a browser?

I googled "get google results as xml" while researching another question. One interesting hit that came back was this link: http://nielsbosma.se/projects/seotools/functions/

I make no representation about the merits of this tool, but it seems to have the functionality you're asking for.

Now in the previous procedure we can see we only need to use the statement strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml" ...but this is when we know beforehand that we want data from one specified location in the Web

Yes, so once you've gotten some sort of web crawling function to return a list of xml document links, you first need to put them somewhere the user can see. My preference would be a range on a worksheet, but you could load up a list or combo box in a form as well. Regardless, then you would modify Sub GetNode() to accept an input parameter based on user selection:

Sub GetNode(strUrl as String)
...
strXMLSite = strUrl
...
Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub  

Or perhaps better make it a function which returns the xml as text for you to consume however you'd like:

Function GetNode(strUrl as String) as String
...
strXMLSite = strUrl
...
'return result
GetNode = objXMLNodeDIIRSP.Text
End Function  

Interesting question overall and I was happy to give you feedback on the code you posted. Your other questions can probably be answered by doing a bit of google searching.



来源:https://stackoverflow.com/questions/21786105/vba-pull-xml-data-from-multiple-web-locations

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!