问题
I've been trying to Scrape the entire HTML body and assign it as a string variable before manipulating that string to populate an excel file - this will be done on a a loop to update the date every 5 minute interval.
These pages are AJAX pages, so run what looks like JavaScript (I'm not familiar with JS at all though).
I've tried using the XMLHttpRequest
object (code below) but t returns the JS Calls:
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", "https://www.google.co.uk/finance?ei=bQ_iWLnjOoS_UeWcqsgE", False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.send
Debug.Print XMLHTTP.ResponseText
I've tried creating an IE
object with the below code but, again, same issue:
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate "https://www.google.co.uk/finance?ei=bQ_iWLnjOoS_UeWcqsgE"
While IE.Busy Or IE.ReadyState <> 4: DoEvents: Wend
Set HTMLdoc = IE.Document
Debug.Print = HTMLdoc.Body.innerHTML
What I want it exactly text available to me when I hit F12 and got to the inspector tab (ie. the entirety of the text within the yellow section below) - If I could get this (full expanded) I could work from there. Any help would be massively appreciated.
In the above example (Google finance), the index prices update asynchronously - I want to capture these at the time at which I assign the string.
回答1:
For any dynamically loaded data you just inspect XHRs the webpage does, find the one containing the relevant data, make the same XHR (either site provides API or not) and parse response, or in case of IE automation you add extra wait loop until a target element becomes accessible, then retrieve it from DOM.
In that certain case you can get the data via Google Finance API.
Method 1.
To make the request you have to know stock symbols, which could be easily find within webpage HTML content, or e. g. if you click on CAC 40, in opened page there will be a title CAC 40 (INDEXEURO:PX1).
There are the following stock and stock exchange symbols in the World markets table on that page:
Shanghai SHA:000001
S&P 500 INDEXSP:.INX
Nikkei 225 INDEXNIKKEI:NI225
Hang Seng Index INDEXHANGSENG:HSI
TSEC TPE:TAIEX
EURO STOXX 50 INDEXSTOXX:SX5E
CAC 40 INDEXEURO:PX1
S&P TSX INDEXTSI:OSPTX
S&P/ASX 200 INDEXASX:XJO
BSE Sensex INDEXBOM:SENSEX
SMI INDEXSWX:SMI
ATX INDEXVIE:ATX
IBOVESPA INDEXBVMF:IBOV
SET INDEXBKK:SET
BIST100 INDEXIST:XU100
IBEX INDEXBME:IB
WIG WSE:WIG
TASI TADAWUL:TASI
MERVAL BCBA:IAR
IPC INDEXBMV:ME
IDX Composite IDX:COMPOSITE
Put them into URL:
http://finance.google.com/finance/info?q=SHA:000001,INDEXSP:.INX,INDEXNIKKEI:NI225,INDEXHANGSENG:HSI,TPE:TAIEX,INDEXSTOXX:SX5E,INDEXEURO:PX1,INDEXTSI:OSPTX,INDEXASX:XJO,INDEXBOM:SENSEX,INDEXSWX:SMI,INDEXVIE:ATX,INDEXBVMF:IBOV,INDEXBKK:SET,INDEXIST:XU100,INDEXBME:IB,WSE:WIG,TADAWUL:TASI,BCBA:IAR,INDEXBMV:ME,IDX:COMPOSITE
The response contains JSON data, like this:
[
{
"id": "7521596",
"t": "000001",
"e": "SHA",
"l": "3,222.51",
"l_fix": "3222.51",
"l_cur": "CN¥3,222.51",
"s": "0",
"ltt": "3:01PM GMT+8",
"lt": "Mar 31, 3:01PM GMT+8",
"lt_dts": "2017-03-31T15:01:15Z",
"c": "+12.28",
"c_fix": "12.28",
"cp": "0.38",
"cp_fix": "0.38",
"ccol": "chg",
"pcls_fix": "3210.2368"
},
...
]
You may use the below VBA code to parse response and output result. It requires JSON.bas module to be imported to VBA project for JSON processing.
Sub GoogleFinanceData()
Dim sJSONString As String
Dim vJSON As Variant
Dim sState As String
Dim aData()
Dim aHeader()
' Retrieve Google Finance data
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://finance.google.com/finance/info?q=SHA:000001,INDEXSP:.INX,INDEXNIKKEI:NI225,INDEXHANGSENG:HSI,TPE:TAIEX,INDEXSTOXX:SX5E,INDEXEURO:PX1,INDEXTSI:OSPTX,INDEXASX:XJO,INDEXBOM:SENSEX,INDEXSWX:SMI,INDEXVIE:ATX,INDEXBVMF:IBOV,INDEXBKK:SET,INDEXIST:XU100,INDEXBME:IB,WSE:WIG,TADAWUL:TASI,BCBA:IAR,INDEXBMV:ME,IDX:COMPOSITE", False
.Send
If .Status <> 200 Then Exit Sub
sJSONString = .responseText
End With
' Trim extraneous chars
sJSONString = Mid(sJSONString, InStr(sJSONString, "["))
' Parse JSON string
JSON.Parse sJSONString, vJSON, sState
If sState = "Error" Then Exit Sub
' Convert to table format
JSON.ToArray vJSON, aData, aHeader
' Results output
With Sheets(1)
.Cells.Delete
.Cells.WrapText = False
If UBound(aHeader) >= 0 Then OutputArray .Cells(1, 1), aHeader
Output2DArray .Cells(2, 1), aData
.Columns.AutoFit
End With
End Sub
Sub OutputArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
Sub Output2DArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
As a result the data you need is located in l_fix
, c_fix
, cp_fix
columns.
Method 2.
Also you can make XHR by the URL like this one for CAC 40:
https://www.google.co.uk/finance/getprices?q=PX1&x=INDEXEURO&i=120&p=20m&f=d,c,v,o,h,l
Particularly that URL is for PX1 stock and INDEXEURO stock exchange symbols, 120 sec interval, 20 minutes period, response data d,c,v,o,h,l is for DATE (UNIX TimeStamp), CLOSE, VOLUME, OPEN, HIGH, LOW.
Response format is as follows:
EXCHANGE%3DINDEXEURO
MARKET_OPEN_MINUTE=540
MARKET_CLOSE_MINUTE=1050
INTERVAL=120
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=120
a1491405000,5098.75,5099.92,5098.75,5099.92,0
1,5100.51,5100.51,5098.09,5098.09,0
2,5099.63,5101.2,5099.29,5100.68,0
3,5099.83,5100.04,5099.07,5099.28,0
4,5098.19,5098.9,5097.71,5098.9,0
5,5098.56,5099.24,5097.99,5099.24,0
6,5097.34,5098.2,5096.14,5098.2,0
7,5096.52,5097.38,5095.66,5097.38,0
8,5093.27,5095.39,5093.27,5095.39,0
9,5094.43,5094.43,5092.07,5093.17,0
10,5088.18,5092.72,5087.68,5092.72,0
The XHR should be done for each stock symbol in the list, then results should be consolidated into table.
来源:https://stackoverflow.com/questions/43183637/scraping-an-ajax-page-using-vba