Scraping an AJAX page using VBA

北战南征, submitted on 2021-02-06 13:59:49

Question


I've been trying to scrape the entire HTML body and assign it to a string variable, then manipulate that string to populate an Excel file. This will run in a loop to update the data every 5 minutes.

These pages are AJAX pages, so they run what looks like JavaScript (I'm not familiar with JS at all, though).

I've tried using the XMLHttpRequest object (code below), but it returns the JS calls:

Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", "https://www.google.co.uk/finance?ei=bQ_iWLnjOoS_UeWcqsgE", False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.send
Debug.Print XMLHTTP.ResponseText

I've tried creating an IE object with the below code but, again, same issue:

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate "https://www.google.co.uk/finance?ei=bQ_iWLnjOoS_UeWcqsgE"
While IE.Busy Or IE.ReadyState <> 4: DoEvents: Wend
Set HTMLdoc = IE.Document
Debug.Print HTMLdoc.body.innerHTML

What I want is exactly the text available to me when I hit F12 and go to the Inspector tab (i.e. the entirety of the text within the yellow section below). If I could get this, fully expanded, I could work from there. Any help would be massively appreciated.

In the above example (Google Finance), the index prices update asynchronously. I want to capture them as they are at the moment I assign the string.


Answer 1:


For any dynamically loaded data, inspect the XHRs the webpage makes, find the one containing the relevant data, make the same XHR yourself (whether or not the site provides an API), and parse the response. Alternatively, with IE automation, add an extra wait loop until the target element becomes accessible, then retrieve it from the DOM.
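For the IE automation route, the extra wait loop could look roughly like the sketch below. It assumes the target element has a known id; the function name `WaitForElement` and the id in the usage line are hypothetical and would have to be replaced with whatever the inspector shows for the real page.

```vba
' Poll the document until an element with the given id exists
' or the timeout expires. Returns Nothing on timeout.
Function WaitForElement(oDoc As Object, sId As String, _
        Optional lTimeoutSec As Long = 10) As Object

    Dim dtDeadline As Date
    Dim oElem As Object

    dtDeadline = DateAdd("s", lTimeoutSec, Now)
    Do
        Set oElem = oDoc.getElementById(sId)
        If Not oElem Is Nothing Then Exit Do
        DoEvents ' let IE keep running its scripts
    Loop While Now < dtDeadline
    Set WaitForElement = oElem

End Function
```

It would be called after the usual `IE.Busy` / `IE.ReadyState` loop, e.g. `Set oElem = WaitForElement(IE.Document, "some-element-id")`, and the caller should check the result for `Nothing` before reading `innerHTML`.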

In this particular case you can get the data via the Google Finance API.

Method 1.

To make the request you need to know the stock symbols, which can easily be found in the webpage's HTML content. For example, if you click on CAC 40, the page that opens has the title CAC 40 (INDEXEURO:PX1).

There are the following stock and stock exchange symbols in the World markets table on that page:

Shanghai            SHA:000001
S&P 500             INDEXSP:.INX
Nikkei 225          INDEXNIKKEI:NI225
Hang Seng Index     INDEXHANGSENG:HSI
TSEC                TPE:TAIEX
EURO STOXX 50       INDEXSTOXX:SX5E
CAC 40              INDEXEURO:PX1
S&P TSX             INDEXTSI:OSPTX
S&P/ASX 200         INDEXASX:XJO
BSE Sensex          INDEXBOM:SENSEX
SMI                 INDEXSWX:SMI
ATX                 INDEXVIE:ATX
IBOVESPA            INDEXBVMF:IBOV
SET                 INDEXBKK:SET
BIST100             INDEXIST:XU100
IBEX                INDEXBME:IB
WIG                 WSE:WIG
TASI                TADAWUL:TASI
MERVAL              BCBA:IAR
IPC                 INDEXBMV:ME
IDX Composite       IDX:COMPOSITE

Put them into the URL:

http://finance.google.com/finance/info?q=SHA:000001,INDEXSP:.INX,INDEXNIKKEI:NI225,INDEXHANGSENG:HSI,TPE:TAIEX,INDEXSTOXX:SX5E,INDEXEURO:PX1,INDEXTSI:OSPTX,INDEXASX:XJO,INDEXBOM:SENSEX,INDEXSWX:SMI,INDEXVIE:ATX,INDEXBVMF:IBOV,INDEXBKK:SET,INDEXIST:XU100,INDEXBME:IB,WSE:WIG,TADAWUL:TASI,BCBA:IAR,INDEXBMV:ME,IDX:COMPOSITE
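Rather than hard-coding that long query string, the URL could be assembled from an array of symbols. This is only a small sketch; the helper name `BuildInfoUrl` is made up for illustration, and it assumes the symbols need no URL encoding (true for the ones listed above).

```vba
' Join "EXCHANGE:TICKER" symbols into the info request URL.
Function BuildInfoUrl(aSymbols As Variant) As String

    BuildInfoUrl = "http://finance.google.com/finance/info?q=" & _
        Join(aSymbols, ",")

End Function
```

For example, `BuildInfoUrl(Array("SHA:000001", "INDEXSP:.INX"))` produces the same URL pattern as above with just those two symbols.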

The response contains JSON data, like this:

[
    {
        "id": "7521596",
        "t": "000001",
        "e": "SHA",
        "l": "3,222.51",
        "l_fix": "3222.51",
        "l_cur": "CN¥3,222.51",
        "s": "0",
        "ltt": "3:01PM GMT+8",
        "lt": "Mar 31, 3:01PM GMT+8",
        "lt_dts": "2017-03-31T15:01:15Z",
        "c": "+12.28",
        "c_fix": "12.28",
        "cp": "0.38",
        "cp_fix": "0.38",
        "ccol": "chg",
        "pcls_fix": "3210.2368"
    },
    ...
]

You can use the VBA code below to parse the response and output the result. It requires the JSON.bas module to be imported into the VBA project for JSON processing.

Sub GoogleFinanceData()

    Dim sJSONString As String
    Dim vJSON As Variant
    Dim sState As String
    Dim aData()
    Dim aHeader()

    ' Retrieve Google Finance data
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://finance.google.com/finance/info?q=SHA:000001,INDEXSP:.INX,INDEXNIKKEI:NI225,INDEXHANGSENG:HSI,TPE:TAIEX,INDEXSTOXX:SX5E,INDEXEURO:PX1,INDEXTSI:OSPTX,INDEXASX:XJO,INDEXBOM:SENSEX,INDEXSWX:SMI,INDEXVIE:ATX,INDEXBVMF:IBOV,INDEXBKK:SET,INDEXIST:XU100,INDEXBME:IB,WSE:WIG,TADAWUL:TASI,BCBA:IAR,INDEXBMV:ME,IDX:COMPOSITE", False
        .Send
        If .Status <> 200 Then Exit Sub
        sJSONString = .responseText
    End With
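    ' Trim any extraneous chars preceding the JSON array
    ' (bail out if there is no array, since Mid() fails on a start position of 0)
    If InStr(sJSONString, "[") = 0 Then Exit Sub
    sJSONString = Mid(sJSONString, InStr(sJSONString, "["))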
    ' Parse JSON string
    JSON.Parse sJSONString, vJSON, sState
    If sState = "Error" Then Exit Sub
    ' Convert to table format
    JSON.ToArray vJSON, aData, aHeader
    ' Results output
    With Sheets(1)
        .Cells.Delete
        .Cells.WrapText = False
        If UBound(aHeader) >= 0 Then OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aData
        .Columns.AutoFit
    End With

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

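As a result, the data you need is in the l_fix, c_fix and cp_fix columns, which, judging by the sample response above, hold the last price, the change and the percentage change.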

Method 2.

You can also make an XHR to a URL like this one for CAC 40:

https://www.google.co.uk/finance/getprices?q=PX1&x=INDEXEURO&i=120&p=20m&f=d,c,v,o,h,l

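That particular URL is for the PX1 stock symbol and the INDEXEURO stock exchange symbol, with a 120-second interval (i=120) and a 20-minute period (p=20m). The requested fields d,c,v,o,h,l stand for DATE (UNIX timestamp), CLOSE, VOLUME, OPEN, HIGH and LOW; the response returns them in the order given by its COLUMNS line.

The response format is as follows: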

EXCHANGE%3DINDEXEURO
MARKET_OPEN_MINUTE=540
MARKET_CLOSE_MINUTE=1050
INTERVAL=120
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=120
a1491405000,5098.75,5099.92,5098.75,5099.92,0
1,5100.51,5100.51,5098.09,5098.09,0
2,5099.63,5101.2,5099.29,5100.68,0
3,5099.83,5100.04,5099.07,5099.28,0
4,5098.19,5098.9,5097.71,5098.9,0
5,5098.56,5099.24,5097.99,5099.24,0
6,5097.34,5098.2,5096.14,5098.2,0
7,5096.52,5097.38,5095.66,5097.38,0
8,5093.27,5095.39,5093.27,5095.39,0
9,5094.43,5094.43,5092.07,5093.17,0
10,5088.18,5092.72,5087.68,5092.72,0
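A response like this can be parsed with plain string functions. The sketch below rests on two assumptions that the sample suggests but does not spell out: a row whose first field starts with "a" carries an absolute UNIX timestamp, and the following rows carry an offset that has to be multiplied by INTERVAL and added to that base. The function name `ParsePrices` is hypothetical.

```vba
' Parse a getprices-style response into a Collection of Variant arrays
' laid out as in the sample COLUMNS line:
' (unix timestamp, close, high, low, open, volume).
Function ParsePrices(sResponse As String) As Collection

    Dim vLine As Variant
    Dim sLine As String
    Dim aParts() As String
    Dim lInterval As Long
    Dim dBase As Double
    Dim dStamp As Double

    Set ParsePrices = New Collection
    For Each vLine In Split(Replace(sResponse, vbCr, ""), vbLf)
        sLine = Trim$(vLine)
        If Left$(sLine, 9) = "INTERVAL=" Then
            lInterval = CLng(Mid$(sLine, 10))
        ElseIf InStr(sLine, ",") > 0 And InStr(sLine, "=") = 0 Then
            ' Data row: header lines all contain "="
            aParts = Split(sLine, ",")
            If UBound(aParts) >= 5 Then
                If Left$(aParts(0), 1) = "a" Then
                    ' Assumed: absolute timestamp, new base
                    dBase = Val(Mid$(aParts(0), 2))
                    dStamp = dBase
                Else
                    ' Assumed: offset in units of INTERVAL seconds
                    dStamp = dBase + Val(aParts(0)) * lInterval
                End If
                ' Val() always treats "." as the decimal separator
                ParsePrices.Add Array(dStamp, Val(aParts(1)), _
                    Val(aParts(2)), Val(aParts(3)), _
                    Val(aParts(4)), Val(aParts(5)))
            End If
        End If
    Next

End Function
```

Each item can then be written to a worksheet row; if a date is needed, `DateAdd("s", vRow(0), #1/1/1970#)` converts the timestamp to a UTC date value.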

The XHR should be made for each stock symbol in the list, and the results then consolidated into a table.
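To loop over the symbols, the per-symbol URL could be derived from the same "EXCHANGE:TICKER" strings used in Method 1. This is a sketch under the assumption that every symbol follows the pattern of the CAC 40 URL above; the helper name `BuildPricesUrl` and its default parameter values are illustrative only, and symbols containing special characters may need encoding.

```vba
' Build a getprices URL from an "EXCHANGE:TICKER" symbol,
' e.g. "INDEXEURO:PX1" -> q=PX1, x=INDEXEURO.
Function BuildPricesUrl(sSymbol As String, _
        Optional lIntervalSec As Long = 120, _
        Optional sPeriod As String = "20m") As String

    Dim aParts() As String

    aParts = Split(sSymbol, ":")
    BuildPricesUrl = "https://www.google.co.uk/finance/getprices" & _
        "?q=" & aParts(1) & "&x=" & aParts(0) & _
        "&i=" & lIntervalSec & "&p=" & sPeriod & "&f=d,c,v,o,h,l"

End Function
```

Calling it inside a `For Each` over the symbol list and sending one `MSXML2.XMLHTTP` request per URL gives one response per symbol to consolidate.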



Source: https://stackoverflow.com/questions/43183637/scraping-an-ajax-page-using-vba
