Scrape website with XML HTTP request with Excel VBA: wait for the page to fully load

深忆病人 2021-01-24 16:29

I'm trying to scrape a product price from a webpage using Excel VBA. The following code is working when using VBA Internet Explorer navigate request. However I would like to us

1 Answer
  • 2021-01-24 17:17

    REST API HTTP Request:

    Your current method does not allow the page to load fully, as you have noted. Instead, you can send an XMLHTTP request to the site's REST API, using the EncodeURL worksheet function to pass the encoded product path in the url parameter. The server sends back a JSON response containing the value you are after, along with a lot of other information.

    I demonstrate two methods of extracting the price info from the returned JSON string: ① Using the Split function to extract the price by generating substrings until the required string is left; ② Using a JSONParser to navigate the JSON structure and return the required value.

    Code:

    The following uses Split to extract the value.

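    Option Explicit

    Public Sub GetPrice()
        ' REST endpoint that returns the product page data as JSON
        Const BASE_URL As String = "https://www.ah.nl/service/rest/delegate?url="
        Dim URL As String, sResponse As String, price As String

        ' EncodeURL (Excel 2013 and later) percent-encodes the product path
        URL = BASE_URL & Application.WorksheetFunction.EncodeURL("/producten/product/wi3640/lu-bastogne-biscuits-original")

        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", URL, False   ' False = synchronous: .send blocks until the response arrives
            .send
            sResponse = StrConv(.responseBody, vbUnicode)   ' convert the response bytes to a VBA string
        End With

        ' Take the text after the first "now": and cut it at the next "}"
        price = Split(Split(sResponse, """now"":")(1), "}")(0)
        Debug.Print price
    End Sub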
    

    Parsing the JSON response:

    Using Split:

    You could read the whole JSON response into a JSON object using a JSON parser, for example JSONConverter.bas, and then navigate that object to reach the price. I found it simpler to use the Split function to extract the required info, as explained below.

    Split returns a zero-based, one-dimensional array containing a specified number of substrings based on splitting the input string on a specified delimiter.

    In the line,

    price = Split(Split(sResponse, """now"":")(1), "}")(0)
    

    I have two nested Split statements. These consecutively split the response JSON string to extract the price 1.55.

    The first Split uses "now": as the delimiter. In the resulting array, position 0 holds everything before the first occurrence of the delimiter, and position 1 holds the text that starts with the target price.

    So, that string is extracted with:

    Split(sResponse, """now"":")(1)
    

    We then need to get just the price so use Split again to grab the 1.55 by using the delimiter "}":

    Split(Split(sResponse, """now"":")(1), "}")
    

    This results in a new array (quite long, because the rest of the response is split at every "}").

    The price we want is now at position 0 in the new array, which is why we can use the following to extract the price.

    price = Split(Split(sResponse, """now"":")(1), "}")(0)
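
    The nested Split can also be wrapped in a small helper that returns an empty string instead of raising "Subscript out of range" when the delimiter is missing. This is only a sketch: ExtractBetween, DemoExtract and the SAMPLE string are hypothetical names and data made up for illustration, not part of the original answer or the real server response.

    ```vba
    Option Explicit

    ' Returns the text between the first startDelim and the next endDelim,
    ' or an empty string when startDelim is absent or nothing follows it.
    Public Function ExtractBetween(ByVal jsonText As String, _
                                   ByVal startDelim As String, _
                                   ByVal endDelim As String) As String
        Dim parts() As String
        parts = Split(jsonText, startDelim)

        ' Fewer than two elements means startDelim never occurred.
        If UBound(parts) < 1 Then Exit Function
        ' An empty second element would make the inner Split return an empty array.
        If Len(parts(1)) = 0 Then Exit Function

        ExtractBetween = Split(parts(1), endDelim)(0)
    End Function

    Public Sub DemoExtract()
        ' Made-up fragment for illustration only.
        Const SAMPLE As String = "{""priceLabel"":{""was"":1.75,""now"":1.55},""unit"":""st""}"
        Debug.Print ExtractBetween(SAMPLE, """now"":", "}")   ' prints 1.55
    End Sub
    ```

    In GetPrice, the Split line could then read price = ExtractBetween(sResponse, """now"":", "}"), with an empty result signalling that the response did not contain the expected key.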
    

    Using JSON parser:

    If you want to traverse the json structure you would use the following:

    Dim json As Object
    Set json = JsonConverter.ParseJson(sResponse)("_embedded")("lanes")(5)("_embedded")("items")(1)("_embedded")("product")("priceLabel")
    Debug.Print json("now")
    

    After downloading JSONConverter.bas and adding it to your project, add a reference to Microsoft Scripting Runtime via VBE > Tools > References. The Set json statement above represents the path to the price, as seen in the JSON structure below. I have collapsed some detail to make the path clearer. You would insert those lines into the original code in place of the Split line.

    In the diagram above, [] denotes a collection object, which is accessed via a 1-based index, e.g. JsonConverter.ParseJson(sResponse)("_embedded")("lanes")(5). The {} denotes a dictionary object, which is accessed by key, e.g. JsonConverter.ParseJson(sResponse)("_embedded")("lanes")(5)("_embedded"). The syntax in my line,

    Set json = JsonConverter.ParseJson(sResponse)("_embedded")("lanes")(5)("_embedded")("items")(1)("_embedded")("product")("priceLabel")
    

    demonstrates the different syntax to navigate these two object types.
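
    To try the two access styles without a live response or the parser, the same kind of structure can be built by hand. The sketch below is an illustration under assumptions: BuildSample and NavigateSample are hypothetical names, the data is made up, and it relies on Scripting.Dictionary (Windows only, created late-bound here so no reference is required).

    ```vba
    Option Explicit

    ' Builds a hand-made stand-in for a parsed document shaped like
    ' {"lanes": [ {"priceLabel": {"now": 1.55}} ]}
    Public Function BuildSample() As Object
        Dim priceLabel As Object, lane As Object, root As Object
        Dim lanes As Collection

        Set priceLabel = CreateObject("Scripting.Dictionary")
        priceLabel.Add "now", 1.55

        Set lane = CreateObject("Scripting.Dictionary")
        lane.Add "priceLabel", priceLabel

        Set lanes = New Collection      ' plays the role of a JSON array []
        lanes.Add lane

        Set root = CreateObject("Scripting.Dictionary")   ' plays the role of a JSON object {}
        root.Add "lanes", lanes
        Set BuildSample = root
    End Function

    Public Sub NavigateSample()
        Dim root As Object
        Set root = BuildSample()
        ' ("lanes") and ("priceLabel") are Dictionary key lookups;
        ' (1) is a Collection index, which starts at 1 rather than 0.
        Debug.Print root("lanes")(1)("priceLabel")("now")
    End Sub
    ```

    Using an index where a key is expected, or the other way round, raises a run-time error, which is why the path has to match the [] and {} nesting exactly.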
