Extracting a series of URL using VBA

后端 未结 1 1924
梦如初夏
梦如初夏 2021-01-27 10:11

I just trying to run through a list of url link, but it keep showing run time error\'91\',object variable or with block variable not set.

The data I want to extract is f

相关标签:
1条回答
  • 2021-01-27 10:25

    tl;dr

    Your error is due to the fact there are different numbers of elements for the given class name depending on the results per page. So you can't used fixed indexes. For the page you indicated the last index for that class, via the iframe, is 9 i.e. ThisWorkbook.Worksheets("Sheet1").cells(u, 9) = .getElementsByClassName("formContentData")(9).innerText . 10 and 11 are invalid. Below I show a way to determine the number of results and extract info from each result row.

    General principle:

    Ok... so the following works on the principle of targeting the Details of Changes table for most of the info.

    Example extract:

    More specifically, I target the rows that repeat the info for No, Date of Change, #Securities, Type of Transaction and Nature of Interest. These values are stored in an array of arrays (one array per row of information). Then the results arrays are stored in a collection to later be written out to the sheet. I loop each table cell in the targeted rows (td tag elements within parent tr) to populate the arrays.

    I add in the Name from the table above on the page and also, because there can be more than one row of results, depending on the webpage, and because I am writing the results to a new Results sheet, I add in the URL before each result to indicate source of information.


    TODO:

    1. Refactor the code to be more modular
    2. Potentially add in some error handling

    CSS selectors:


    ① I select the Name element, which I refer to as title, from the Particulars of substantial Securities Holder table.

    Example name element:

    Inspecting the HTML for this element shows it has a class of formContentLabel, and that it is the first class with this value on the page.

    Example HTML for target Name:

    This means I can use a class selector , .formContentLabel, to target the element. As it is a single element I want I use the querySelector method to apply the CSS selector.


    ② I target the rows of interest in the Details of Changes table with a selector combination of .ven_table tr. This is descendant selector combination combining selecting elements with tr tag having parent with class ven_table. As these are multiple elements I use the querySelectorAll method to apply the CSS selector combination.

    Example of a target row:


    Example results returned by CSS selector (sample):

    The rows I am interested start at 1 and repeat every + 4 rows after e.g. row 5 , 9 etc. So I use a little maths in the code to return just the rows of interest:

    Set currentRow = data.item(i * 4 + 1)
    

    VBA:

    Option Explicit
    Public Sub GetInfo()
        Dim IE As New InternetExplorer, headers(), u As Long, resultCollection As Collection
        headers = Array("URL", "Name", "No", "Date of change", "# Securities", "Type of Transaction", "Nature of Interest")
        Set resultCollection = New Collection
        Dim links()
        links = Application.Transpose(ThisWorkbook.Worksheets("Sheet1").Range("A2:A3")) 'A100
    
        With IE
            .Visible = True
    
            For u = LBound(links) To UBound(links)
                If InStr(links(u), "http") > 0 Then
                    .navigate links(u)
    
                    While .Busy Or .readyState < 4: DoEvents: Wend
                    Application.Wait Now + TimeSerial(0, 0, 1) '<you may not always need this. Or may need to increase.
                    Dim data As Object, title As Object
                    With .document.getElementById("bm_ann_detail_iframe").contentDocument
                        Set title = .querySelector(".formContentData")
                        Set data = .querySelectorAll(".ven_table tr")
                    End With
    
                    Dim results(), numberOfRows As Long, i As Long, currentRow As Object, td As Object, c As Long, r As Long
    
                    numberOfRows = Round(data.Length / 4, 0)
                    ReDim results(1 To numberOfRows, 1 To 7)
    
                    For i = 0 To numberOfRows - 1
                        r = i + 1
                        results(r, 1) = links(u): results(r, 2) = title.innerText
                        Set currentRow = data.item(i * 4 + 1)
                        c = 3
                        For Each td In currentRow.getElementsByTagName("td")
                            results(r, c) = Replace$(td.innerText, "document.write(rownum++);", vbNullString)
                            c = c + 1
                        Next td
                    Next i
                    resultCollection.Add results
                    Set data = Nothing: Set title = Nothing
                End If
            Next u
            .Quit
        End With
        Dim ws As Worksheet, item As Long
        If Not resultCollection.Count > 0 Then Exit Sub
    
        If Not Evaluate("ISREF('Results'!A1)") Then '<==Credit to @Rory for this test
            Set ws = Worksheets.Add
            ws.NAME = "Results"
        Else
            Set ws = ThisWorkbook.Worksheets("Results")
            ws.cells.Clear
        End If
    
        Dim outputRow As Long: outputRow = 2
        With ws
            .cells(1, 1).Resize(1, UBound(headers) + 1) = headers
            For item = 1 To resultCollection.Count
                Dim arr()
                arr = resultCollection(item)
                For i = LBound(arr, 1) To UBound(arr, 1)
                    .cells(outputRow, 1).Resize(1, 7) = Application.WorksheetFunction.Index(arr, i, 0)
                    outputRow = outputRow + 1
                Next
            Next
        End With
    End Sub
    

    Example results using 2 provided tests URLs:


    Sample URLs in sheet1:

    1. http://www.bursamalaysia.com/market/listed-companies/company-announcements/5928057
    2. http://www.bursamalaysia.com/market/listed-companies/company-announcements/5927201
    0 讨论(0)
提交回复
热议问题