I am trying to run through a list of URL links, but it keeps showing run-time error '91': Object variable or With block variable not set.
The data I want to extract is f
tl;dr
Your error is due to the fact that there are different numbers of elements for the given class name depending on the results per page, so you can't use fixed indexes. For the page you indicated, the last valid index for that class, via the iframe, is 9, i.e. ThisWorkbook.Worksheets("Sheet1").cells(u, 9) = .getElementsByClassName("formContentData")(9).innerText. Indexes 10 and 11 are invalid. Below I show a way to determine the number of results and extract info from each result row.
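As a rough illustration of why a fixed index can fail: asking a collection for a position it does not have typically yields Nothing, and reading .innerText from Nothing is what raises error 91. The sketch below guards against that by checking the collection's length first. SafeInnerText is a hypothetical helper name, not part of the original code, and it assumes a late-bound collection such as the one returned by getElementsByClassName.

```vba
Option Explicit

' Hypothetical helper (not in the original code): returns the text of the
' node at position idx, or an empty string when that position does not exist.
Public Function SafeInnerText(ByVal nodes As Object, ByVal idx As Long) As String
    If nodes Is Nothing Then Exit Function          ' nothing to read from
    If idx >= 0 And idx < nodes.Length Then         ' valid indexes are 0 .. Length - 1
        SafeInnerText = nodes.Item(idx).innerText
    End If
End Function
```

It could be called as SafeInnerText(.getElementsByClassName("formContentData"), 11), which would return an empty string on a page where index 11 is missing instead of stopping with an error.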
Ok... so the following works on the principle of targeting the Details of Changes table for most of the info.
Example extract:
More specifically, I target the rows that repeat the info for No, Date of Change, #Securities, Type of Transaction and Nature of Interest. These values are stored in a 2D array (one row of the array per row of information), with one array per page. The results arrays are then stored in a collection to be written out to the sheet later. I loop over each table cell in the targeted rows (td tag elements within a parent tr) to populate the arrays.
I add in the Name from the table above on the page. Because there can be more than one row of results, depending on the webpage, and because I am writing the results to a new Results sheet, I also add in the URL before each result to indicate the source of the information.
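The storage pattern described above can be sketched in isolation, without any web page involved. The values below are placeholders invented for illustration, and the procedure name CollectionOfArraysSketch is hypothetical. Each page gets its own array, the arrays go into a Collection, and they are read back in order at the end.

```vba
Option Explicit

' Illustrative only: placeholder strings stand in for scraped values.
Public Sub CollectionOfArraysSketch()
    Dim resultCollection As Collection, results(), arr()
    Dim item As Long, r As Long

    Set resultCollection = New Collection

    ' First "page": two result rows, three columns (source, name, value)
    ReDim results(1 To 2, 1 To 3)
    results(1, 1) = "page-1": results(1, 2) = "Name A": results(1, 3) = "row 1"
    results(2, 1) = "page-1": results(2, 2) = "Name A": results(2, 3) = "row 2"
    resultCollection.Add results        ' the collection stores a copy of the array

    ' Second "page": one result row
    ReDim results(1 To 1, 1 To 3)
    results(1, 1) = "page-2": results(1, 2) = "Name B": results(1, 3) = "row 1"
    resultCollection.Add results

    ' Read everything back, one array per page
    For item = 1 To resultCollection.Count
        arr = resultCollection(item)
        For r = LBound(arr, 1) To UBound(arr, 1)
            Debug.Print arr(r, 1), arr(r, 2), arr(r, 3)
        Next r
    Next item
End Sub
```

Because the array is copied when it is added, reusing the same results variable for the next page does not overwrite what is already in the collection.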
TODO:
① I select the Name element, which I refer to as title, from the Particulars of substantial Securities Holder table.
Example name element:
Inspecting the HTML for this element shows it has a class of formContentData, and that it is the first element with this class on the page.
Example HTML for target Name:
This means I can use a class selector, .formContentData, to target the element. As I want a single element, I use the querySelector method to apply the CSS selector; it returns the first match.
② I target the rows of interest in the Details of Changes table with a selector combination of .ven_table tr. This is a descendant combinator: it selects elements with a tr tag that sit inside an element with class ven_table. As these are multiple elements, I use the querySelectorAll method to apply the CSS selector combination.
Example of a target row:
Example results returned by CSS selector (sample):
The rows I am interested in start at index 1 and repeat every 4 rows after that, e.g. rows 5, 9, etc. So I use a little maths in the code to return just the rows of interest:
Set currentRow = data.item(i * 4 + 1)
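To make the arithmetic concrete, here is a small sketch that mirrors the two calculations used in the code. ResultRowCount, TargetRowIndex and ShowTargetRowIndexes are hypothetical names, and the total of 13 matched rows is an assumed figure chosen only so that the output matches the 1, 5, 9 pattern described above.

```vba
Option Explicit

' Hypothetical helpers that mirror the arithmetic used in the answer's code.

' Number of result rows, given the total count of tr elements matched.
Public Function ResultRowCount(ByVal totalRows As Long) As Long
    ResultRowCount = Round(totalRows / 4, 0)
End Function

' Zero-based position in the node list of the i-th result row (i starts at 0).
Public Function TargetRowIndex(ByVal i As Long) As Long
    TargetRowIndex = i * 4 + 1
End Function

Public Sub ShowTargetRowIndexes()
    Dim i As Long
    ' Assumed for illustration: 13 matched rows (indexes 0 to 12)
    For i = 0 To ResultRowCount(13) - 1
        Debug.Print TargetRowIndex(i)   ' prints 1, then 5, then 9
    Next i
End Sub
```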
VBA:
Option Explicit

'Requires a reference (VBE > Tools > References) to Microsoft Internet Controls
Public Sub GetInfo()
    Dim IE As New InternetExplorer, headers(), u As Long, resultCollection As Collection
    headers = Array("URL", "Name", "No", "Date of change", "# Securities", "Type of Transaction", "Nature of Interest")
    Set resultCollection = New Collection

    Dim links()
    links = Application.Transpose(ThisWorkbook.Worksheets("Sheet1").Range("A2:A3")) 'A100

    With IE
        .Visible = True
        For u = LBound(links) To UBound(links)
            If InStr(links(u), "http") > 0 Then
                .navigate links(u)
                While .Busy Or .readyState < 4: DoEvents: Wend
                Application.Wait Now + TimeSerial(0, 0, 1) '<you may not always need this. Or may need to increase.

                'The content of interest sits inside an iframe
                Dim data As Object, title As Object
                With .document.getElementById("bm_ann_detail_iframe").contentDocument
                    Set title = .querySelector(".formContentData")   'first match = Name
                    Set data = .querySelectorAll(".ven_table tr")    'all rows of the Details of Changes table
                End With

                Dim results(), numberOfRows As Long, i As Long, currentRow As Object, td As Object, c As Long, r As Long
                numberOfRows = Round(data.Length / 4, 0)
                ReDim results(1 To numberOfRows, 1 To 7)

                For i = 0 To numberOfRows - 1
                    r = i + 1
                    results(r, 1) = links(u): results(r, 2) = title.innerText
                    Set currentRow = data.item(i * 4 + 1)   'rows 1, 5, 9, ...
                    c = 3
                    For Each td In currentRow.getElementsByTagName("td")
                        'Strip the inline script text that appears in the No cell
                        results(r, c) = Replace$(td.innerText, "document.write(rownum++);", vbNullString)
                        c = c + 1
                    Next td
                Next i

                resultCollection.Add results
                Set data = Nothing: Set title = Nothing
            End If
        Next u
        .Quit
    End With

    Dim ws As Worksheet, item As Long
    If Not resultCollection.Count > 0 Then Exit Sub

    'Create the Results sheet if it does not exist; otherwise clear it
    If Not Evaluate("ISREF('Results'!A1)") Then '<==Credit to @Rory for this test
        Set ws = ThisWorkbook.Worksheets.Add
        ws.Name = "Results"
    Else
        Set ws = ThisWorkbook.Worksheets("Results")
        ws.Cells.Clear
    End If

    Dim outputRow As Long: outputRow = 2
    With ws
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        For item = 1 To resultCollection.Count
            Dim arr()
            arr = resultCollection(item)
            'Write each result row of the current page's array to the sheet
            For i = LBound(arr, 1) To UBound(arr, 1)
                .Cells(outputRow, 1).Resize(1, 7) = Application.WorksheetFunction.Index(arr, i, 0)
                outputRow = outputRow + 1
            Next
        Next
    End With
End Sub
Example results using the 2 provided test URLs:
Sample URLs in sheet1: