Question
I am trying to scrape some table-looking items from a website into Excel.
I'm no stranger to coding in general, though I'm pretty new to VBA in an Excel sense :)
I have tried Excel's Data > From Web interface, but it doesn't recognize the table. I'm guessing that's because the page is built using frames (or at least that's what my Google-Fu has led me to understand).
Snippet of what the second table looks like:
<html>
<frame title="links" ...>...</frame>
<frame title="queue">
#document
<head>...</head>
<body>
<div id="container">
<script>...</script>
<div>
<table id="oTable">
<colgroup>...</colgroup>
<thead>...</thead>
<tbody>
<tr onclick="changeHighlight( 'eid0' )" id="eid0" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.5599976.5599976');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">12345</a></td>
<td nowrap=""><a href="`" target="_Blank">28/08/2018 17:00:49</a></td>
<td nowrap=""><a href="URL" target="_Blank">11/09/2018 16:28:39</a></td>
<td nowrap=""><a href="URL" target="_Blank">5,599,976</a></td>
<td nowrap=""><a href="URL" target="_Blank">dijm</a></td></tr>
<tr onclick="changeHighlight( 'eid1' )" id="eid1" class="queryunshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443276.6443276');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443276.6443276','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
<tr onclick="changeHighlight( 'eid2' )" id="eid2" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443287.6443287');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443287.6443287','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">23456</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:24</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:35:30</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,287</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
<tr onclick="changeHighlight( 'eid3' )" id="eid3" class="queryunshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443339.6443339');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443339.6443339','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">78901</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:02</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:39</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,339</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
<tr onclick="changeHighlight( 'eid4' )" id="eid4" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443344.6443344');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443344.6443344','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">34567</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:17</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:43</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,344</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
I have tried various solutions along the lines of https://www.ozgrid.com/forum/forum/other-software-applications/excel-and-web-browsers-help/131683-extracting-data-from-a-grid-on-webpage and "Scraping data from website using vba".
I have also tried referencing the frames themselves to get the info from there (again: new to Excel VBA):
'set myHTMLDoc to the main pages IE document
Dim myHTMLDoc As HTMLDocument
Set myHTMLDoc = ie.Document
'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
Dim myHTMLFrame2 As HTMLDocument
Set myHTMLFrame2 = myHTMLDoc.Frames(1).Document
With the above block of code I'm getting a "Run-time error '438'". Without the above block I'm getting a "Run-time error '1004'".
The info I eventually want is in each row:
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>
Ideally I'd like to dump each element into its own cell:
67890 | 25/06/2019 11:01:01 | 09/07/2019 10:32:32 | 6,443,276
There are 20 of these rows on each page (there's a button to press to get to the next page, which I'll figure out later...hopefully haha).
Massive preemptive thank you to anyone who can help :)
-EDIT- This is the code that I'm currently working with (not precious about it :P )
Private Sub CommandButton1_Click()
Dim ie As Object
Dim html As Object
Dim objElementTR As Object
Dim objTR As Object
Dim objElementsTD As Object
Dim objTD As Object
Dim result As String
Dim intRow As Long
Dim intCol As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate "URL"
ie.Visible = True
' loop until page is loaded
Do Until (ie.ReadyState = 4 And Not ie.Busy)
DoEvents
Loop
'set myHTMLDoc to the main pages IE document
Dim myHTMLDoc As HTMLDocument
Set myHTMLDoc = ie.Document
'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
Dim myHTMLFrame2 As HTMLDocument
Set myHTMLFrame2 = ie.Document.querySelector("[title=queue]").contentDocument.getElementById("oTable")
result = myHTMLFrame2
Set html = CreateObject("htmlfile")
myHTMLFrame2 = result
Set objElementTR = html.getElementsByTagName("tr")
ReDim myarray(0 To objElementTR.Length, 0 To 10)
For Each objTR In objElementTR
intRow = intRow + 1
Set objElementsTD = objTR.getElementsByTagName("td")
For Each objTD In objElementsTD
myarray(intRow, intCol) = objTD.innerText
intCol = intCol + 1
Next objTD
intCol = 0
Next objTR
With Sheets(1).Cells(1, 1).Cells(Rows.Count, "A").End(xlUp).Offset(1, 0)
.Resize(UBound(myarray), UBound(myarray, 2)).Value = myarray
End With
End Sub
Answer 1:
You could try isolating the frame by its title attribute, then go via contentDocument and get the table by id:
ie.document.querySelector("[title=queue]").contentDocument.querySelector("#oTable")
The .querySelector("#oTable") at the end can be interchanged with .getElementById("oTable").
I would then dump the .outerHTML of the table via the clipboard, so that the table can be pasted directly into the sheet.
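The approach above could be sketched roughly as follows. This is only an illustration under some assumptions: the procedure name PasteQueueTable is hypothetical, the page has already finished loading in the ie object from the question, the frame's document is on the same origin as the main page (otherwise contentDocument is likely to raise an access error), and the first worksheet with cell A1 as the paste target is an arbitrary choice.

```vba
Option Explicit

' Hypothetical helper: copies the table with id "oTable" from the frame
' titled "queue" to the clipboard and pastes it into the first worksheet.
' Assumes ie is an InternetExplorer.Application whose page has finished loading.
Public Sub PasteQueueTable(ByVal ie As Object)
    Dim frameElement As Object
    Dim hTable As Object
    Dim clipboard As Object

    ' Isolate the frame by its title attribute.
    Set frameElement = ie.document.querySelector("[title=queue]")
    If frameElement Is Nothing Then Exit Sub

    ' Step into the frame's own document and get the table by id.
    Set hTable = frameElement.contentDocument.getElementById("oTable")
    If hTable Is Nothing Then Exit Sub

    ' Late-bound MSForms.DataObject, so no extra library reference is needed.
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    clipboard.SetText hTable.outerHTML
    clipboard.PutInClipboard

    ' Excel parses the pasted HTML into rows and columns.
    ' Target sheet and cell are assumptions; adjust as needed.
    ThisWorkbook.Worksheets(1).Range("A1").PasteSpecial
End Sub
```

It would be called after the wait loop in the question's code, for example with PasteQueueTable ie. Pasting the whole table brings the icon column along too, so the unwanted first column would need to be deleted afterwards.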
Source: https://stackoverflow.com/questions/56861132/excel-vba-web-scraping-table-elements-from-a-frameset-and-a-frame