Question
I'm trying to scrape data from "https://beacon.schneidercorp.com/" and need to do the following:
- Set "Iowa" on the state combobox and "Adair County, IA" in the County/city/area combobox
- Bring up the Property Search button
- Click the Property Search button and get to the next page
After all this, the browser gets to "https://beacon.schneidercorp.com/Application.aspx?AppID=1034&LayerID=22042&PageTypeID=2&PageID=9328" which is my main goal.
I managed to fill the comboboxes (tagName="option"), but then ran into the following problems:
a. The Property Search button I want to click to get to the next page doesn't appear until I physically click and select an option in the County/city/area combobox.
This is the routine that fills the comboboxes:
Sub extraccionCondados2()
    Dim IE As New SHDocVw.InternetExplorer
    Dim htmlDoc As MSHTML.HTMLDocument
    Dim htmlElementos As MSHTML.IHTMLElementCollection
    Dim htmlElemento As MSHTML.IHTMLElement
    IE.Visible = True
    IE.navigate "https://beacon.schneidercorp.com/"
    Do While IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set htmlDoc = IE.document
    Set htmlElementos = htmlDoc.getElementsByClassName("form-control input-lg")
    htmlElementos(0).Value = "Iowa" 'POPULATES THE STATE COMBOBOX
    htmlElementos(1).Value = "1034" 'POPULATES THE COUNTY/CITY/AREA WITH THE RIGHT VALUE
    htmlElementos(1).Click 'IN THIS CASE THIS LINE DOESN'T DO ANYTHING
    'I'VE TRIED WORKING WITH htmlElementos CHILDREN BUT DIDN'T FIND A WAY TO DO IT
End Sub
b. The href I'm looking for doesn't exist until the Property Search button is brought into view.
The id="quickstartList" element is empty before the Property Search button is shown.
The id="quickstartList" element gets new children after the Property Search button is shown, including my target URL.
How do I bring up the Property Search button or, better, fetch the href shown in the second image?
Answer 1:
Some advice on using MSXML2.ServerXMLHTTP objects to automate web scraping, using your target website as an example.
Firstly, you can get to the page you wanted in the question like this:
Sub Example1()
    ' A web request object - add a project reference to "Microsoft XML, v6.0" in Tools > References
    Dim con As New MSXML2.ServerXMLHTTP60
    ' Open a new GET request (no hidden info, everything travels in the URL)
    con.Open "GET", "https://beacon.schneidercorp.com/Application.aspx?AppID=1034&PageTypeID=2"
    con.setRequestHeader "Content-type", "application/x-www-form-urlencoded" ' set a standard content-type for the request
    con.send ' Send the request; a GET has no body, so no argument is passed
    MsgBox con.responseText
End Sub
Note that in the URL I've only had to include AppID=1034 for Adair County and PageTypeID=2 for property search (I think PageTypeID 1 was the map). You can get the full list of AppID values from the main page just by looking at the HTML (I guess you've figured out how to do this already). The MsgBox just shows that the con object has returned the response as an HTML document.
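Before relying on responseText, it may be worth checking that the request actually succeeded. The following is only a sketch: FetchHtml and FetchHtmlDemo are hypothetical names that are not part of the code above, and late binding is used so the sketch runs without a project reference.

```vba
' Sketch only: FetchHtml is a hypothetical helper, not part of the original answer.
Function FetchHtml(url As String) As String
    ' Late-bound request object, so no reference to "Microsoft XML, v6.0" is needed here
    Dim con As Object
    Set con = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    con.Open "GET", url, False   ' False = wait for the response before continuing
    con.send                     ' a GET carries no body

    If con.Status = 200 Then
        FetchHtml = con.responseText
    Else
        ' Leave the result empty and report the HTTP status in the Immediate window
        Debug.Print "Request failed: " & con.Status & " " & con.statusText
    End If
End Function

Sub FetchHtmlDemo()
    Dim pageHtml As String
    pageHtml = FetchHtml("https://beacon.schneidercorp.com/Application.aspx?AppID=1034&PageTypeID=2")
    Debug.Print Len(pageHtml) & " characters received"
End Sub
```

An empty return value then means the request failed, and the status line in the Immediate window says why.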
While working on your project, it helps with debugging to be able to look at the HTML of any response at your leisure. I use the procedure below to save a string as a text file:
Sub WriteToFile(s As String, n As String)
    ' s = the text to save, n = the full path of the file to create
    Dim fso As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    Dim oFile As Object
    Set oFile = fso.CreateTextFile(n)
    oFile.WriteLine s
    oFile.Close
    Set fso = Nothing
    Set oFile = Nothing
End Sub
So for the above code I'd call that procedure at the end to save the response as a text file, which I can then view as HTML in Notepad++. You can also just view the HTML in the F12 dev tool without saving it.
I've also included below an HTMLDocument object, which I put the response into.
Sub Example2()
    ' A web request object - add a project reference to "Microsoft XML, v6.0" in Tools > References
    Dim con As New MSXML2.ServerXMLHTTP60
    ' An HTML document to hold responses, used to parse info - add a reference to "Microsoft HTML Object Library"
    Dim html As New HTMLDocument
    ' Open a new GET request (no hidden info, everything travels in the URL)
    con.Open "GET", "https://beacon.schneidercorp.com/Application.aspx?AppID=1034&PageTypeID=2"
    con.setRequestHeader "Content-type", "application/x-www-form-urlencoded" ' set a standard content-type for the request
    con.send ' Send the request; a GET has no body, so no argument is passed
    WriteToFile con.responseText, "C:\Users\JamHeadArt\Documents\responseText.txt"
    ' Use responseText (a string); responseBody is a raw byte array and cannot be assigned to innerHTML
    html.body.innerHTML = con.responseText
End Sub
With the html document populated, you can then use things like getElementById to help parse results. It's a tree of nodes much like XML, so you can traverse nodes and find things by child/parent relationships.
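As a rough illustration of that kind of parsing, here is a minimal sketch. The markup is made up for the example (the id "results" and the example.com links are assumptions, not something the site returns), and it stands in for con.responseText. The document is created late-bound so the sketch runs without a project reference.

```vba
' Sketch only: the HTML below is hypothetical and stands in for con.responseText.
Sub ParseLinksSketch()
    ' Late-bound MSHTML document, so no "Microsoft HTML Object Library" reference is required
    Dim html As Object
    Set html = CreateObject("htmlfile")

    html.body.innerHTML = "<div id=""results"">" & _
        "<a href=""https://example.com/page1"">First</a>" & _
        "<a href=""https://example.com/page2"">Second</a></div>"

    ' Find a container by id, then walk its child links
    Dim container As Object
    Set container = html.getElementById("results")
    If container Is Nothing Then Exit Sub   ' id not present in the response

    Dim link As Object
    For Each link In container.getElementsByTagName("a")
        Debug.Print link.innerText, link.href
    Next link
End Sub
```

With a real response you would replace the hard-coded string with con.responseText and the id with whatever the F12 tool shows for the element you are after.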
Using the F12 dev tool
I figure this stuff out using the F12 developer tool, under the Network tab. Before clicking a search button or whatever, clear the network traffic; then, when you click search, you'll see a bunch of requests. The first one is usually the one you want to check out and basically mimic (the rest will be JavaScript firing, CSS, images, general stuff). Every request has a URL, and a POST request usually has a body as well.
Without going into TOO much detail, you can usually skip a whole bunch of search steps and pages and get the info you need by knowing the structure and parameters of that final search, making literally one call to the website, with the returned info parsed directly into Excel. No browser is used, so it is much, much faster.
After selecting Iowa, did you find the drop-down list in the HTML that has all the option values?
<optgroup label="Iowa">
<option value="1034">Adair County, IA</option>
<option value="78">Allamakee County, IA</option>
<option value="165">Ames, IA</option>
<option value="96">Audubon County, IA</option>
<option value="83">Benton County, IA</option>
<option value="84">Boone County, IA</option>
<option value="330">Bremer County, IA</option>
<option value="1015">Buena Vista County, IA</option>
<option value="215">Cass County, IA</option>
<option value="408">Cerro Gordo County, IA</option>
<option value="501">Cherokee County, IA</option>
<option value="47">Chickasaw County, IA</option>
<option value="29">City of Ames, IA - Traffic Accident Database</option>
<option value="933">City of Cascade, IA</option>
<option value="516">City of Estherville, IA</option>
<option value="1061">City of Sigourney, IA</option>
<option value="1043">Clay County, IA</option>
<option value="227">Clayton County, IA</option>
<option value="375">Clinton County, IA</option>
<option value="909">Dallas County, IA</option>
<option value="49">Davis County, IA</option>
<option value="72">Delaware County, IA</option>
<option value="376">Dickinson County, IA</option>
<option value="93">Dubuque County, IA</option>
<option value="15">Emmet County, IA</option>
<option value="79">Fayette County, IA</option>
<option value="82">Floyd County, IA</option>
<option value="150">Franklin County, IA</option>
<option value="825">Fremont County, IA</option>
<option value="1064">Greene County, IA</option>
<option value="3">Grundy County, IA</option>
<option value="395">Guthrie County, IA</option>
<option value="140">Hardin County, IA</option>
<option value="44">Harrison County, IA</option>
<option value="60">Henry County, IA</option>
<option value="617">Humboldt County, IA</option>
<option value="80">Jackson County, IA</option>
<option value="325">Jasper County, IA</option>
<option value="1037">Jefferson County, IA</option>
<option value="86">Johnson County, IA</option>
<option value="164">Jones County, IA</option>
<option value="81">Keokuk County, IA</option>
<option value="177">Lee County, IA</option>
<option value="54">Louisa County, IA</option>
<option value="594">Lyon County, IA</option>
<option value="406">Madison County, IA</option>
<option value="25">Mahaska County, IA</option>
<option value="70">Marion County, IA</option>
<option value="1026">Marshall County, IA</option>
<option value="410">Mason City, IA</option>
<option value="153">Mills County, IA</option>
<option value="929">Mitchell County, IA</option>
<option value="21">Montgomery County, IA</option>
<option value="12">Muscatine Area Geographic Information Consortium (MAGIC)</option>
<option value="331">O'Brien County, IA</option>
<option value="611">Osceola County, IA</option>
<option value="220">Page County, IA</option>
<option value="218">Palo Alto County, IA</option>
<option value="1012">Plymouth County, IA</option>
<option value="144">Pocahontas County, IA</option>
<option value="135">Poweshiek County, IA</option>
<option value="508">Ringgold County, IA</option>
<option value="75">Sac County, IA</option>
<option value="1024">Scott County / City of Davenport, Iowa</option>
<option value="11">Shelby County, IA</option>
<option value="10">Sioux City, IA</option>
<option value="984">Sioux County, IA</option>
<option value="165">Story County, IA / City of Ames</option>
<option value="225">Union County, IA</option>
<option value="595">Wapello County, IA</option>
<option value="9">Warren County, IA</option>
<option value="1036">Washington County, IA</option>
<option value="723">Webster County, IA</option>
<option value="73">Winnebago County, IA</option>
<option value="110">Winneshiek County, IA</option>
<option value="10">Woodbury County, IA / Sioux City</option>
<option value="588">Worth County, IA</option>
<option value="399">Wright County, IA</option>
</optgroup>
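Once you have those option values, building the property search URL for each county is just string concatenation. The sketch below uses three values copied from the list above. It assumes that PageTypeID=2 means property search for every AppID, which I have only checked for Adair County, and the procedure name is made up for the example.

```vba
' Sketch only: assumes PageTypeID=2 is the property search page for every AppID.
Sub BuildCountyUrlsSketch()
    Dim appIds As Variant
    Dim countyNames As Variant
    ' A few value/name pairs taken from the option list
    appIds = Array("1034", "78", "96")
    countyNames = Array("Adair County, IA", "Allamakee County, IA", "Audubon County, IA")

    Const BASE_URL As String = "https://beacon.schneidercorp.com/Application.aspx?AppID="

    Dim i As Long
    For i = LBound(appIds) To UBound(appIds)
        ' One URL per county; each could be passed to a GET request like the ones above
        Debug.Print countyNames(i) & ": " & BASE_URL & appIds(i) & "&PageTypeID=2"
    Next i
End Sub
```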
Answer 2:
You must trigger the change event after each selection from a combobox:
Sub extraccionCondados2()
    Dim IE As New SHDocVw.InternetExplorer
    Dim htmlDoc As MSHTML.htmlDocument
    Dim htmlElementos As MSHTML.IHTMLElementCollection
    Dim htmlElemento As MSHTML.IHTMLElement
    Dim urlFromPropertySearchButton As String

    IE.Visible = True
    IE.navigate "https://beacon.schneidercorp.com/"
    Do While IE.readyState <> 4: DoEvents: Loop
    Set htmlDoc = IE.document
    Set htmlElementos = htmlDoc.getElementsByClassName("form-control input-lg")

    'Select state and trigger html change event of the combobox
    htmlElementos(0).Value = "Iowa"
    Call TriggerEvent(htmlDoc, htmlElementos(0), "change")

    'Select county/city/area and trigger html change event of the combobox
    htmlElementos(1).Value = "1034"
    Call TriggerEvent(htmlDoc, htmlElementos(1), "change")

    'Get property search button
    Set htmlElemento = htmlDoc.getElementsByClassName("list-group-item track-mru")(0)

    'If needed as string read url
    urlFromPropertySearchButton = htmlElemento.href

    'You have the url before clicking the button
    MsgBox urlFromPropertySearchButton

    'If you want to open the page for selection
    htmlElemento.Click
End Sub
This procedure triggers an HTML event:
Private Sub TriggerEvent(htmlDocument As Object, htmlElementWithEvent As Object, eventType As String)
    Dim theEvent As Object

    htmlElementWithEvent.Focus
    'Create the event, initialise it (bubbles = True, cancelable = False) and fire it on the element
    Set theEvent = htmlDocument.createEvent("HTMLEvents")
    theEvent.initEvent eventType, True, False
    htmlElementWithEvent.dispatchEvent theEvent
End Sub
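If the button list turns out to be filled in a moment after the change event fires (an assumption on my part, I haven't confirmed how the page populates it), reading the button immediately could fail with an object error. A minimal polling sketch follows. WaitForElement is a hypothetical helper name and the 10 second default timeout is arbitrary.

```vba
' Sketch only: WaitForElement is a hypothetical helper, not part of the answer's code.
' Returns the first element with the given class name, or Nothing if the timeout expires.
Private Function WaitForElement(htmlDoc As Object, className As String, _
                                Optional timeoutSeconds As Double = 10) As Object
    Dim deadline As Double
    deadline = Timer + timeoutSeconds   ' note: Timer resets at midnight

    Dim found As Object
    Do
        DoEvents   ' let the browser process its own events
        Set found = htmlDoc.getElementsByClassName(className)
        If found.Length > 0 Then
            Set WaitForElement = found.Item(0)
            Exit Function
        End If
    Loop While Timer < deadline
End Function
```

In extraccionCondados2 you could then write Set htmlElemento = WaitForElement(htmlDoc, "list-group-item track-mru") and check If htmlElemento Is Nothing before reading its href.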
Source: https://stackoverflow.com/questions/62252608/fetch-href-from-webpage-after-selecting-from-combobox