问题
I have searched a lot and a lot so as to find material about how to get meta data using XMLHTTP. And I think that's impossible to do that using the Early binding method. The only approach that will work is the late binding by CreateObject("HTMLFile")
and dealing with that HTML which is late binding. The disadvantage of this approach is that it doesn't support the use of the QuerySelector
or QuerySelectorAll
..
Now I am trying to find alternative to this CSS selector .. without using the QuerySelector
Set post = .querySelector("table div span[itemprop='lowPrice']")
This arises an error .. and I can't find easier way to find the element Here's the HTML content
<table class="p">
<tbody><tr>
<td class="foto">
<div class="foto">
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/#gallery-open" target="_blank" class="gallery-link product-detail__gallery-link" onclick="dataLayer.push({'event':'sendEvent','event_category':'Product Detail - Desktop','event_action':'Gallery','event_label':'Otev\u0159en\u00ed galerie','event_value':0});">
<img src="https://im9.cz/iR/importprodukt-orig/4c2/4c2b1733c8b233edd5052d3063ac46d9--mmf250x250.jpg" alt="Brit Premium by Nature Adult L 15 kg" width="250" height="250" id="picture-main">
<span class="image-hover">
<span class="image-overlay"></span>
<span class="js-test-image-count-info image-count-info">Galerie <span class="picture-count">(2)</span></span>
</span>
<span class="product-detail__gallery-link__image__count-info">Galerie
<span class="product-detail__gallery-link__image__count-info__count">(2)</span>
</span>
</a>
<a href="https://krmivo-psy.heureka.cz/top-produkty/" class="top-ico gtm-header-link" data-gtm-link-description="Pořadí v TOP produktech"><span>Top</span><strong>1.</strong></a>
<div class="poty-ico">
<a href="http://www.produktroku.cz/" target="_blank"><img src="https://im9.cz/iR/recenze-externi/107.png" alt="Produkt Roku 2019" class="product-of-year-badge"></a></div>
</div>
</td>
<td>
<div class="main-info">
<div class="text-cover">
<div id="n649054946" data-id="649054946" class="item js-public-product-id">
<h2 itemprop="name">Brit Premium by Nature Adult L 15 kg</h2>
</div>
<div class="rating-box" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
<p class="eval">
<strong itemprop="ratingValue">95%</strong>
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/pridat-uzivatelskou-recenzi/#section">
<span class="rating"><span class="hidden">Hodnocení produktu: 95%</span><span class="over" title="Hodnocení produktu: 95%"><span style="width: 75px;"></span></span></span>
</a>
</p>
<span class="hidden-microdata" itemprop="ratingCount">
456
</span>
<p class="review-count delimiter-blank">
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/recenze/#section" class="gtm-header-link" data-gtm-link-description="Počet recenzí">
<span itemprop="reviewCount">344</span>
recenzí
</a>
</p>
<div class="cleaner"></div>
<p class="rating-box__item rating-box__favourite">
<a href="https://ucet.heureka.cz/prihlaseni?callbackUrl=https%3A%2F%2Fkrmivo-psy.heureka.cz%2Fbrit-premium-by-nature-adult-l-15-kg%2F" title="Chci to" class="gtm-header-link" data-gtm-link-description="Akce - oblíbené">Přidat do oblíbených</a>
</p>
<p id="cli649054946" class="rating-box__item rating-box__compare delimiter-blank cl-add">
<a class="checkbox gtm-header-link" data-gtm-link-description="Akce - porovnání" href="#" title="Porovnat">Přidat do porovnání</a>
</p>
<p class="delimiter-blank rating-box__item rating-box__price-watch js-price-watch-button">
<a href="#" title="Hlídat cenu" class="gtm-header-link" data-gtm-link-description="Akce - hlídat cenu">
Hlídat cenu
</a>
</p>
<p class="add-review rating-box__item rating-box__add-review delimiter-blank">
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/pridat-uzivatelskou-recenzi/#section" class="gtm-header-link" data-gtm-link-description="Akce - přidat recenzi">
Přidat recenzi
</a>
</p>
</div>
<div id="top-shop-info" class="top-shop-info">
<div class="inner">
<div class="guar">
<div>
<img class="guar-badge" src="https://im9.cz/css-v2/images/guaranty-seal.png?1" alt="Garance nákupu - SpokojenyPes.cz" width="27" height="34">
</div>
</div>
<div class="shop-claim bold">
<strong>Produkt vám dodá:</strong>
</div>
<div class="shop-logo">
<a href="https://www.heureka.cz/exit/spokojenypes-cz/3180319922/?z=41" target="_blank" rel="nofollow noopener" class="gtm-header-link" data-gtm-link-description="Exit - produkt vám dodá">
<img src="https://im9.cz/iR/importobchod-orig/1983_logo--mmf130x40.png" alt="SpokojenyPes.cz" width="130" height="40">
</a>
</div>
<div class="recommendation">
<a href="https://obchody.heureka.cz/spokojenypes-cz/recenze/" class="gtm-header-link" data-gtm-link-description="Hodnocení - Produkt vám dodá">
99% zákazníků doporučuje obchod
</a>
</div>
<div class="delivery-info bold price-delivery-free">
Doprava zdarma
</div>
<div class="availability-info bold in-stock">
skladem
</div>
</div>
<a data-gtm-link-description="Další nabídky" id="top-shop-count-info" href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/porovnat-ceny/#section" class="top-shop-count-info box-active gtm-header-link">Dalších 134 nabídek od 728 Kč</a>
</div>
<p class="desc">
<span id="product-short-description">
Kompletní krmivo Brit Premium pro dospělé psy. Kuřecí receptura pro dospělé psy velkých plemen (25 - 45 kg).
<a id="product-short-description-button" href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section" title="celá specifikace Brit Premium by Nature Adult L 15 kg">celá specifikace</a>
</span>
</p>
</div>
<div itemprop="offers" itemscope="" itemtype="http://schema.org/AggregateOffer" style="display:none">
<span itemprop="lowPrice">728.00</span>
<span itemprop="highPrice">1579.00</span>
<span itemprop="offerCount">135</span>
<link itemprop="availability" href="http://schema.org/InStock">
</div>
<div itemprop="offers" itemscope="" itemtype="http://schema.org/Offer" class="price-from shopping-cart">
<link itemprop="itemCondition" href="http://schema.org/OfferItemCondition" content="http://schema.org/NewCondition">
<link itemprop="availability" href="http://schema.org/InStock">
<link itemprop="category" href="http://schema.org/category" content="Hobby / Chovatelství / Pro psy / Krmivo pro psy">
<link itemprop="image" href="http://schema.org/image" content="https://im9.cz/iR/importprodukt-orig/4c2/4c2b1733c8b233edd5052d3063ac46d9.jpg">
<div class="top-left">
<div id="top-button" class="buy-click-observed">
<p class="buy">
<a href="#" class="flat-button flat-button--top-position flat-button--orange buy-btn hb hb-3180319922 js-top-pos-btn" data-cart-position="0">
<i class="ico basket"></i>
<i class="ico check"></i>
<span class="in">Koupit na Heurece</span>
<span class="in replace">Přidáno do košíku</span>
</a>
</p>
</div>
<div class="n" id="top-offer-price">
<p class="buy-price">
<span itemprop="price" class="js-top-price" content="839.00">839 Kč</span>
<span class="price-vat-title small">s DPH</span>
<span itemprop="priceCurrency" content="CZK"></span>
</p>
</div>
<div class="clear"></div>
<div class="js-top-gifts-info top-shop-gifts-info-box">
</div>
</div>
<div class="clear"></div>
<div class="clear"></div>
</div>
<span id="new-pd"></span>
<script>
(function() {
loadScript("https:\/\/im9.cz\/js\/cache\/7e39f733-1-42bd9e7837b830d87e1af94da6d0e4a82055c56f.hash.js", function () {
var productHeadObserver = new ProductHeadObserver({ 'topShortDescElm': $('product-short-description'), 'topShopBox': $('top-shop-info'), 'maxOfferNameLength': 90 });
productHeadObserver.oneOfferInit();
});
H.Awards._reviewClick($$('#awards-list span.pa'));
var notSelectedCallback = function() {
if ('undefined' != typeof H.ShoppingCartHelper.BuyMoreOptions &&
typeof H.ShoppingCartHelper.BuyMoreOptions.buyClickNotSelectedCallback == 'function') {
H.ShoppingCartHelper.BuyMoreOptions.buyClickNotSelectedCallback();
}
};
H.ShoppingCartHelper.observeBuyClick($('top-button'), new H.ShoppingCart(), notSelectedCallback, 'js-top-pos-btn');
})();
</script>
<div class="clear"></div>
</div>
</td>
</tr>
</tbody></table>
This is the whole HTML https://pastebin.com/Dgu1wk2b
Here's the code till now
Sub MyTest()
Dim source As Object
Dim obj As Object
Dim resp As String
Dim post As Object
Dim a, i As Long
With CreateObject("MSXML2.xmlHttp")
.Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
.send
resp = .responseText
End With
With CreateObject("HTMLFile")
.write resp
Set post = .getElementsByTagName("meta")
For i = 0 To post.Length - 1
On Error Resume Next
Debug.Print post.item(i).getAttribute("name")
If post.item(i).getAttribute("name") = "gtm:product_id" Then
Cells(2, 1).Value = post.item(i).Value
End If
If post.item(i).getAttribute("name") = "gtm:product_name" Then
Cells(2, 3).Value = post.item(i).Value
End If
If post.item(i).getAttribute("name") = "gtm:product_brand" Then
Cells(2, 4).Value = post.item(i).Value
End If
On Error GoTo 0
Next i
Set post = Nothing
Set post = .getElementsByTagName("link")
For i = 0 To post.Length - 1
On Error Resume Next
If post.item(i).getAttribute("rel") = "canonical" Then
Cells(2, 2).Value = post.item(i).href
End If
On Error GoTo 0
Next i
'I am stuck here
'Set post = .querySelector("table div span[itemprop='lowPrice']")
'Debug.Print .getElementsByTagName("table")(0).innerHTML
End With
End Sub
回答1:
As you have discovered HEAD
tag info (where meta stuff lives) is stripped out when you use document.body.innerHTML = .responseText
with early-bound MSHTML.HTMLDocument
. Kinda what you would expect considering what you are populating (document.body
). That is why you are unable to select the meta
info. With your late bound HTMLFile
(where you can't use querySelector
) you are using .write
method which is writing to your document (HTMLFile
) and thereby retaining the HEAD
info.
You need to ensure that the HEAD
info ends up within BODY
tags. Either as part of response body or extracted HEAD
concatenated with new BODY
tags and written to HTMLDocument
if wishing to use early binding.
E.g. for clarity I am writing HEAD
info between BODY
tags only (Without rest of existing response)
Option Explicit
Public Sub MetaInfoEarlyBound()
Dim html As MSHTML.HTMLDocument, htmlHead As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60
Dim re As VBScript_RegExp_55.RegExp
Set htmlHead = New MSHTML.HTMLDocument
Set html = New MSHTML.HTMLDocument
Set xhr = New MSXML2.XMLHTTP60
Set re = New VBScript_RegExp_55.RegExp
re.Pattern = "<head>([\s\S]+)<\/head>"
With xhr
.Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
.send
htmlHead.body.innerHTML = Replace$(Replace$(re.Execute(.responseText)(0), "<head>", "<body>"), "</head>", "</body>")
html.body.innerHTML = .responseText
End With
Debug.Print htmlHead.querySelector("[name='gtm:product_price']").Value
Debug.Print html.querySelector("[itemprop=lowPrice]").innerText
End Sub
As an aside, I add two shorter methods (than current other answer) to achieve your goal with late-bound. Note I have commented one out.
Public Sub MetaInfoLateBound()
Dim resp As String
With CreateObject("MSXML2.xmlHttp")
.Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
.send
resp = .responseText
End With
With CreateObject("HTMLFile")
.write resp
' Dim post As Object
'
' Set post = .getElementById("new-pd")
' Debug.Print post.PreviousSibling.PreviousSibling.getElementsByTagName("span")(0).innertext
'
Dim metas As Object, i As Long
Set metas = .getElementsByTagName("meta")
For i = 0 To metas.Length - 1
If metas.Item(i).Name = "gtm:product_price" Then
Debug.Print metas.Item(i).Value
Exit For
End If
Next
End With
End Sub
回答2:
Try this:
With CreateObject("HTMLFile")
.Open
.write resp
.Close
For Each tbl In .getElementsByTagName("table")
For Each dv In tbl.getElementsByTagName("div")
If dv.getattribute("itemprop") = "offers" Then '<<EDIT
For Each spn In dv.getElementsByTagName("span")
attr = ""
attr = spn.getattribute("itemprop")
If Len(attr) > 0 Then
If attr = "lowPrice" Then
Debug.Print spn.outerhtml
Debug.Print spn.innerText
End If
End If
Next spn
End If
Next dv
Next tbl
End With
来源:https://stackoverflow.com/questions/61019809/css-selector-queryselector-alternative