CSS selector QuerySelector alternative

醉酒当歌 posted on 2021-02-04 21:41:04

Question


I have searched extensively for material on how to get meta data using XMLHTTP, and I think it is impossible to do with early binding. The only approach that works is late binding via CreateObject("HTMLFile") and working with that HTML document. The disadvantage of this approach is that it doesn't support querySelector or querySelectorAll. I am now trying to find an alternative to the following CSS selector that doesn't rely on querySelector:

Set post = .querySelector("table div span[itemprop='lowPrice']")

This raises an error, and I can't find an easier way to locate the element. Here's the HTML content:

<table class="p">
    <tbody><tr>
        <td class="foto">
            <div class="foto">
                        <a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/#gallery-open" target="_blank" class="gallery-link product-detail__gallery-link" onclick="dataLayer.push({'event':'sendEvent','event_category':'Product Detail - Desktop','event_action':'Gallery','event_label':'Otev\u0159en\u00ed galerie','event_value':0});">
                           <img src="https://im9.cz/iR/importprodukt-orig/4c2/4c2b1733c8b233edd5052d3063ac46d9--mmf250x250.jpg" alt="Brit Premium by Nature Adult L 15 kg" width="250" height="250" id="picture-main">
                            <span class="image-hover">
                                <span class="image-overlay"></span>
                                <span class="js-test-image-count-info image-count-info">Galerie <span class="picture-count">(2)</span></span>
                            </span>
                            <span class="product-detail__gallery-link__image__count-info">Galerie
                                <span class="product-detail__gallery-link__image__count-info__count">(2)</span>
                            </span>
                        </a>
                        <a href="https://krmivo-psy.heureka.cz/top-produkty/" class="top-ico gtm-header-link" data-gtm-link-description="Pořadí v TOP produktech"><span>Top</span><strong>1.</strong></a>
                    <div class="poty-ico">
                        <a href="http://www.produktroku.cz/" target="_blank"><img src="https://im9.cz/iR/recenze-externi/107.png" alt="Produkt Roku 2019" class="product-of-year-badge"></a></div>



            </div>

        </td>
        <td>
<div class="main-info">
    <div class="text-cover">
        <div id="n649054946" data-id="649054946" class="item js-public-product-id">
                <h2 itemprop="name">Brit Premium by Nature Adult L 15 kg</h2>
        </div>
        <div class="rating-box" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">

            <p class="eval">
                <strong itemprop="ratingValue">95%</strong>
                <a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/pridat-uzivatelskou-recenzi/#section">
                    <span class="rating"><span class="hidden">Hodnocení produktu: 95%</span><span class="over" title="Hodnocení produktu: 95%"><span style="width: 75px;"></span></span></span>
                </a>
            </p>

            <span class="hidden-microdata" itemprop="ratingCount">
                456
            </span>

            <p class="review-count delimiter-blank">
                <a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/recenze/#section" class="gtm-header-link" data-gtm-link-description="Počet recenzí">
                    <span itemprop="reviewCount">344</span>
                    recenzí
                </a>
            </p>
            <div class="cleaner"></div>
            <p class="rating-box__item rating-box__favourite">
                <a href="https://ucet.heureka.cz/prihlaseni?callbackUrl=https%3A%2F%2Fkrmivo-psy.heureka.cz%2Fbrit-premium-by-nature-adult-l-15-kg%2F" title="Chci to" class="gtm-header-link" data-gtm-link-description="Akce - oblíbené">Přidat do oblíbených</a>
            </p>

            <p id="cli649054946" class="rating-box__item rating-box__compare delimiter-blank cl-add">
                <a class="checkbox gtm-header-link" data-gtm-link-description="Akce - porovnání" href="#" title="Porovnat">Přidat do porovnání</a>
            </p>
            
            <p class="delimiter-blank rating-box__item rating-box__price-watch js-price-watch-button">
                <a href="#" title="Hlídat cenu" class="gtm-header-link" data-gtm-link-description="Akce - hlídat cenu">
                        Hlídat cenu
                </a>
            </p>

            <p class="add-review rating-box__item rating-box__add-review delimiter-blank">
                <a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/pridat-uzivatelskou-recenzi/#section" class="gtm-header-link" data-gtm-link-description="Akce - přidat recenzi">
                    Přidat recenzi
                </a>
            </p>
        </div>

        <div id="top-shop-info" class="top-shop-info">
            <div class="inner">
            <div class="guar">
                <div>
                    <img class="guar-badge" src="https://im9.cz/css-v2/images/guaranty-seal.png?1" alt="Garance nákupu - SpokojenyPes.cz" width="27" height="34">
                </div>
            </div>

        <div class="shop-claim bold">
            <strong>Produkt vám dodá:</strong>
        </div>
        <div class="shop-logo">
            <a href="https://www.heureka.cz/exit/spokojenypes-cz/3180319922/?z=41" target="_blank" rel="nofollow noopener" class="gtm-header-link" data-gtm-link-description="Exit - produkt vám dodá">
                    <img src="https://im9.cz/iR/importobchod-orig/1983_logo--mmf130x40.png" alt="SpokojenyPes.cz" width="130" height="40">
            </a>
        </div>

        <div class="recommendation">
            <a href="https://obchody.heureka.cz/spokojenypes-cz/recenze/" class="gtm-header-link" data-gtm-link-description="Hodnocení - Produkt vám dodá">
                99% zákazníků doporučuje obchod
            </a>
        </div>

            <div class="delivery-info bold price-delivery-free">
                Doprava zdarma
            </div>
                <div class="availability-info bold in-stock">
            skladem
        </div>


    </div>
    <a data-gtm-link-description="Další nabídky" id="top-shop-count-info" href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/porovnat-ceny/#section" class="top-shop-count-info box-active gtm-header-link">Dalších 134 nabídek od 728 Kč</a>
        </div>

        <p class="desc">
            <span id="product-short-description">
                    Kompletní krmivo Brit Premium pro dospělé psy. Kuřecí receptura pro dospělé psy velkých plemen (25 - 45 kg). 
                <a id="product-short-description-button" href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section" title="celá specifikace Brit Premium by Nature Adult L 15 kg">celá specifikace</a>
            </span>
        </p>
    </div>

    <div itemprop="offers" itemscope="" itemtype="http://schema.org/AggregateOffer" style="display:none">
        <span itemprop="lowPrice">728.00</span>
        <span itemprop="highPrice">1579.00</span>
        <span itemprop="offerCount">135</span>
            <link itemprop="availability" href="http://schema.org/InStock">
    </div>

    <div itemprop="offers" itemscope="" itemtype="http://schema.org/Offer" class="price-from shopping-cart">
        <link itemprop="itemCondition" href="http://schema.org/OfferItemCondition" content="http://schema.org/NewCondition">
            <link itemprop="availability" href="http://schema.org/InStock">
        <link itemprop="category" href="http://schema.org/category" content="Hobby / Chovatelství / Pro psy / Krmivo pro psy">
        <link itemprop="image" href="http://schema.org/image" content="https://im9.cz/iR/importprodukt-orig/4c2/4c2b1733c8b233edd5052d3063ac46d9.jpg">
                        <div class="top-left">
                <div id="top-button" class="buy-click-observed">
<p class="buy">
    <a href="#" class="flat-button flat-button--top-position flat-button--orange buy-btn hb hb-3180319922 js-top-pos-btn" data-cart-position="0">
        <i class="ico basket"></i>
        <i class="ico check"></i>
        <span class="in">Koupit na Heurece</span>
        <span class="in replace">Přidáno do košíku</span>
    </a>
</p>
                </div>

                <div class="n" id="top-offer-price">
<p class="buy-price">
    <span itemprop="price" class="js-top-price" content="839.00">839 Kč</span>
    <span class="price-vat-title small">s DPH</span>
    <span itemprop="priceCurrency" content="CZK"></span>
</p>
                </div>


                <div class="clear"></div>
                <div class="js-top-gifts-info top-shop-gifts-info-box">
                </div>

            </div>
            <div class="clear"></div>
        <div class="clear"></div>
    </div>
    <span id="new-pd"></span>
    <script>
        (function() {
            loadScript("https:\/\/im9.cz\/js\/cache\/7e39f733-1-42bd9e7837b830d87e1af94da6d0e4a82055c56f.hash.js", function () {
                var productHeadObserver = new ProductHeadObserver({ 'topShortDescElm': $('product-short-description'), 'topShopBox': $('top-shop-info'), 'maxOfferNameLength': 90 });
                productHeadObserver.oneOfferInit();
            });

                H.Awards._reviewClick($$('#awards-list span.pa'));
                var notSelectedCallback = function() {
                    if ('undefined' != typeof H.ShoppingCartHelper.BuyMoreOptions &&
                        typeof H.ShoppingCartHelper.BuyMoreOptions.buyClickNotSelectedCallback == 'function') {
                        H.ShoppingCartHelper.BuyMoreOptions.buyClickNotSelectedCallback();
                    }
                };
                H.ShoppingCartHelper.observeBuyClick($('top-button'), new H.ShoppingCart(), notSelectedCallback, 'js-top-pos-btn');
        })();
    </script>

    <div class="clear"></div>


</div>
        </td>
    </tr>
</tbody></table>

The complete HTML is at https://pastebin.com/Dgu1wk2b

Here's the code so far:

Sub MyTest()
Dim source      As Object
Dim obj         As Object
Dim resp        As String
Dim post As Object
Dim a, i As Long

With CreateObject("MSXML2.xmlHttp")
    .Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
    .send
    resp = .responseText
End With

With CreateObject("HTMLFile")
    .write resp
    Set post = .getElementsByTagName("meta")

    For i = 0 To post.Length - 1
        On Error Resume Next
        Debug.Print post.item(i).getAttribute("name")
        If post.item(i).getAttribute("name") = "gtm:product_id" Then
            Cells(2, 1).Value = post.item(i).Value
        End If
        If post.item(i).getAttribute("name") = "gtm:product_name" Then
            Cells(2, 3).Value = post.item(i).Value
        End If
        If post.item(i).getAttribute("name") = "gtm:product_brand" Then
            Cells(2, 4).Value = post.item(i).Value
        End If
        On Error GoTo 0
    Next i

    Set post = Nothing

    Set post = .getElementsByTagName("link")
    For i = 0 To post.Length - 1
        On Error Resume Next
        If post.item(i).getAttribute("rel") = "canonical" Then
            Cells(2, 2).Value = post.item(i).href
        End If
        On Error GoTo 0
    Next i

    'I am stuck here
    'Set post = .querySelector("table div span[itemprop='lowPrice']")
    'Debug.Print .getElementsByTagName("table")(0).innerHTML
End With

End Sub

Answer 1:


As you have discovered, the HEAD tag info (where the meta elements live) is stripped out when you use document.body.innerHTML = .responseText with an early-bound MSHTML.HTMLDocument. That is what you would expect, considering what you are populating (document.body), and it is why you are unable to select the meta info. With your late-bound HTMLFile (where you can't use querySelector), you are using the .write method, which writes to the document itself (HTMLFile) and thereby retains the HEAD info.

If you wish to use early binding, you need to ensure that the HEAD info ends up within BODY tags: either as part of the response body, or as the extracted HEAD wrapped in new BODY tags and written to an HTMLDocument.

For example, for clarity I am writing only the HEAD info between BODY tags (without the rest of the existing response):

Option Explicit

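Public Sub MetaInfoEarlyBound()
    ' Early binding needs these references (VBE > Tools > References):
    '   Microsoft HTML Object Library, Microsoft XML, v6.0,
    '   Microsoft VBScript Regular Expressions 5.5
    Dim html As MSHTML.HTMLDocument, htmlHead As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60
    Dim re As VBScript_RegExp_55.RegExp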

    Set htmlHead = New MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument
    Set xhr = New MSXML2.XMLHTTP60
    Set re = New VBScript_RegExp_55.RegExp

    re.Pattern = "<head>([\s\S]+)<\/head>"

    With xhr
        .Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
        .send
        htmlHead.body.innerHTML = Replace$(Replace$(re.Execute(.responseText)(0), "<head>", "<body>"), "</head>", "</body>")
        html.body.innerHTML = .responseText
    End With

    Debug.Print htmlHead.querySelector("[name='gtm:product_price']").Value
    Debug.Print html.querySelector("[itemprop=lowPrice]").innerText

End Sub

As an aside, here are two shorter methods (shorter than the other current answer) to achieve your goal with late binding. Note that I have commented one of them out.

Public Sub MetaInfoLateBound()
    Dim resp As String

    With CreateObject("MSXML2.xmlHttp")
        .Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
        .send
        resp = .responseText
    End With

    With CreateObject("HTMLFile")

        .write resp

'        Dim post As Object
'
'        Set post = .getElementById("new-pd")
'        Debug.Print post.PreviousSibling.PreviousSibling.getElementsByTagName("span")(0).innertext
'
        Dim metas As Object, i As Long

        Set metas = .getElementsByTagName("meta")

        For i = 0 To metas.Length - 1
            If metas.Item(i).Name = "gtm:product_price" Then
                Debug.Print metas.Item(i).Value
                Exit For
            End If
        Next
    End With
End Sub



Answer 2:


Try this:

With CreateObject("HTMLFile")
    .Open
    .write resp
    .Close

    For Each tbl In .getElementsByTagName("table")
        For Each dv In tbl.getElementsByTagName("div")
            If dv.getattribute("itemprop") = "offers" Then
                For Each spn In dv.getElementsByTagName("span")
                    attr = ""
                    attr = spn.getattribute("itemprop")
                    If Len(attr) > 0 Then
                        If attr = "lowPrice" Then
                            Debug.Print spn.outerhtml
                            Debug.Print spn.innerText
                        End If
                    End If
                Next spn
            End If
        Next dv
    Next tbl
End With
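
The nested loops above can also be folded into a small reusable helper that stands in for an attribute selector such as span[itemprop='lowPrice']. The following is only a minimal sketch and not part of the original answers: FindByAttribute and LowPriceLateBound are hypothetical names, and the sketch assumes that the late-bound HTMLFile document exposes getAttribute for the itemprop attribute, as the code in this answer already relies on.

```vba
' Hypothetical helper: returns the first element with the given tag name
' whose attribute equals attrValue, or Nothing if no element matches.
Public Function FindByAttribute(ByVal root As Object, ByVal tagName As String, _
                                ByVal attrName As String, ByVal attrValue As String) As Object
    Dim el As Object

    For Each el In root.getElementsByTagName(tagName)
        ' Appending "" turns a Null (attribute absent) into an empty string,
        ' so the comparison never involves Null.
        If (el.getAttribute(attrName) & "") = attrValue Then
            Set FindByAttribute = el
            Exit Function
        End If
    Next el
End Function

Public Sub LowPriceLateBound()
    Dim resp As String, doc As Object, spn As Object

    With CreateObject("MSXML2.xmlHttp")
        .Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
        .send
        resp = .responseText
    End With

    Set doc = CreateObject("HTMLFile")
    doc.Open
    doc.write resp
    doc.Close

    ' Stand-in for querySelector("span[itemprop='lowPrice']")
    Set spn = FindByAttribute(doc, "span", "itemprop", "lowPrice")

    If spn Is Nothing Then
        Debug.Print "lowPrice not found"
    Else
        Debug.Print spn.innerText
    End If
End Sub
```

Because the helper takes any element or document as root, it can also be called on a narrower container (for example a single table element) to mimic a descendant selector.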


Source: https://stackoverflow.com/questions/61019809/css-selector-queryselector-alternative
