Detecting external content with TEmbeddedWB or TWebBrowser

Posted by 我们两清 on 2019-12-21 05:39:23

Question


I am trying to block all external content loaded by TEmbeddedWB or TWebBrowser (or TCppWebBrowser). That means anything loaded from the Internet: images, JavaScript, external CSS, external [embed], [object], [applet], [frame] or [iframe] elements, JavaScript that can itself load external content, and so on.

This problem consists of 2 parts:

  • putting the web browser into a "restrict all" mode (basic HTML only, without images) and detecting whether external content exists
  • if no external content is present, nothing more is needed; if it is, showing a "download bar" which, when clicked, puts the web browser into "download all" mode and fetches all content.
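To make the two parts concrete, here is a minimal sketch of the mode switching in plain C++. All names (LoadMode, ContentGate and its methods) are hypothetical and are not part of TEmbeddedWB or TWebBrowser; the actual reload and the bar itself are left to the form.

```cpp
// Hypothetical helper, not part of any browser component.
enum class LoadMode { RestrictAll, DownloadAll };

struct ContentGate {
    LoadMode mode = LoadMode::RestrictAll;  // every page starts restricted
    bool barVisible = false;

    // Call once the restricted page has loaded and has been scanned.
    void onScanFinished(bool hasExternalContent) {
        barVisible = (mode == LoadMode::RestrictAll) && hasExternalContent;
    }

    // Call when the user clicks the bar. Returns true when the caller
    // should reload the page with downloads enabled.
    bool onBarClicked() {
        if (!barVisible) return false;
        mode = LoadMode::DownloadAll;
        barVisible = false;
        return true;
    }

    // Call when a different page is opened: restrictions apply again.
    void onNewPage() {
        mode = LoadMode::RestrictAll;
        barVisible = false;
    }
};
```

The form would call onScanFinished() from its document-complete handler and reload the page when onBarClicked() returns true.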

The first item has issues. In TEmbeddedWB you can block almost anything using the DownloadOptions switches, the most important being the ForceOffline switch. But even with all downloads disabled, some things still get through, such as [object] or [iframe] tags. I know this because I implemented the OnBeforeNavigate2 event: it fires for the URLs contained in these tags, and the requests also show up in the log of my local server. Setting OfflineMode and ForceOfflineMode in TEmbeddedWB doesn't help for these items.

So how can I really block everything? The page needs to start as basic HTML with all external elements blocked, including scripts and CSS. Is there a way to trigger an event every time the browser wants to download anything, so the download can be blocked, or to avoid the need for such an event by blocking all external downloads up front? Do I need to fiddle with Internet Explorer zones and security? Any pointer in the right direction would be helpful.
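Since OnBeforeNavigate2 already fires for the [object] and [iframe] URLs that slip through, one possible stopgap is to cancel those navigations there. Below is a minimal sketch of the decision logic only, in portable C++. The function names and the set of schemes treated as local (about, res, file) are my assumptions, and the URL is assumed to have been converted to a narrow string already. It does not address images, scripts or CSS, which do not go through that event.

```cpp
#include <cctype>
#include <string>

// Returns the lower-cased scheme of a URL ("http", "about", ...),
// or an empty string when the URL has no valid "scheme:" prefix.
std::string urlScheme(const std::string& url) {
    const std::size_t colon = url.find(':');
    if (colon == std::string::npos || colon == 0) return "";
    std::string scheme = url.substr(0, colon);
    for (char& c : scheme) {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isalnum(uc) && c != '+' && c != '-' && c != '.') return "";
        c = static_cast<char>(std::tolower(uc));
    }
    return scheme;
}

// Whitelist approach: while external content is not allowed, everything
// except a few local schemes is cancelled. The scheme list is an
// assumption and should be adapted to the application.
bool shouldCancelNavigation(const std::string& url, bool allowExternal) {
    if (allowExternal) return false;
    const std::string scheme = urlScheme(url);
    return !(scheme == "about" || scheme == "res" || scheme == "file");
}
```

In the event handler, the result would be assigned to the Cancel parameter. A whitelist errs on the side of blocking, so URLs without a recognisable scheme are cancelled too.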

The second item is also tricky because I need to detect whether problematic tags are present (such as "applet", "script", "link" etc.). This detection doesn't need to be perfect, but it must be good enough to cover most such tags. I've done it like this:

//----------------------------------------------------------------------
// Check for external content (images, scripts, ActiveX, frames...)
//----------------------------------------------------------------------
try
    {    
    bool                                HasExternalContent = false;
    DelphiInterface<IHTMLDocument2>     diDoc;                              // Smart pointer wrapper - should automatically call release() and do reference counting
    diDoc = TEmbeddedWB->Document;

    DelphiInterface<IHTMLElementCollection>     diColApplets;           DelphiInterface<IDispatch>          diDispApplets;      DelphiInterface<IHTMLObjectElement> diObj;
    DelphiInterface<IHTMLElementCollection>     diColEmbeds;            DelphiInterface<IDispatch>          diDispEmbeds;
    DelphiInterface<IHTMLFramesCollection2>     diColFrames;            DelphiInterface<IDispatch>          diDispFrames;
    DelphiInterface<IHTMLElementCollection>     diColImages;            DelphiInterface<IDispatch>          diDispImages;       DelphiInterface<IHTMLImgElement>    diImg;
    DelphiInterface<IHTMLElementCollection>     diColLinks;             DelphiInterface<IDispatch>          diDispLinks;
    DelphiInterface<IHTMLElementCollection>     diColPlugins;           DelphiInterface<IDispatch>          diDispPlugins;
    DelphiInterface<IHTMLElementCollection>     diColScripts;           DelphiInterface<IDispatch>          diDispScripts;
    DelphiInterface<IHTMLStyleSheetsCollection> diColStyleSheets;       DelphiInterface<IDispatch>          diDispStyleSheets;

    OleCheck(diDoc->Get_applets     (diColApplets));
    OleCheck(diDoc->Get_embeds      (diColEmbeds));
    OleCheck(diDoc->Get_frames      (diColFrames));
    OleCheck(diDoc->Get_images      (diColImages));
    OleCheck(diDoc->Get_links       (diColLinks));
    OleCheck(diDoc->Get_plugins     (diColPlugins));
    OleCheck(diDoc->Get_scripts     (diColScripts));
    OleCheck(diDoc->Get_styleSheets (diColStyleSheets));

    // Scan for applets external links
    for (int i = 0; i < diColApplets->length; i++)
        {
        OleCheck(diColApplets->item(i,i,diDispApplets));
        if (diDispApplets != NULL)
            {
            diDispApplets->QueryInterface(IID_IHTMLObjectElement, (void**)&diObj);
            if (diObj != NULL)
                {
                UnicodeString s1 = Sysutils::Trim(diObj->data),
                              s2 = Sysutils::Trim(diObj->codeBase),
                              s3 = Sysutils::Trim(diObj->classid);

                if (StartsText("http", s1) || StartsText("http", s2) || StartsText("http", s3))
                    {
                    HasExternalContent = true;
                    break;                                                  // At least 1 found, bar will be shown, no further search needed
                    }
                }
            }
        }

    // Scan for images external links
    for (int i = 0; !HasExternalContent && i < diColImages->length; i++)    // Skip this scan if the applet scan already found something
        {
        OleCheck(diColImages->item(i,i,diDispImages));
        if (diDispImages != NULL)                                           // Still needed: item() can return S_OK with a NULL pointer, which OleCheck does not catch
            {
            diDispImages->QueryInterface(IID_IHTMLImgElement, (void**)&diImg);
            if (diImg != NULL)
                {
                UnicodeString s1 = Sysutils::Trim(diImg->src);

                // Case insensitive check
                if (StartsText("http", s1))
                    {
                    HasExternalContent = true;
                    break;                                                  // At least 1 found, bar will be shown, no further search needed
                    }
                }
            }
        }
    }
catch (Exception &e)
    {
    // triggered by OleCheck
    ShowMessage(e.Message);
    }

Is there an easier way to scan for this, or is the only option to run several loops over the other interface functions such as Get_applets, Get_embeds, Get_styleSheets etc., similar to the code above? So far I have found that I'd have to call the following functions to cover all of this:

    OleCheck(diDoc->Get_applets     (diColApplets));
    OleCheck(diDoc->Get_embeds      (diColEmbeds));
    OleCheck(diDoc->Get_frames      (diColFrames));
    OleCheck(diDoc->Get_images      (diColImages));
    OleCheck(diDoc->Get_links       (diColLinks));
    OleCheck(diDoc->Get_plugins     (diColPlugins));
    OleCheck(diDoc->Get_scripts     (diColScripts));
    OleCheck(diDoc->Get_styleSheets (diColStyleSheets));

But I'd rather not implement that many loops if this can be handled more simply. Can it?
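One way to avoid a loop per collection is a single pass over all elements, driven by a table of tag names and the attributes that can carry a URL. The sketch below shows that idea in portable C++ on a hypothetical Element struct. With MSHTML the element list would correspond to IHTMLDocument2::get_all and the attribute reads to IHTMLElement::getAttribute, but that wiring is not shown here and is untested. The table contents are an assumption, not an exhaustive list, and URLs inside CSS are not covered.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed element: lower-case tag name plus
// its attributes. In the real application the values come from the DOM.
struct Element {
    std::string tag;
    std::map<std::string, std::string> attrs;
};

// Case-insensitive test for "http://", "https://" or protocol-relative "//".
// Unlike a bare "http" prefix test, a local file named "httpfoo.png" is
// not reported as external.
bool looksExternal(const std::string& value) {
    std::size_t i = 0;
    while (i < value.size() && std::isspace(static_cast<unsigned char>(value[i]))) ++i;
    std::string lower;  // first 8 characters, lower-cased
    for (; i < value.size() && lower.size() < 8; ++i)
        lower += static_cast<char>(std::tolower(static_cast<unsigned char>(value[i])));
    return lower.rfind("http://", 0) == 0 || lower.rfind("https://", 0) == 0 ||
           lower.rfind("//", 0) == 0;
}

// Tags that load something automatically, with their URL-bearing attributes.
// Plain <a href> links are left out on purpose: they load nothing by themselves.
const std::map<std::string, std::vector<std::string>>& urlAttributes() {
    static const std::map<std::string, std::vector<std::string>> table = {
        {"img",    {"src"}},
        {"script", {"src"}},
        {"link",   {"href"}},
        {"iframe", {"src"}},
        {"frame",  {"src"}},
        {"embed",  {"src"}},
        {"object", {"data", "codebase", "classid"}},
        {"applet", {"codebase", "archive"}},
    };
    return table;
}

// Single pass: stops at the first external reference found.
bool hasExternalContent(const std::vector<Element>& elements) {
    const auto& table = urlAttributes();
    for (const Element& el : elements) {
        const auto row = table.find(el.tag);
        if (row == table.end()) continue;
        for (const std::string& name : row->second) {
            const auto attr = el.attrs.find(name);
            if (attr != el.attrs.end() && looksExternal(attr->second)) return true;
        }
    }
    return false;
}
```

Adding another tag then means adding one table row rather than another loop.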


Answer 1:


I suggest this solution:

#include "html.h"
THTMLDocument doc;
void __fastcall TForm1::CppWebBrowser1DocumentComplete(TObject *Sender, LPDISPATCH pDisp,
          Variant *URL)
{
    doc.documentFromVariant(CppWebBrowser1->Document);

    bool HasExternalContent = false;
    for (int i=0; i<doc.images.length; i++) {
        if(doc.images[i].src.SubString(1, 4) == "http")
        {
            HasExternalContent = true;
            break;
        }
    }
    for (int i=0; i<doc.applets.length; i++) {
        THTMLObjectElement obj = doc.applets[i];
        if(obj.data.SubString(1, 4) == "http")
            HasExternalContent = true;
        if(obj.codeBase.SubString(1, 4) == "http")
            HasExternalContent = true;
        if(obj.classid.SubString(1, 4) == "http")
            HasExternalContent = true;
    }
}

These great wrapper classes can be downloaded from here.



Source: https://stackoverflow.com/questions/10637550/detecting-external-content-with-tembeddedwb-or-twebbrowser
