How to get XML (RAW/SOURCE) from a WebBrowser Control

旧巷少年郎 2021-02-08 16:28

I am using the WebBrowser Control in both my Delphi and my .Net C# test projects to navigate to a local test XML file, and I try to save the content back to an XML file in .Net D…

2 Answers
  • 2021-02-08 17:01

    I think you're approaching this the wrong way. A TWebBrowser control is a visual control intended for viewing. You may be able to extract the underlying data from it, but fundamentally, using a visual control to download something (a non-visual action) is not a good approach. Instead, you should download the file using a dedicated API.

    "Just for your information: there is no way for me to use WebClient or Indy components to access the XML. I also can't act as a proxy since..."

    Don't you have those components? In that case, I'd suggest you use either of the following approaches:

    1. TDownloadURL is a built-in VCL class (unit ExtActns), useful for simple file downloads. Some examples of using it:

      • An HTML page scraper - obviously also applicable to XML
      • How to show a progress indicator while downloading - may not be useful if your file is small
    2. InternetReadFile. This is what I personally use in my own code: I have a small thread class, built on this function, that downloads files asynchronously and notifies the main thread when they're done. To use it:

      • Call InternetOpen to initialise the WinINet functions; it returns a handle.
      • Pass that handle to InternetOpenUrl, with the INTERNET_FLAG_HYPERLINK or INTERNET_FLAG_NO_UI flag, to get a second handle for the URL.
      • Call InternetReadFile on that second handle in a loop, writing each chunk to a buffer, until the whole file has been read (the call succeeds but reports zero bytes read) or your thread is terminated.
      • Don't forget to close both handles with InternetCloseHandle.

      Sorry I can't post the source code, but these are simple functions and you should find the code easy enough to write.
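    For illustration, the four WinINet steps could look roughly like the Delphi unit below. This is a minimal sketch, not the answerer's own code: the unit name, the function name DownloadWithWinInet and the agent string 'XmlDownloader' are hypothetical, it assumes a Windows Delphi version that ships the WinInet unit, and it runs synchronously, so in practice you would call it from a worker thread as the answer describes.

    ```pascal
    unit WinInetDownload;

    interface

    uses
      Classes;

    // Hypothetical helper: downloads AUrl into ADest using WinINet.
    // Returns False if any WinINet call fails. It blocks, so call it
    // from a worker thread if the UI must stay responsive.
    function DownloadWithWinInet(const AUrl: string; ADest: TStream): Boolean;

    implementation

    uses
      Windows, WinInet;

    function DownloadWithWinInet(const AUrl: string; ADest: TStream): Boolean;
    var
      hSession, hUrl: HINTERNET;
      Buffer: array[0..4095] of Byte;
      BytesRead: DWORD;
    begin
      Result := False;
      // Step 1: initialise WinINet and obtain a session handle.
      hSession := InternetOpen('XmlDownloader', INTERNET_OPEN_TYPE_PRECONFIG,
        nil, nil, 0);
      if hSession = nil then
        Exit;
      try
        // Step 2: open the URL; INTERNET_FLAG_NO_UI suppresses WinINet dialogs.
        hUrl := InternetOpenUrl(hSession, PChar(AUrl), nil, 0,
          INTERNET_FLAG_NO_UI, 0);
        if hUrl = nil then
          Exit;
        try
          // Step 3: read chunks until the call succeeds with 0 bytes (end of data).
          repeat
            if not InternetReadFile(hUrl, @Buffer, SizeOf(Buffer), BytesRead) then
              Exit;
            if BytesRead > 0 then
              ADest.WriteBuffer(Buffer, BytesRead);
          until BytesRead = 0;
          Result := True;
        finally
          // Step 4: close the URL handle...
          InternetCloseHandle(hUrl);
        end;
      finally
        // ...and the session handle.
        InternetCloseHandle(hSession);
      end;
    end;

    end.
    ```

    You could then, for example, pass it the browser's LocationURL and a TMemoryStream, and call SaveToFile on the stream if the function returns True. The stream receives the bytes exactly as the server sent them.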

    These approaches will get you either a file or a buffer, each containing the raw contents of your XML file.

    Edit: I see you explained a bit about why you can't use Indy:

    "The real scenario is much more complex and needs user interaction in the browser. After the user has done everything, there are some posts between the browser and the user until the end result is an XML file, and you have no control over where it comes from!"

    I'm not convinced this stops you from using Indy; you just need to get the location of this XML. It doesn't matter that you don't control where it is, only that you can find out where it is. Either scrape the HTML if all you have is a link (you can already get HTML from the browser; in fact, that's your problem!), or look at the final location of the TWebBrowser document and download that. In other words, let the user do whatever they have to do to navigate to the final XML file, but rather than trying to extract it from the web browser control, download it yourself.
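    A minimal sketch of "download it yourself" using TDownloadURL (unit ExtActns) from the first approach. Assumptions: the unit and helper names are hypothetical, and you call the helper once the user has finished, e.g. as SaveCurrentDocument(WebBrowser1.LocationURL, 'result.xml'), where WebBrowser1 stands for your TWebBrowser. One caveat: if the final XML is the response to a POST, a plain GET of LocationURL may not return the same content.

    ```pascal
    unit SaveBrowserDocument;

    interface

    // Hypothetical helper: re-download the document at ALocationURL
    // (for example TWebBrowser.LocationURL) into the file AFileName.
    procedure SaveCurrentDocument(const ALocationURL, AFileName: string);

    implementation

    uses
      ExtActns;

    procedure SaveCurrentDocument(const ALocationURL, AFileName: string);
    var
      Downloader: TDownLoadURL;
    begin
      Downloader := TDownLoadURL.Create(nil);
      try
        Downloader.URL := ALocationURL;   // where the user ended up
        Downloader.Filename := AFileName; // target file on disk
        // Starts the download of ALocationURL into AFileName.
        Downloader.ExecuteTarget(nil);
      finally
        Downloader.Free;
      end;
    end;

    end.
    ```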

  • 2021-02-08 17:06

    You could do a "shadow" download of the file in the TWebBrowser BeforeNavigate2 event.
    By shadow, I mean use a procedure from another library to download the file at the same time TWebBrowser is downloading it. This way, you can get the file without it being modified by TWebBrowser.

    I wrote a test application, and all I had to do to get the file contents was:

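    // Requires Synapse's httpsend unit (HttpGetText) and Variants (VarToStr) in the uses clause.
    procedure TForm1.WebBrowserBeforeNavigate2(Sender: TObject;
      const pDisp: IDispatch; var URL, Flags, TargetFrameName, PostData,
      Headers: OleVariant; var Cancel: WordBool);
    begin
      // "Shadow" download: fetch the same URL the browser is about to load
      // (a blocking call) and put the response text into the memo.
      HttpGetText(VarToStr(URL), Memo1.Lines);
    end;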
    

    HttpGetText is a blocking function from the Synapse library (http://www.ararat.cz/synapse/doku.php/start), declared in its httpsend unit.

    You could also use ICS, Indy, or TDownLoadURL. Note that TDownLoadURL is not blocking, and I was never able to get its AfterDownload event to work.
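    As an illustration of the Indy alternative, the shadow download could use TIdHTTP instead of HttpGetText. This is a sketch, not tested against the original scenario: the unit name, the helper ShadowDownload and the target file name are hypothetical, and HTTPS URLs additionally need an SSL IOHandler (such as TIdSSLIOHandlerSocketOpenSSL) assigned to the TIdHTTP component. Saving the response stream keeps the bytes exactly as the server sent them.

    ```pascal
    unit ShadowDownloadUnit;

    interface

    // Hypothetical helper: GET AUrl with Indy and save the raw response bytes.
    procedure ShadowDownload(const AUrl, AFileName: string);

    implementation

    uses
      Classes, IdHTTP;

    procedure ShadowDownload(const AUrl, AFileName: string);
    var
      Http: TIdHTTP;
      Raw: TMemoryStream;
    begin
      Http := TIdHTTP.Create(nil);
      Raw := TMemoryStream.Create;
      try
        Http.HandleRedirects := True; // follow 3xx responses like the browser would
        Http.Get(AUrl, Raw);          // blocking; raises an exception on HTTP errors
        Raw.SaveToFile(AFileName);    // bytes are written unchanged
      finally
        Raw.Free;
        Http.Free;
      end;
    end;

    end.
    ```

    Inside the BeforeNavigate2 handler you would then call something like ShadowDownload(VarToStr(URL), 'shadow.xml') in place of HttpGetText. TIdHTTP is an HTTP client, so a local file:// test file would need to be copied rather than downloaded.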
