How to get XML (RAW/SOURCE) from a WebBrowser Control

I am using the WebBrowser control in both my Delphi and .NET C# test projects to navigate to a local test XML file and try to save the content back to an XML file in .Net D


                      
2 Answers

        
                         				            
            
           
            
                              
                
              
              
                
                  余生分开走        
                
              
                            
                2021-02-08 17:01
              
            
            
                                                                       
I think you're approaching this the wrong way. A TWebBrowser control is a visual control intended for viewing. You may be able to extract the underlying data from it, but fundamentally, using a visual control to download something (a non-visual action) is not a good approach. Instead, you should download the file using a dedicated API.


  Just for your information: There is no
  way for me to use WebClient or Indy
  components to access the xml. I also
  can't play as a Proxy since...


Don't you have those components?  In that case, I'd suggest you use either of the following approaches:


TDownloadURL is an inbuilt class, useful for simple downloading of a file.  Some examples of using it:


An HTML page scraper - obviously also applicable to XML
How to show a progress indicator while downloading - may not be useful if your file is small

InternetReadFile.  This is what I personally use in my own code - I have a small thread class to asynchronously download files and notify the main thread when they're done, implemented using this function.  Use it by:


Use InternetOpen to initialise use of the internet functions; it returns a handle.
Pass that handle to InternetOpenUrl, with the INTERNET_FLAG_HYPERLINK or INTERNET_FLAG_NO_UI flags, to get a second handle.
Use that second handle with InternetReadFile in a loop, writing to a buffer until the file is read or your thread is terminated.
Don't forget to close the handles using InternetCloseHandle.


Sorry I can't post source code, but they're simple functions and you should find it easy enough to write.
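For illustration, the four steps above could look roughly like the following console program. This is only a sketch, not the answerer's own code: the helper name DownloadToStream, the agent string, the placeholder URL and the output file name are all assumptions, and it runs synchronously rather than in a worker thread. It also assumes the classic unit names (Windows, WinInet); newer Delphi versions may need Winapi.Windows and Winapi.WinInet.

```pascal
program WinInetDownloadSketch;

{$APPTYPE CONSOLE}

uses
  Windows, WinInet, Classes, SysUtils;

// Hypothetical helper: reads the raw bytes at AUrl into AStream.
// Returns True only if the whole response was read.
function DownloadToStream(const AUrl: string; AStream: TStream): Boolean;
var
  hSession, hUrl: HINTERNET;
  Buffer: array[0..4095] of Byte;
  BytesRead: DWORD;
begin
  Result := False;
  // Step 1: initialise WinInet ('SketchAgent' is an arbitrary agent string).
  hSession := InternetOpen('SketchAgent', INTERNET_OPEN_TYPE_PRECONFIG,
    nil, nil, 0);
  if hSession = nil then
    Exit;
  try
    // Step 2: open the URL without letting WinInet show any dialogs.
    hUrl := InternetOpenUrl(hSession, PChar(AUrl), nil, 0,
      INTERNET_FLAG_NO_UI, 0);
    if hUrl = nil then
      Exit;
    try
      // Step 3: keep reading until a successful call returns zero bytes.
      repeat
        if not InternetReadFile(hUrl, @Buffer, SizeOf(Buffer), BytesRead) then
          Exit;
        if BytesRead > 0 then
          AStream.WriteBuffer(Buffer, BytesRead);
      until BytesRead = 0;
      Result := True;
    finally
      // Step 4: close the URL handle...
      InternetCloseHandle(hUrl);
    end;
  finally
    // ...and the session handle, even when an early Exit was taken.
    InternetCloseHandle(hSession);
  end;
end;

var
  Data: TMemoryStream;
begin
  Data := TMemoryStream.Create;
  try
    // Placeholder URL and file name, for illustration only.
    if DownloadToStream('http://example.com/data.xml', Data) then
      Data.SaveToFile('data.xml')
    else
      Writeln('Download failed');
  finally
    Data.Free;
  end;
end.
```

Writing into a TStream keeps the helper independent of where the bytes end up: pass a TMemoryStream for a buffer or a TFileStream for a file.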


These approaches will get you either a file or a buffer, each containing the raw contents of your XML file.

Edit: I see you explained a bit about why you can't use Indy: 


  "The real scenario is much complex and
  need user interaction in the browser
  and after the user did everything
  there are some post posts between
  browser and user till the end result
  is a XML file which you have no
  control on where is comes from!"


I'm not certain this stops you using Indy: instead you just need to get the location of this XML.  The fact you don't control where it is doesn't matter, you just need to find out where it is.  Either scrape the HTML if all you have is a link (you can already get HTML from the browser - in fact, that's your problem!) or look at the final location the TWebBrowser document is located at, and download that.  In other words, let the user do whatever they have to do to navigate to the final XML file, but rather than trying to extract it from the web browser control, download it yourself.
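As a rough sketch of that last idea, the fragment below reads the control's current address through its LocationURL property and fetches it again with URLDownloadToFile from the UrlMon unit. The form, the handler name SaveXmlButtonClick, the component name WebBrowser1 and the target path are assumptions made for illustration. It also assumes that a plain GET of the final address returns the same XML; if the document was produced by a POST, a second request may not reproduce it.

```pascal
// Fragment for a form unit; assumes UrlMon, Windows and Dialogs are listed
// in the unit's uses clause and that WebBrowser1 is a TWebBrowser on TForm1.

// Hypothetical handler, e.g. wired to a "Save XML" button.
procedure TForm1.SaveXmlButtonClick(Sender: TObject);
var
  FinalUrl, TargetFile: string;
begin
  // Address of the document the user ended up on.
  FinalUrl := WebBrowser1.LocationURL;
  // Placeholder output path, for illustration only.
  TargetFile := 'C:\Temp\result.xml';
  // Download the raw file directly instead of extracting it from the control.
  if URLDownloadToFile(nil, PChar(FinalUrl), PChar(TargetFile), 0, nil) <> S_OK then
    ShowMessage('Download failed: ' + FinalUrl);
end;
```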
                                                                        
                                                        
            
            
              
                
          
          	          
            
           
            
                              
                
              
              
                
                  青春惊慌失措        
                
              
                            
                2021-02-08 17:06
              
            
            
                                                                       
You could do a "shadow" download of the file in the TWebBrowser BeforeNavigate2 event.

By shadow, I mean use a procedure from another library to download the file at the same time TWebBrowser is downloading it.  This way, you can get the file without it being modified by TWebBrowser.

I wrote a test application, and all I had to do to get the file contents was:

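// Requires the Synapse unit httpsend in the uses clause.
procedure TForm1.WebBrowserBeforeNavigate2(Sender: TObject;
  const pDisp: IDispatch; var URL, Flags, TargetFrameName, PostData,
  Headers: OleVariant; var Cancel: WordBool);
begin
  // Fetch the same URL independently of the browser; Memo1 receives the raw text.
  HttpGetText(URL, Memo1.Lines);
end;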


HttpGetText is a blocking function from the Synapse library: http://www.ararat.cz/synapse/doku.php/start

You could also use ICS, Indy, or TDownLoadURL.  Note, TDownLoadURL is not blocking and I was never able to get its AfterDownload event to work.
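If Indy is what you have installed, the same shadow download might look something like the fragment below. It is a sketch only: it assumes IdHTTP is in the uses clause, reuses the Memo1 component from the handler above, and covers plain HTTP (HTTPS would additionally need an SSL IOHandler assigned to the TIdHTTP instance). Note that TIdHTTP.Get is blocking and raises an exception on HTTP errors.

```pascal
// Fragment for the same form unit; assumes IdHTTP is in the uses clause.
procedure TForm1.WebBrowserBeforeNavigate2(Sender: TObject;
  const pDisp: IDispatch; var URL, Flags, TargetFrameName, PostData,
  Headers: OleVariant; var Cancel: WordBool);
var
  Http: TIdHTTP;
  Address: string;
begin
  // Copy the OleVariant into a string so the right Get overload is chosen.
  Address := URL;
  Http := TIdHTTP.Create(nil);
  try
    // Blocking GET; the response body is returned as a string.
    Memo1.Lines.Text := Http.Get(Address);
  finally
    Http.Free;
  end;
end;
```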
                                                                        
                                                        
            
            
              
                