How to get rendered HTML (processed by JavaScript) in a WebBrowser control?

I have an ASP.NET page and a custom class that fetches a specified web page and returns the page body.

protected String GetHtml()
{
          Thread thread =


                      

5 answers

Answered by 遥遥无期 on 2020-11-27 07:30

As George said in one of the comments, in theory you can get the DOM in webBrowser1_DocumentCompleted simply by using:

string renderedHtml = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
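
A slightly fuller sketch of the same idea, assuming a form that creates its WebBrowser in code; the class name `RenderedHtmlForm`, its `RenderedHtml` property and its `HtmlRendered` event are hypothetical. `DocumentCompleted` is raised for each frame as well as for the main page, so the handler compares `e.Url` with `webBrowser1.Url` before reading the DOM. Content that scripts add after loading finishes (timers, AJAX calls) may not be present yet when this runs.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form that loads a page and captures its rendered markup.
public class RenderedHtmlForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly string url;

    // Hypothetical: raised with the outer HTML of the top-level <html> element.
    public event Action<string> HtmlRendered;

    public string RenderedHtml { get; private set; }

    public RenderedHtmlForm(string url)
    {
        this.url = url;
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        webBrowser1.Navigate(url);
    }

    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted also fires for each frame/iframe; ignore those.
        if (e.Url != webBrowser1.Url)
            return;

        // Read the live DOM, which includes changes made by scripts that ran during load.
        HtmlElementCollection htmlElements = webBrowser1.Document.GetElementsByTagName("HTML");
        if (htmlElements.Count == 0)
            return;

        RenderedHtml = htmlElements[0].OuterHtml;
        HtmlRendered?.Invoke(RenderedHtml);
    }
}
```

Because the WebBrowser control must run on a single-threaded apartment (STA) thread, a form like this would typically be started with `Application.Run(new RenderedHtmlForm(url))` from a `Main` method marked `[STAThread]`.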

Answered by 长情又很酷 on 2020-11-27 07:33

You can get the markup of the page's <body> element (the <head> is not included) with:

string bodyHtml = webBrowser1.Document.Body.OuterHtml;


Answered by 情歌与酒 on 2020-11-27 07:41

Here is one solution I found for getting the rendered HTML (DOM) after JavaScript has run:

Place a WebBrowser control named webBrowser1 on the Form of class Form1.

[Form1.cs[Design]]



Then use the following code:

[Form1.cs]

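using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

namespace WebBrowserTest
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            // Expose MyScript to the page's script as window.external.
            this.webBrowser1.ObjectForScripting = new MyScript(this);
        }

        // Wired to the form's Load event in the designer.
        private void Form1_Load(object sender, EventArgs e)
        {
            webBrowser1.Navigate("http://localhost:6489/Default.aspx");
        }

        // Wired to webBrowser1's DocumentCompleted event in the designer.
        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // Call back into .NET from inside the page's script context.
            webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
        }

        // ObjectForScripting requires a public, COM-visible class.
        [ComVisible(true)]
        public class MyScript
        {
            // Keep a reference to the owning form instead of relying on Application.OpenForms[0].
            private readonly Form1 form;

            public MyScript(Form1 form)
            {
                this.form = form;
            }

            public void CallServerSideCode()
            {
                // The DOM as modified by the page's JavaScript.
                HtmlDocument doc = form.webBrowser1.Document;
            }
        }
    }
}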


In Form1_Load, change the URL passed to webBrowser1.Navigate("http://localhost:6489/Default.aspx") to the page whose JavaScript-processed DOM you want to obtain.

You can access the modified DOM in the CallServerSideCode() method, for example:

doc.GetElementById("myDataTable");


Or you can access the rendered HTML like this:

var renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;

Answered by 青春惊慌失措 on 2020-11-27 07:47

Another way is to start a timer on the form once the page has loaded. When the timer fires, the page's scripts should have re-rendered it (provided the delay is long enough), and you can parse the page.
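
A minimal sketch of the timer approach, assuming the WinForms `System.Windows.Forms.Timer` (its `Tick` event is raised on the UI thread, so the control can be read directly) and an arbitrary 3-second delay; the class name `DelayedDomForm` and its members are hypothetical. A fixed delay is a heuristic: if the page's scripts take longer than the delay, the captured HTML will still be incomplete.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form: waits a fixed delay after the main document completes,
// then reads the DOM, giving post-load scripts time to run.
public class DelayedDomForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };

    // WinForms timer: Tick runs on the UI thread, so touching the WebBrowser is safe.
    private readonly System.Windows.Forms.Timer renderTimer = new System.Windows.Forms.Timer();
    private readonly string url;

    public event Action<string> HtmlRendered;

    public string RenderedHtml { get; private set; }

    // The 3000 ms default is an assumption; tune it per site.
    public DelayedDomForm(string url, int delayMilliseconds = 3000)
    {
        this.url = url;
        Controls.Add(webBrowser1);
        renderTimer.Interval = delayMilliseconds;
        renderTimer.Tick += renderTimer_Tick;
        webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        webBrowser1.Navigate(url);
    }

    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (e.Url != webBrowser1.Url)
            return; // a frame finished, not the main page

        // (Re)start the countdown from the moment the main document is complete.
        renderTimer.Stop();
        renderTimer.Start();
    }

    private void renderTimer_Tick(object sender, EventArgs e)
    {
        renderTimer.Stop(); // fire only once per page load

        HtmlDocument doc = webBrowser1.Document;
        if (doc == null)
            return;

        HtmlElementCollection htmlElements = doc.GetElementsByTagName("HTML");
        if (htmlElements.Count == 0)
            return;

        RenderedHtml = htmlElements[0].OuterHtml;
        HtmlRendered?.Invoke(RenderedHtml);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
            renderTimer.Dispose();
        base.Dispose(disposing);
    }
}
```

The delay is the main trade-off: too short and late content is missed, too long and every page load is slowed down.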

Answered by 再見小時候 on 2020-11-27 07:48

First, a little background. I have been trying to scrape information from a web page whose content is dynamic: the page loads more information as you scroll down to the bottom, so its HTML changes as you scroll. Unfortunately, the WebBrowser control does not reflect this automatically; it still holds the original document that it first loaded via WebBrowser.Navigate. The updated content is, however, available through an HtmlElementCollection.

The following code did not work for me.

webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml


I broke the above statement up as follows:

    ' WB is the WebBrowser control on the form
    Dim eCollections As HtmlElementCollection
    Dim strDoc As String
    eCollections = WB.Document.GetElementsByTagName("HTML")
    strDoc = eCollections(0).OuterHtml


This worked like a charm. I hope it helps someone else too.