Get the final HTML content after JavaScript has finished, using OpenWebkitSharp

﹥>﹥吖頭↗ submitted on 2020-01-07 05:24:12

Question


I'm writing software that fetches the content of a URL. While working on it, I ran into a problem: I cannot get the exact HTML content after the JavaScript has finished running. Some websites render their HTML with JavaScript, and some do not support browsers that do not run JavaScript.

I tried using System.Windows.Controls.WebBrowser and reading WebBrowser.Document in the LoadCompleted event, but had no luck.

After that, I tried the OpenWebkitSharp library. In the UI it shows the website's content correctly, but when I read the document in DocumentCompleted, it still returns the content as it was before JavaScript rendered it. Here is my code:

...
using WebKit;
using WebKit.Interop;

public MainWindow()
{
  windowFormHost = new System.Windows.Forms.Integration.WindowsFormsHost();
  webBrowser = new WebKit.WebKitBrowser();
  webBrowser.AllowDownloads = false;
  windowFormHost.Child = webBrowser;
  grdBrowserHost.Children.Add(windowFormHost);
  webBrowser.DocumentCompleted += WebBrowser_DocumentCompleted;
}

private void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
  var contentHtml = ((WebKitBrowser)sender).DocumentAsHTMLDocument;
}

The contentHtml value is the HTML from before the JavaScript ran, not the rendered result.


Answer 1:


To solve this problem, I added a trick to my code to get the full HTML content after the JavaScript has finished.

using WebKit;
using WebKit.Interop;
using WebKit.JSCore; // Add a reference to the JSCore assembly that ships with the WebKit package.

public MainWindow()
{
  InitializeComponent();
  InitBrowser();
}

private void InitBrowser()
{
  windowFormHost = new System.Windows.Forms.Integration.WindowsFormsHost();
  webBrowser = new WebKit.WebKitBrowser();
  webBrowser.AllowDownloads = false;
  windowFormHost.Child = webBrowser;
  grdBrowserHost.Children.Add(windowFormHost);
  webBrowser.Load += WebBrowser_Load; 
}

private void WebBrowser_Load(object sender, EventArgs e)
{
  // ResourceIntercepter throws an exception if webBrowser has not finished loading its components.
  // We cannot use DocumentCompleted to read the HTML content, because that event fires before the JavaScript has finished.
  webBrowser.ResourceIntercepter.ResourceFinishedLoadingEvent += new ResourceFinishedLoadingHandler(ResourceIntercepter_ResourceFinishedLoadingEvent);
}

private void ResourceIntercepter_ResourceFinishedLoadingEvent(object sender, WebKitResourcesEventArgs e)
{
  // WebBrowser.Document still shows the HTML from before the JavaScript ran.
  // The trick is to call JavaScript (I used jQuery) to get the HTML content.
  JSValue documentContent = null;
  var readyState = webBrowser.GetScriptManager.EvaluateScript("document.readyState");

  if (readyState != null && readyState.ToString().Equals("complete"))
  {
    documentContent = webBrowser.GetScriptManager.EvaluateScript("$('html').html();");
    var contentHtml = documentContent.ToString();
  }

}
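
One caveat about the handler above: $('html').html() only works if the page itself has loaded jQuery. As a rough sketch, not tested against OpenWebkitSharp, the same check can be written with plain DOM properties. The names snapshotHtml and scriptForEvaluateScript are made up for illustration; only document.readyState and document.documentElement.outerHTML are standard DOM.

```javascript
// Hypothetical helper (not part of OpenWebkitSharp): returns the rendered
// markup once the document has finished loading, otherwise null.
function snapshotHtml(doc) {
  if (!doc || doc.readyState !== "complete") {
    return null;
  }
  // outerHTML includes the <html> element itself, whereas
  // $('html').html() returns only its inner markup.
  return doc.documentElement.outerHTML;
}

// Script text that could be handed to EvaluateScript on the C# side
// (assumption: the function source is wrapped and called with `document`).
const scriptForEvaluateScript = "(" + snapshotHtml.toString() + ")(document)";
```

On the C# side, the text of scriptForEvaluateScript would be passed to webBrowser.GetScriptManager.EvaluateScript in place of the two separate calls. I have not verified how EvaluateScript represents a null result, so keep the null check on the returned value. Note that outerHTML does not include the doctype.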

I hope this helps.



Source: https://stackoverflow.com/questions/42994708/get-final-html-content-after-javascript-finished-by-open-webkit-sharp
