问题
I'm writing a software that gets the content from URL. When working on that, I run into to problem that I can not get exactly the HTML content after the java script finished. There are some websites that renders HTML by java-script, some do not support browsers which does not run js.
I tried using System.Windows.Controls.WebBrowser
with WebBrowser.Document
in LoadCompleted
but no luck.
After that, I tried the OpenWebkitSharp library. On the UI, it showes the content of website correctly, but with code Document
in DocumentCompleted
, it still returns the content which does not rendered by java-script.
Here is my code:
...
using WebKit;
using WebKit.Interop;
public MainWindow()
{
windowFormHost = new System.Windows.Forms.Integration.WindowsFormsHost();
webBrowser = new WebKit.WebKitBrowser();
webBrowser.AllowDownloads = false;
windowFormHost.Child = webBrowser;
grdBrowserHost.Children.Add(windowFormHost);
webBrowser.Load += WebBrowser_Load;
}
private void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var contentHtml = ((WebKitBrowser)sender).DocumentAsHTMLDocument;
}
The contentHtml has value which is not rendered after java-script finished.
回答1:
Do solve this problem, I have added some trick into my code to get the full Html content after java-script finished.
using WebKit;
using WebKit.Interop;
using WebKit.JSCore; //We need add refrence JSCore which following with Webkit package.
public MainWindow()
{
InitializeComponent();
InitBrowser();
}
private void InitBrowser()
{
windowFormHost = new System.Windows.Forms.Integration.WindowsFormsHost();
webBrowser = new WebKit.WebKitBrowser();
webBrowser.AllowDownloads = false;
windowFormHost.Child = webBrowser;
grdBrowserHost.Children.Add(windowFormHost);
webBrowser.Load += WebBrowser_Load;
}
private void WebBrowser_Load(object sender, EventArgs e)
{
//The ResourceIntercepter will throws exception if webBrowser have not finished loading its components
//We can not use DocumentCompleted to load the Htmlcontent. Because that event will be fired before Java-script is finised
webBrowser.ResourceIntercepter.ResourceFinishedLoadingEvent += new ResourceFinishedLoadingHandler(ResourceIntercepter_ResourceFinishedLoadingEvent);
}
private void ResourceIntercepter_ResourceFinishedLoadingEvent(object sender, WebKitResourcesEventArgs e)
{
//The WebBrowser.Document still show the html without java-script.
//The trict is call Javascript (I used Jquery) to get the content of HTML
JSValue documentContent = null;
var readyState = webBrowser.GetScriptManager.EvaluateScript("document.readyState");
if (readyState != null && readyState.ToString().Equals("complete"))
{
documentContent = webBrowser.GetScriptManager.EvaluateScript("$('html').html();");
var contentHtml = documentContent.ToString();
}
}
Hope this one can help you.
来源:https://stackoverflow.com/questions/42994708/get-final-html-content-after-javascript-finished-by-open-webkit-sharp