I want to use the HTML Agility Pack on a WebBrowser that has loaded everything I need (it clicks a button with code to load every video on the channel) (It loads a YouTu
If the target website uses AJAX heavily (as YouTube does), it's hard, if not impossible, to determine when the page has finished loading and executing all of its dynamic scripts. You can get close, though, by handling the window.onload event and allowing an extra second or two for non-deterministic AJAX calls. Then read webBrowser.Document.DomDocument.documentElement.outerHTML via dynamic to get the currently rendered HTML.
Example:
    private void Form1_Load(object sender, EventArgs e)
    {
        DownloadAsync("http://www.example.com").ContinueWith(
            (task) => MessageBox.Show(task.Result),
            TaskScheduler.FromCurrentSynchronizationContext());
    }

    async Task<string> DownloadAsync(string url)
    {
        // each navigation has its own TaskCompletionSource
        TaskCompletionSource<bool> onloadTcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;

        handler = delegate
        {
            // handle DocumentCompleted only once per navigation
            this.webBrowser.DocumentCompleted -= handler;

            // subscribe to the DOM onload event
            this.webBrowser.Document.Window.AttachEventHandler("onload", delegate
            {
                if (onloadTcs.Task.IsCompleted)
                    return; // this should not be happening

                // signal the completion of the page loading
                onloadTcs.SetResult(true);
            });
        };

        // register DocumentCompleted handler
        this.webBrowser.DocumentCompleted += handler;

        // navigate to url
        this.webBrowser.Navigate(url);

        // continue upon onload
        await onloadTcs.Task;

        // artificial delay for AJAX
        await Task.Delay(1000);

        // onload has fired and AJAX calls have had extra time; read the rendered DOM
        return ((dynamic)this.webBrowser.Document.DomDocument).documentElement.outerHTML;
    }

[EDITED] Here's the final piece of code that helped to solve the OP's problem:
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(((dynamic)this.webBrowser1.Document.DomDocument).documentElement.outerHTML);